Besides preventing crashes with vdr-2.6.3, this is required to get vdr to work properly with the gcc thread sanitizer.
---
 ringbuffer.c | 18 +++++++++++++++++-
 ringbuffer.h |  3 +++
 2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/ringbuffer.c b/ringbuffer.c
index 902c887..1c24df2 100644
--- a/ringbuffer.c
+++ b/ringbuffer.c
@@ -210,12 +210,16 @@ int cRingBufferLinear::DataReady(const uchar *Data, int Count)
 
 int cRingBufferLinear::Available(void)
 {
+  Lock();
   int diff = head - tail;
-  return (diff >= 0) ? diff : Size() + diff - margin;
+  int available = (diff >= 0) ? diff : Size() + diff - margin;
+  Unlock();
+  return available;
 }
 
 void cRingBufferLinear::Clear(void)
 {
+  Lock();
   int Head = head;
   tail = Head;
 #ifdef DEBUGRINGBUFFERS
@@ -224,11 +228,13 @@ void cRingBufferLinear::Clear(void)
   lastPut = lastGet = -1;
 #endif
   maxFill = 0;
+  Unlock();
   EnablePut();
 }
 
 int cRingBufferLinear::Read(int FileHandle, int Max)
 {
+  Lock();
   int Tail = tail;
   int diff = Tail - head;
   int free = (diff > 0) ? diff - 1 : Size() - head;
@@ -259,6 +265,7 @@ int cRingBufferLinear::Read(int FileHandle, int Max)
   lastHead = head;
   lastPut = Count;
 #endif
+  Unlock();
   EnableGet();
   if (free == 0)
      WaitForPut();
@@ -267,6 +274,7 @@ int cRingBufferLinear::Read(int FileHandle, int Max)
 
 int cRingBufferLinear::Read(cUnbufferedFile *File, int Max)
 {
+  Lock();
   int Tail = tail;
   int diff = Tail - head;
   int free = (diff > 0) ? diff - 1 : Size() - head;
@@ -297,6 +305,7 @@ int cRingBufferLinear::Read(cUnbufferedFile *File, int Max)
   lastHead = head;
   lastPut = Count;
 #endif
+  Unlock();
   EnableGet();
   if (free == 0)
      WaitForPut();
@@ -306,6 +315,7 @@ int cRingBufferLinear::Read(cUnbufferedFile *File, int Max)
 int cRingBufferLinear::Put(const uchar *Data, int Count)
 {
   if (Count > 0) {
+     Lock();
      int Tail = tail;
      int rest = Size() - head;
      int diff = Tail - head;
@@ -336,6 +346,7 @@ int cRingBufferLinear::Put(const uchar *Data, int Count)
      lastHead = head;
      lastPut = Count;
 #endif
+     Unlock();
      EnableGet();
      if (Count == 0)
         WaitForPut();
@@ -345,6 +356,7 @@ int cRingBufferLinear::Put(const uchar *Data, int Count)
 
 uchar *cRingBufferLinear::Get(int &Count)
 {
+  Lock();
   int Head = head;
   if (getThreadTid <= 0)
      getThreadTid = cThread::ThreadId();
@@ -362,14 +374,17 @@ uchar *cRingBufferLinear::Get(int &Count)
      uchar *p = buffer + tail;
      if ((cont = DataReady(p, cont)) > 0) {
         Count = gotten = cont;
+        Unlock();
         return p;
         }
+     Unlock();
      WaitForGet();
      return NULL;
 }
 
 void cRingBufferLinear::Del(int Count)
 {
+  Lock();
   if (Count > gotten) {
      esyslog("ERROR: invalid Count in cRingBufferLinear::Del: %d (limited to %d)", Count, gotten);
      Count = gotten;
@@ -387,6 +402,7 @@ void cRingBufferLinear::Del(int Count)
   lastTail = tail;
   lastGet = Count;
 #endif
+  Unlock();
 }
 
 // --- cFrame ----------------------------------------------------------------
 
diff --git a/ringbuffer.h b/ringbuffer.h
index 746fc51..cbaa12c 100644
--- a/ringbuffer.h
+++ b/ringbuffer.h
@@ -58,10 +58,13 @@ public:
   static void PrintDebugRBL(void);
 #endif
 private:
+  cMutex mutex;
   int margin, head, tail;
   int gotten;
   uchar *buffer;
   char *description;
+  void Lock(void) { mutex.Lock(); }
+  void Unlock(void) { mutex.Unlock(); }
 protected:
   virtual int DataReady(const uchar *Data, int Count);
         ///< By default a ring buffer has data ready as soon as there are at least
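A note on the patch above: the Lock()/Unlock() pairs could also be written with VDR's own RAII helper cMutexLock (declared in thread.h), which releases the mutex on every return path automatically. A minimal sketch of Available() in that style, assuming the same mutex member as in the patch:

int cRingBufferLinear::Available(void)
{
  cMutexLock MutexLock(&mutex); // locks here, unlocks when leaving the function
  int diff = head - tail;
  return (diff >= 0) ? diff : Size() + diff - margin;
}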
On 02.02.23 23:16, Patrick Lerda wrote:
Besides preventing crashes with vdr-2.6.3, this is required to get vdr to work properly with the gcc thread sanitizer.
cRingBufferLinear was designed to be thread safe without locking. What "crashes with vdr-2.6.3" are you referring to?
Klaus
On 02/02/2023 23:27, Klaus Schmidinger wrote:
On 02.02.23 23:16, Patrick Lerda wrote:
Besides preventing crashes with vdr-2.6.3, this is required to get vdr to work properly with the gcc thread sanitizer.
cRingBufferLinear was designed to be thread safe without locking. What "crashes with vdr-2.6.3" are you referring to?
Klaus
With a -fsanitize=thread compiled version of vdr, I had some crashes that happened quickly, for instance:

ThreadSanitizer:DEADLYSIGNAL
==6496==ERROR: ThreadSanitizer: SEGV on unknown address 0x7f6ca48884a4 (pc 0x7f6cbf45726d bp 0x7f6ca3dbfb40 sp 0x7f6ca3dbf038 T7926)
==6496==The signal is caused by a WRITE memory access.
    #0 <null> <null> (libc.so.6+0x16926d)
    #1 __interceptor_memcpy <null> (libtsan.so.0+0x619fc)
    #2 cRingBufferLinear::Put(unsigned char const*, int) /dosy/src/vdr-2.6.3/ringbuffer.c:329 (vdr+0x5724ab)
    #3 cRemuxDummy::Put(unsigned char const*, int) /dosy/src/vdr-2.6.3/PLUGINS/src/streamdev/server/streamer.c:117 (libvdr-streamdev-server.so.2.6.3+0x504cd)
    #4 cStreamdevStreamer::Put(unsigned char const*, int) /dosy/src/vdr-2.6.3/PLUGINS/src/streamdev/server/streamer.c:182 (libvdr-streamdev-server.so.2.6.3+0x504cd)
    #5 cStreamdevLiveStreamer::Put(unsigned char const*, int) /dosy/src/vdr-2.6.3/PLUGINS/src/streamdev/server/livestreamer.c:604 (libvdr-streamdev-server.so.2.6.3+0x55565)
    #6 cStreamdevStreamer::Action() /dosy/src/vdr-2.6.3/PLUGINS/src/streamdev/server/streamer.c:174 (libvdr-streamdev-server.so.2.6.3+0x50718)
    #7 cStreamdevLiveStreamer::Action() /dosy/src/vdr-2.6.3/PLUGINS/src/streamdev/server/livestreamer.c:589 (libvdr-streamdev-server.so.2.6.3+0x55495)
    #8 cThread::StartThread(cThread*) /dosy/src/vdr-2.6.3/thread.c:293 (vdr+0x5be852)
    #9 <null> <null> (libtsan.so.0+0x333bf)
    #10 <null> <null> (libc.so.6+0x8a729)
    #11 <null> <null> (libc.so.6+0x1070bb)
Here are two detected data races:

WARNING: ThreadSanitizer: data race (pid=6496)
  Write of size 4 at 0x7b4400003084 by thread T24:
    #0 cRingBufferLinear::Put(unsigned char const*, int) vdr-2.6.3/ringbuffer.c:330 (vdr+0x5724c6)
    #1 cIptvDevice::WriteData(unsigned char*, int) vdr-2.6.3/PLUGINS/src/iptv/device.c:384 (libvdr-iptv.so.2.6.3+0x16384)
    #2 cIptvStreamer::Action() vdr-2.6.3/PLUGINS/src/iptv/streamer.c:51 (libvdr-iptv.so.2.6.3+0x2bc46)
    #3 cThread::StartThread(cThread*) vdr-2.6.3/thread.c:293 (vdr+0x5be852)
    #4 <null> <null> (libtsan.so.0+0x333bf)

  Previous read of size 4 at 0x7b4400003084 by thread T23:
    #0 cRingBufferLinear::Get(int&) vdr-2.6.3/ringbuffer.c:348 (vdr+0x572534)
    #1 cIptvDevice::GetData(int*) vdr-2.6.3/PLUGINS/src/iptv/device.c:408 (libvdr-iptv.so.2.6.3+0x1772c)
    #2 cIptvDevice::GetTSPacket(unsigned char*&) vdr-2.6.3/PLUGINS/src/iptv/device.c:456 (libvdr-iptv.so.2.6.3+0x17a21)
    #3 cDevice::Action() vdr-2.6.3/device.c:1718 (vdr+0x4bf86f)
    #4 cThread::StartThread(cThread*) vdr-2.6.3/thread.c:293 (vdr+0x5be852)
    #5 <null> <null> (libtsan.so.0+0x333bf)

  Location is heap block of size 288 at 0x7b4400002f80 allocated by main thread:
    #0 operator new(unsigned long) <null> (libtsan.so.0+0x8d98c)
    #1 cIptvDevice::cIptvDevice(unsigned int) vdr-2.6.3/PLUGINS/src/iptv/device.c:29 (libvdr-iptv.so.2.6.3+0x183b4)
    #2 cIptvDevice::Initialize(unsigned int) vdr-2.6.3/PLUGINS/src/iptv/device.c:88 (libvdr-iptv.so.2.6.3+0x188b7)
    #3 cPluginIptv::Initialize() vdr-2.6.3/PLUGINS/src/iptv/iptv.c:109 (libvdr-iptv.so.2.6.3+0x140f2)
    #4 cPluginManager::InitializePlugins() vdr-2.6.3/plugin.c:375 (vdr+0x5536f2)
    #5 main vdr-2.6.3/vdr.c:825 (vdr+0x490ccb)

  Thread T24 'IPTV streamer' (tid=6535, running) created by thread T23 at:
    #0 pthread_create <null> (libtsan.so.0+0x5fea5)
    #1 cThread::Start() vdr-2.6.3/thread.c:327 (vdr+0x5be393)
    #2 cIptvStreamer::Open() vdr-2.6.3/PLUGINS/src/iptv/streamer.c:66 (libvdr-iptv.so.2.6.3+0x2bf46)
    #3 cIptvDevice::OpenDvr() vdr-2.6.3/PLUGINS/src/iptv/device.c:342 (libvdr-iptv.so.2.6.3+0x15ed3)
    #4 cDevice::Action() vdr-2.6.3/device.c:1714 (vdr+0x4bf7b8)
    #5 cThread::StartThread(cThread*) vdr-2.6.3/thread.c:293 (vdr+0x5be852)
    #6 <null> <null> (libtsan.so.0+0x333bf)

  Thread T23 'device 3 receiv' (tid=6534, running) created by thread T13 at:
    #0 pthread_create <null> (libtsan.so.0+0x5fea5)
    #1 cThread::Start() vdr-2.6.3/thread.c:327 (vdr+0x5be393)
    #2 cDevice::AttachReceiver(cReceiver*) vdr-2.6.3/device.c:1844 (vdr+0x4c055d)
    #3 cStreamdevLiveStreamer::Attach() vdr-2.6.3/PLUGINS/src/streamdev/server/livestreamer.c:614 (libvdr-streamdev-server.so.2.6.3+0x51761)
    #4 cStreamdevStreamer::Start(cTBSocket*) vdr-2.6.3/PLUGINS/src/streamdev/server/streamer.c:152 (libvdr-streamdev-server.so.2.6.3+0x508f4)
    #5 cConnectionHTTP::Flushed() vdr-2.6.3/PLUGINS/src/streamdev/server/connectionHTTP.c:439 (libvdr-streamdev-server.so.2.6.3+0x38e33)
    #6 cServerConnection::Write() vdr-2.6.3/PLUGINS/src/streamdev/server/connection.c:171 (libvdr-streamdev-server.so.2.6.3+0x2d1bf)
    #7 cStreamdevServer::Action() vdr-2.6.3/PLUGINS/src/streamdev/server/server.c:146 (libvdr-streamdev-server.so.2.6.3+0x2b94e)
    #8 cThread::StartThread(cThread*) vdr-2.6.3/thread.c:293 (vdr+0x5be852)
    #9 <null> <null> (libtsan.so.0+0x333bf)
SUMMARY: ThreadSanitizer: data race vdr-2.6.3/ringbuffer.c:330 in cRingBufferLinear::Put(unsigned char const*, int)
==================
==================
WARNING: ThreadSanitizer: data race (pid=6496)
  Read of size 1 at 0x7f6cae7ff0bc by thread T23:
    #0 cIptvDevice::GetData(int*) vdr-2.6.3/PLUGINS/src/iptv/device.c:410 (libvdr-iptv.so.2.6.3+0x17755)
    #1 cIptvDevice::GetTSPacket(unsigned char*&) vdr-2.6.3/PLUGINS/src/iptv/device.c:456 (libvdr-iptv.so.2.6.3+0x17a21)
    #2 cDevice::Action() vdr-2.6.3/device.c:1718 (vdr+0x4bf86f)
    #3 cThread::StartThread(cThread*) vdr-2.6.3/thread.c:293 (vdr+0x5be852)
    #4 <null> <null> (libtsan.so.0+0x333bf)

  Previous write of size 1 at 0x7f6cae7ff0bc by thread T24:
    #0 memcpy <null> (libtsan.so.0+0x619ce)
    #1 cRingBufferLinear::Put(unsigned char const*, int) vdr-2.6.3/ringbuffer.c:329 (vdr+0x5724ab)
    #2 cIptvDevice::WriteData(unsigned char*, int) vdr-2.6.3/PLUGINS/src/iptv/device.c:384 (libvdr-iptv.so.2.6.3+0x16384)
    #3 cIptvStreamer::Action() vdr-2.6.3/PLUGINS/src/iptv/streamer.c:51 (libvdr-iptv.so.2.6.3+0x2bc46)
    #4 cThread::StartThread(cThread*) vdr-2.6.3/thread.c:293 (vdr+0x5be852)
    #5 <null> <null> (libtsan.so.0+0x333bf)

  Location is heap block of size 2097141 at 0x7f6cae7ff000 allocated by main thread:
    #0 malloc <null> (libtsan.so.0+0x367d9)
    #1 cRingBufferLinear::cRingBufferLinear(int, int, bool, char const*) vdr-2.6.3/ringbuffer.c:179 (vdr+0x571e2f)
    #2 cIptvDevice::cIptvDevice(unsigned int) vdr-2.6.3/PLUGINS/src/iptv/device.c:29 (libvdr-iptv.so.2.6.3+0x183ce)
    #3 cIptvDevice::Initialize(unsigned int) vdr-2.6.3/PLUGINS/src/iptv/device.c:88 (libvdr-iptv.so.2.6.3+0x188b7)
    #4 cPluginIptv::Initialize() vdr-2.6.3/PLUGINS/src/iptv/iptv.c:109 (libvdr-iptv.so.2.6.3+0x140f2)
    #5 cPluginManager::InitializePlugins() vdr-2.6.3/plugin.c:375 (vdr+0x5536f2)
    #6 main vdr-2.6.3/vdr.c:825 (vdr+0x490ccb)

  Thread T23 'device 3 receiv' (tid=6534, running) created by thread T13 at:
    #0 pthread_create <null> (libtsan.so.0+0x5fea5)
    #1 cThread::Start() vdr-2.6.3/thread.c:327 (vdr+0x5be393)
    #2 cDevice::AttachReceiver(cReceiver*) vdr-2.6.3/device.c:1844 (vdr+0x4c055d)
    #3 cStreamdevLiveStreamer::Attach() vdr-2.6.3/PLUGINS/src/streamdev/server/livestreamer.c:614 (libvdr-streamdev-server.so.2.6.3+0x51761)
    #4 cStreamdevStreamer::Start(cTBSocket*) vdr-2.6.3/PLUGINS/src/streamdev/server/streamer.c:152 (libvdr-streamdev-server.so.2.6.3+0x508f4)
    #5 cConnectionHTTP::Flushed() vdr-2.6.3/PLUGINS/src/streamdev/server/connectionHTTP.c:439 (libvdr-streamdev-server.so.2.6.3+0x38e33)
    #6 cServerConnection::Write() vdr-2.6.3/PLUGINS/src/streamdev/server/connection.c:171 (libvdr-streamdev-server.so.2.6.3+0x2d1bf)
    #7 cStreamdevServer::Action() vdr-2.6.3/PLUGINS/src/streamdev/server/server.c:146 (libvdr-streamdev-server.so.2.6.3+0x2b94e)
    #8 cThread::StartThread(cThread*) vdr-2.6.3/thread.c:293 (vdr+0x5be852)
    #9 <null> <null> (libtsan.so.0+0x333bf)

  Thread T24 'IPTV streamer' (tid=6535, running) created by thread T23 at:
    #0 pthread_create <null> (libtsan.so.0+0x5fea5)
    #1 cThread::Start() vdr-2.6.3/thread.c:327 (vdr+0x5be393)
    #2 cIptvStreamer::Open() vdr-2.6.3/PLUGINS/src/iptv/streamer.c:66 (libvdr-iptv.so.2.6.3+0x2bf46)
    #3 cIptvDevice::OpenDvr() vdr-2.6.3/PLUGINS/src/iptv/device.c:342 (libvdr-iptv.so.2.6.3+0x15ed3)
    #4 cDevice::Action() vdr-2.6.3/device.c:1714 (vdr+0x4bf7b8)
    #5 cThread::StartThread(cThread*) vdr-2.6.3/thread.c:293 (vdr+0x5be852)
    #6 <null> <null> (libtsan.so.0+0x333bf)

SUMMARY: ThreadSanitizer: data race vdr-2.6.3/PLUGINS/src/iptv/device.c:410 in cIptvDevice::GetData(int*)
==================
On 02.02.23 23:47, patrick9876@free.fr wrote:
On 02/02/2023 23:27, Klaus Schmidinger wrote:
On 02.02.23 23:16, Patrick Lerda wrote:
Besides preventing crashes with vdr-2.6.3, this is required to get vdr to work properly with the gcc thread sanitizer.
cRingBufferLinear was designed to be thread safe without locking. What "crashes with vdr-2.6.3" are you referring to?
Klaus
With a -fsanitize=thread compiled version of vdr, I had some crashes that happened quickly, for instance: ...
Before making such deep changes to code that has been running flawlessly for years (or even decades) I need to be convinced that this is absolutely necessary.
Is there a problem that occurs if you run VDR *without* -fsanitize=thread?
Klaus
On 03/02/2023 10:36, Klaus Schmidinger wrote:
On 02.02.23 23:47, patrick9876@free.fr wrote:
On 02/02/2023 23:27, Klaus Schmidinger wrote:
On 02.02.23 23:16, Patrick Lerda wrote:
Besides preventing crashes with vdr-2.6.3, this is required to get vdr to work properly with the gcc thread sanitizer.
cRingBufferLinear was designed to be thread safe without locking. What "crashes with vdr-2.6.3" are you referring to?
Klaus
With a -fsanitize=thread compiled version of vdr, I had some crashes that happened quickly, for instance: ...
Before making such deep changes to code that has been running flawlessly for years (or even decades) I need to be convinced that this is absolutely necessary.
Is there a problem that occurs if you run VDR *without* -fsanitize=thread?
Klaus
I had in the past a crash from time to time; with vdr-2.6.3 this seems to be worse. Anyway, I checked with vdr-2.4.7 and the problem is the same. This class is shared by at least 2 threads with more than one shared object; this means that without a mutex, the behavior is undefined from a C++ perspective. With -fsanitize=thread the compiler could add some jitter, and that seems to trigger a crash quickly. You should check in your environment with -fsanitize=thread; this is the fastest way to check for thread safety.
Patrick
On 06.02.23 21:11, Patrick Lerda wrote:
On 03/02/2023 10:36, Klaus Schmidinger wrote:
On 02.02.23 23:47, patrick9876@free.fr wrote:
On 02/02/2023 23:27, Klaus Schmidinger wrote:
On 02.02.23 23:16, Patrick Lerda wrote:
Besides preventing crashes with vdr-2.6.3, this is required to get vdr to work properly with the gcc thread sanitizer.
cRingBufferLinear was designed to be thread safe without locking. What "crashes with vdr-2.6.3" are you referring to?
Klaus
With a -fsanitize=thread compiled version of vdr, I had some crashes that happened quickly, for instance: ...
Before making such deep changes to code that has been running flawlessly for years (or even decades) I need to be convinced that this is absolutely necessary.
Is there a problem that occurs if you run VDR *without* -fsanitize=thread?
Klaus
I had in the past a crash from time to time; with vdr-2.6.3 this seems to be worse. Anyway, I checked with vdr-2.4.7 and the problem is the same. This class is shared by at least 2 threads
It is supposed to be shared by *exactly* two threads. One only writing 'head', the other only writing 'tail'. The concept was taken from the original DVB driver, which also implemented the ring buffer without locking.
with more than one shared object; this means that without a mutex, the behavior is undefined from a C++ perspective. With -fsanitize=thread the compiler could add some jitter, and that seems to trigger a crash quickly. You should check in your environment with -fsanitize=thread; this is the fastest way to check for thread safety.
If there really were such a big problem, I would expect it to happen a lot more often. However, this is the first time I hear of a problem that might stem from the implementation of the ring buffer. What exactly is it that you're doing that causes this? Does it also happen if you don't do that?
Klaus
On 06.02.23 23:29, Klaus Schmidinger wrote:
It is supposed to be shared by *exactly* two threads. One only writing 'head', the other only writing 'tail'.
Two-ended buffers are pretty good when used correctly, but nowadays they have a small chance of triggering memory ordering issues, where it is possible that data written to the buffer is still stuck in a distant cache while the updated head pointer is already known to other CPU cores. To be absolutely safe, the head pointer would need atomic read/write, forcing ordering with a memory barrier.
But in general the timing for such a bug is pretty narrow, and advanced CPUs (esp. intel/amd) make sure that things get stored in order anyway. The worst thing would also just be a false data read, not a crash.
Cheers,
Udo
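For illustration, a minimal self-contained sketch (generic code, not VDR's) of the pattern Udo describes: the producer publishes 'head' with a release store only after writing the payload, and the consumer reads it with an acquire load, so the payload is guaranteed to be visible before the new index is acted upon.

#include <atomic>
#include <cstddef>

// Generic single-producer/single-consumer ring, assuming exactly one
// writer of 'head' and one writer of 'tail' (as in cRingBufferLinear).
template<size_t N> class SpscRing {
  unsigned char data[N];
  std::atomic<size_t> head{0}; // written only by the producer
  std::atomic<size_t> tail{0}; // written only by the consumer
public:
  bool Put(unsigned char b) {
    size_t h = head.load(std::memory_order_relaxed);
    size_t next = (h + 1) % N;
    if (next == tail.load(std::memory_order_acquire))
       return false; // buffer full
    data[h] = b;                                 // write the payload first...
    head.store(next, std::memory_order_release); // ...then publish the index
    return true;
  }
  bool Get(unsigned char &b) {
    size_t t = tail.load(std::memory_order_relaxed);
    if (t == head.load(std::memory_order_acquire))
       return false; // buffer empty
    b = data[t];                                        // read the payload...
    tail.store((t + 1) % N, std::memory_order_release); // ...then free the slot
    return true;
  }
};

On x86-64 the release/acquire pair compiles to plain loads and stores (TSO already provides the ordering); on weakly ordered CPUs it emits the barriers that a plain int cannot.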
Tue, Feb 07, 2023 at 12:54:16AM +0100, Udo Richter wrote:
Two-ended buffers are pretty good when used correctly, but nowadays they have a small chance of triggering memory ordering issues, where it is possible that data written to the buffer is still stuck in a distant cache while the updated head pointer is already known to other CPU cores. To be absolutely safe, the head pointer would need atomic read/write, forcing ordering with a memory barrier.
A nice explanation. Before C++11 (ISO/IEC 14882:2011), to my understanding, multi-threaded programs or data races were not part of the standard. Starting with C++11, data races actually are undefined behavior, and not just "implementation defined". I would recommend the following resources:
https://en.cppreference.com/w/cpp/atomic/memory_order
https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
The latter only covers loads and stores, not read-modify-write operations like std::atomic::fetch_add(), but it gives an idea of how things look on different processors.
But in general the timing for such a bug is pretty narrow, and advanced CPUs (esp. intel/amd) make sure that things get stored in order anyway.
With the IA-32 and AMD64 a.k.a. x86-64 ISA, the general expectation is that implementations use Total Store Order (TSO). This is also how the SPARC and the RISC-V RVTSO work.
However, on ARM (and POWER and RISC-V RVWMO), weak memory ordering is the norm. On the ARM, some atomic operations would use retry loops, or with the recent LSE (Large System Extensions), special instructions. It should be remembered that there are VDR systems running on ARM, like the Raspberry Pi.
The worst thing would also just be a false data read, not a crash.
If that false data were a stale pointer that points to now-freed memory, you could get a crash. Or you could get a hang or an assertion failure catching an inconsistent state.
Somewhat related to this, I recently learned about Linux "restartable sequences" https://lwn.net/Articles/883104/ which could be an alternative to using mutexes in some lock-free data structures. I think it could work for things where data is "published" by updating just one pointer at the end. I haven't thought of an application for that in VDR yet, and I am not sure if there is any. It is better to keep things simple if the performance is acceptable.
Marko
On 07/02/2023 07:59, Marko Mäkelä wrote:
...
This is really related to the C++ thread safety model; the patch below fixes the main cRingBufferLinear issue:
diff --git a/ringbuffer.h b/ringbuffer.h
index 746fc51..a3fa499 100644
--- a/ringbuffer.h
+++ b/ringbuffer.h
@@ -10,6 +10,7 @@
 #ifndef __RINGBUFFER_H
 #define __RINGBUFFER_H
 
+#include <atomic>
 #include "thread.h"
 #include "tools.h"
 
@@ -58,7 +59,8 @@ public:
   static void PrintDebugRBL(void);
 #endif
 private:
-  int margin, head, tail;
+  int margin;
+  std::atomic_int head, tail;
   int gotten;
   uchar *buffer;
   char *description;
Patrick
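A note on this patch: std::atomic_int's implicit conversions keep expressions like "int diff = head - tail;" compiling unchanged, and every such access then defaults to memory_order_seq_cst, i.e. that line behaves as if it were written:

int diff = head.load(std::memory_order_seq_cst)
         - tail.load(std::memory_order_seq_cst);

Sequentially consistent loads are essentially free on x86-64, so the change is low-risk there, while on weakly ordered CPUs such as ARM it adds the ordering that the plain int version was missing.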
On 09.02.23 23:31, Patrick Lerda wrote:
... This is really related to the C++ thread safety model; the patch below fixes the main cRingBufferLinear issue:
...
Is there an actual, reproducible, real-world problem that makes this necessary? I mean without any "thread sanitizer" etc.
Klaus
On 15/02/2023 16:51, Klaus Schmidinger wrote:
On 09.02.23 23:31, Patrick Lerda wrote:
... This is really related to the C++ thread safety model; the patch below fixes the main cRingBufferLinear issue:
...
Is there an actual, reproducible, real-world problem that makes this necessary? I mean without any "thread sanitizer" etc.
Klaus
I definitely had a few crashes related to this class. Thread safety issues are often not easily reproducible. Is your environment 100% reliable?
Patrick
On 19.02.23 18:29, Patrick Lerda wrote:
... I definitely had a few crashes related to this class. Thread safety issues are often not easily reproducible. Is your environment 100% reliable?
My VDR runs for weeks, even months 24/7 without problems. I only restart it when I have a new version.
Klaus
Tue, Feb 21, 2023 at 10:47:28AM +0100, Klaus Schmidinger wrote:
On 19.02.23 18:29, Patrick Lerda wrote:
... I definitely had a few crashes related to this class. Thread safety issues are often not easily reproducible. Is your environment 100% reliable?
My VDR runs for weeks, even months 24/7 without problems. I only restart it when I have a new version.
How many threads would be created or destroyed per day, in your typical usage? If we assume a couple thousand such events per day, that would be roughly a million events per year. It could take a thousand or a million years before a low-probability crash could be reproduced in this way. Even if it occurred, would you be guaranteed to thoroughly debug it? With the next scheduled recording approaching in a few minutes?
I was thinking that it could be helpful to implement some automated testing of restarts. I made a simple experiment, with a tuner stick plugged into the USB port of an AMD64 laptop (ARM would be much better for reproducing many race conditions), and no aerial cable:
mkdir /dev/shm/v
touch /dev/shm/v/sources.conf /dev/shm/v/channels.conf
i=0
while ./vdr --no-kbd -L. -Pskincurses -c /dev/shm/v -v /dev/shm/v
do
  echo -n "$i"
  i=$((i+1))
done
First, I thought of using an unpatched VDR. The easiest way to trigger shutdown would seem to be SIGHUP. I did not figure out how to automate the sending of that signal. Instead, I thought I would apply a crude patch to the code, like this:
diff --git a/vdr.c b/vdr.c
index 1bdc51ab..b35c4aeb 100644
--- a/vdr.c
+++ b/vdr.c
@@ -1024,6 +1024,7 @@ int main(int argc, char *argv[])
            dsyslog("SD_WATCHDOG ping");
            }
 #endif
+        EXIT(0);
         // Handle channel and timer modifications:
         {
           // Channels and timers need to be stored in a consistent manner,
I did not check if this would actually exercise the thread creation and shutdown. Maybe not sufficiently, since I do not see any skincurses output on the screen.
Several such test loops against a vanilla VDR code base could be run concurrently, using different DVB tuners, configuration directories, and SVDRP ports. The test harness could issue HITK commands to randomly switch channels, start and stop recordings, and finally restart VDR. As long as the process keeps returning the expected exit status on restart, the harness would restart it.
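For example, a rough sketch of an SVDRP helper such a harness could use (assuming VDR's default SVDRP port 6419 on localhost; error handling kept minimal):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

// Connect to the SVDRP port, send one command (e.g. "HITK Menu" or
// "HITK Channel+"), read the replies, and disconnect.
static bool SvdrpCommand(const char *Cmd, int Port = 6419)
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
     return false;
  sockaddr_in addr = {};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(Port);
  inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
  bool ok = connect(fd, (sockaddr *)&addr, sizeof(addr)) == 0;
  if (ok) {
     char buf[256];
     read(fd, buf, sizeof(buf)); // greeting ("220 ...")
     dprintf(fd, "%s\r\n", Cmd); // the actual command
     read(fd, buf, sizeof(buf)); // reply code
     dprintf(fd, "QUIT\r\n");
  }
  close(fd);
  return ok;
}

A few such calls per iteration, wrapped in the shell loop above, would exercise channel switching and recording start/stop between restarts.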
It should be possible to cover tens or hundreds of thousands of VDR restarts per day, and many more if the startup and shutdown logic were streamlined to shorten any timeouts. In my environment, each iteration with the above patch took about 3 seconds, which I find somewhat excessive.
Should a problem be caught in this way, we should be able to get a core dump of a crash, or we could attach GDB to a hung process to examine what is going on.
Patrick, did you try reproducing any VDR problems under "rr record" (https://rr-project.org/)? Debugging in "rr replay" would give access to the exact sequence of events. For those race conditions that can be reproduced in that way, debugging becomes almost trivial. (Just set some data watchpoints and reverse-continue from the final state.)
Marko