On Tue, Feb 21, 2023 at 10:47:28AM +0100, Klaus Schmidinger wrote:
> On 19.02.23 18:29, Patrick Lerda wrote:
>> ... I had definitively a few crashes related to this class. Thread safety issues are often not easily reproducible. Is your environment 100% reliable?
>
> My VDR runs for weeks, even months 24/7 without problems. I only restart it when I have a new version.
How many threads would be created or destroyed per day, in your typical usage? If we assume a couple thousand such events per day, that would be roughly a million events per year. At that rate, it could take a thousand or a million years before a low-probability crash is reproduced. Even if it occurred, would you be able to debug it thoroughly, with the next scheduled recording only a few minutes away?
I was thinking that it could be helpful to implement some automated testing of restarts. I made a simple experiment, with a tuner stick plugged into the USB port of an AMD64 laptop (ARM would be much better for reproducing many race conditions), and no aerial cable:
mkdir /dev/shm/v
touch /dev/shm/v/sources.conf /dev/shm/v/channels.conf
i=0
while ./vdr --no-kbd -L. -Pskincurses -c /dev/shm/v -v /dev/shm/v
do
  echo -n "$i"
  i=$((i+1))
done
First, I thought of using an unpatched VDR. The easiest way to trigger shutdown would seem to be SIGHUP. I did not figure out how to automate the sending of that signal. Instead, I thought I would apply a crude patch to the code, like this:
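For reference, one way the signal could be sent automatically from the shell is sketched below. It is untested against VDR: the helper name run_and_signal and the fixed delay are made up for illustration, and it simply assumes that the program shuts down when it receives the signal.

```shell
#!/bin/sh
# run_and_signal DELAY SIGNAL COMMAND [ARGS...]   (hypothetical helper)
# Start COMMAND in the background, send SIGNAL to it after DELAY seconds,
# and return COMMAND's exit status.
run_and_signal() {
    delay=$1 sig=$2
    shift 2
    "$@" &
    pid=$!
    sleep "$delay"
    # The process may already have exited by itself; that is not an error here.
    kill -s "$sig" "$pid" 2>/dev/null || true
    wait "$pid"
}

# Possible use with the command line from the loop above (the 5 second delay
# is an arbitrary assumption, not a measured startup time):
#   run_and_signal 5 HUP ./vdr --no-kbd -L. -c /dev/shm/v -v /dev/shm/v
```

A fixed delay is crude. A real harness would rather wait until the process reports that it is up before signalling it.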
diff --git a/vdr.c b/vdr.c
index 1bdc51ab..b35c4aeb 100644
--- a/vdr.c
+++ b/vdr.c
@@ -1024,6 +1024,7 @@ int main(int argc, char *argv[])
            dsyslog("SD_WATCHDOG ping");
            }
 #endif
+        EXIT(0);
         // Handle channel and timer modifications:
         {
           // Channels and timers need to be stored in a consistent manner,
I did not check if this would actually exercise the thread creation and shutdown. Maybe not sufficiently, since I do not see any skincurses output on the screen.
Several such test loops against a vanilla VDR code base could be run concurrently, using different DVB tuners, configuration directories, and SVDRP ports. The test harness could issue HITK commands to randomly switch channels, start and stop recordings, and finally restart VDR. As long as the process keeps returning the expected exit status on restart, the harness would restart it.
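The two building blocks of such a harness could look roughly like this. It is a sketch only: the function names are hypothetical, the list of key names is an assumption that would have to be checked against the keys VDR actually accepts, and the transport (sending the generated lines to the SVDRP port) is left out.

```shell
#!/bin/sh
# random_hitk COUNT SEED   (hypothetical helper)
# Print COUNT lines of the form "HITK <key>", chosen pseudo-randomly.
# The same SEED yields the same sequence again, which helps when a failing
# run needs to be repeated.
random_hitk() {
    awk -v n="$1" -v seed="$2" 'BEGIN {
        srand(seed)
        # Assumed key names; adjust to what the VDR under test accepts.
        nkeys = split("Channel+ Channel- Record Stop Menu Back", keys, " ")
        for (i = 0; i < n; i++)
            print "HITK " keys[int(rand() * nkeys) + 1]
    }'
}

# restart_while EXPECTED MAX COMMAND [ARGS...]   (hypothetical helper)
# Rerun COMMAND as long as it exits with status EXPECTED, at most MAX times.
# Prints the number of runs that gave the expected status. Returns 0 if all
# MAX runs did, 1 at the first unexpected status.
restart_while() {
    expected=$1 max=$2
    shift 2
    runs=0
    while [ "$runs" -lt "$max" ]; do
        rc=0
        "$@" || rc=$?
        if [ "$rc" -ne "$expected" ]; then
            echo "$runs"
            echo "run $((runs + 1)): unexpected exit status $rc" >&2
            return 1
        fi
        runs=$((runs + 1))
    done
    echo "$runs"
}
```

Each concurrent loop would call restart_while with its own configuration directory and port, and log the seed passed to random_hitk so that a failing sequence can be generated again.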
It should be possible to cover tens or hundreds of thousands of VDR restarts per day, and many more if the startup and shutdown logic were streamlined to shorten any timeouts. In my environment, each iteration with the above patch took about 3 seconds, which I find somewhat excessive.
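The arithmetic behind that estimate, as a small sketch. The helper name is made up, and the count of four concurrent loops is only an example value.

```shell
#!/bin/sh
# restarts_per_day SECONDS_PER_RESTART [LOOPS]   (hypothetical helper)
# A day has 86400 seconds; LOOPS defaults to a single test loop.
restarts_per_day() {
    echo $(( 86400 * ${2:-1} / $1 ))
}

restarts_per_day 3      # one loop at 3 s per restart: prints 28800
restarts_per_day 3 4    # four concurrent loops: prints 115200
```

So a single loop at 3 seconds per iteration lands in the tens of thousands, and reaching hundreds of thousands takes either several tuners or a faster restart.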
Should a problem be caught in this way, we should be able to get a core dump of a crash, or we could attach GDB to a hung process to examine what is going on.
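To make a crash visible to the test loop, the harness could enable core dumps and look at the exit status. The sketch below uses a hypothetical helper name and relies on the common shell convention that a status above 128 means termination by a signal. Where the core file ends up depends on the system configuration.

```shell
#!/bin/sh
# Allow core files for everything started from this shell, if permitted.
ulimit -c unlimited 2>/dev/null || echo "core dumps could not be enabled" >&2

# describe_status STATUS   (hypothetical helper)
# Interpret an exit status as reported by the shell in $?.
# Above 128, the process was killed by signal number STATUS - 128
# (on Linux, 11 is SIGSEGV and 6 is SIGABRT).
describe_status() {
    if [ "$1" -eq 0 ]; then
        echo "clean exit"
    elif [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
    else
        echo "exit status $1"
    fi
}
```

A hang does not produce any exit status at all, so it would need a separate timeout in the harness, after which the process is left alive for GDB to attach to.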
Patrick, did you try reproducing any VDR problems under "rr record" (https://rr-project.org/)? Debugging in "rr replay" would give access to the exact sequence of events. For those race conditions that can be reproduced in that way, debugging becomes almost trivial. (Just set some data watchpoints and reverse-continue from the final state.)
Marko