Hi,
It seems that the new shutdown procedure introduced for versions 1.5.x is (at least for me) not working as expected. The problem is that it does not recognize correctly whether VDR was started manually or not. In fact it thinks every time that it has been started manually. I traced the problem to the SystemExec call in cShutdownHandler::CallShutdownCommand where Setup.NextWakeupTime is updated only if the SystemExec call returns 0. i changed it to accept all values greater or equal to 0 and now Setup.NextWakeupTime gets updated properly. Can some explain me why 0 is expected?
Here is the patch that I came up with:
--- shutdown.c.orig 2007-08-02 00:14:49.000000000 +0300 +++ shutdown.c 2007-08-02 00:15:10.000000000 +0300 @@ -126,7 +126,7 @@ time_t Delta = WakeupTime ? WakeupTime - time(NULL) : 0; cString cmd = cString::sprintf("%s %ld %ld %d "%s" %d", shutdownCommand, WakeupTime, Delta, Channel, *strescape(File, ""$"), UserShutdown); isyslog("executing '%s'", *cmd); - if (SystemExec(cmd, true) == 0) + if (SystemExec(cmd, true) >= 0) Setup.NextWakeupTime = WakeupTime; // Remember this wakeup time for comparison on reboot }
-Petri
Petri Helin wrote:
I traced the problem to the SystemExec call in cShutdownHandler::CallShutdownCommand where Setup.NextWakeupTime is updated only if the SystemExec call returns 0. i changed it to accept all values greater or equal to 0 and now Setup.NextWakeupTime gets updated properly. Can some explain me why 0 is expected?
SystemExec called with true as second parameter should only return -1 in case of an error (logged to syslog) or the status of waitpid() in case of success. (see thread.c) Since the direct child instantly does exit(0), I assumed that status should also be 0.
On the other hand, the man page of waitpid mentions lots of macros to inspect the status return of waitpid. Maybe checking on these helps here.
Cheers,
Udo
Udo Richter wrote:
Petri Helin wrote:
I traced the problem to the SystemExec call in cShutdownHandler::CallShutdownCommand where Setup.NextWakeupTime is updated only if the SystemExec call returns 0. i changed it to accept all values greater or equal to 0 and now Setup.NextWakeupTime gets updated properly. Can some explain me why 0 is expected?
SystemExec called with true as second parameter should only return -1 in case of an error (logged to syslog) or the status of waitpid() in case of success. (see thread.c) Since the direct child instantly does exit(0), I assumed that status should also be 0.
On the other hand, the man page of waitpid mentions lots of macros to inspect the status return of waitpid. Maybe checking on these helps here.
It seems to me that VDR still fails writing the NextWakeupTime every now and then, even after I made the change described earlier. Do you have suggestions on how to shutdown properly in order to get the NextWakeupTime written to setup.conf? I run VDR as a daemon and shut it down in shutdown script by calling start-stop-daemon (I use Ubuntu Feisty).
-Petri
Petri Helin wrote:
It seems to me that VDR still fails writing the NextWakeupTime every now and then, even after I made the change described earlier. Do you have suggestions on how to shutdown properly in order to get the NextWakeupTime written to setup.conf? I run VDR as a daemon and shut it down in shutdown script by calling start-stop-daemon (I use Ubuntu Feisty).
You can try something like this for standards conformity:
int ret = SystemExec(cmd, true); if (WIFEXITED(ret) && WEXITSTATUS(ret) == 0) Setup.NextWakeupTime = WakeupTime; // Remember this wakeup time
(not tested) Or, you can try the permissive way and test for != -1, which will only catch fatal errors like fork failing.
In any case, try dumping the return value to syslog, I would really like to know what additional flag is set on the return value.
Cheers,
Udo
Petri Helin wrote:
Udo Richter wrote:
In any case, try dumping the return value to syslog, I would really like to know what additional flag is set on the return value.
I got 6, 9 and 11 when I did some debugging earlier.
I *think* that these are kill signals received by the child process. Which is strange, as the child does an exit immediately. (Unless you're somewhere between 1.5.1 and 1.5.3 - this changed in 1.5.4)
6 is SIGABRT, 11 is SIGSEGV and 9 is SIGKILL.
Cheers,
Udo
Udo Richter wrote:
Petri Helin wrote:
Udo Richter wrote:
In any case, try dumping the return value to syslog, I would really like to know what additional flag is set on the return value.
I got 6, 9 and 11 when I did some debugging earlier.
I *think* that these are kill signals received by the child process. Which is strange, as the child does an exit immediately. (Unless you're somewhere between 1.5.1 and 1.5.3 - this changed in 1.5.4)
6 is SIGABRT, 11 is SIGSEGV and 9 is SIGKILL.
I use version 1.5.6. I tested with my normal shutdown script and with a script that just does an "exit 0". No difference there. Can the script itself have any influence on the return value?
-Petri
Petri Helin wrote:
Udo Richter wrote:
I *think* that these are kill signals received by the child process. Which is strange, as the child does an exit immediately. (Unless you're somewhere between 1.5.1 and 1.5.3 - this changed in 1.5.4)
6 is SIGABRT, 11 is SIGSEGV and 9 is SIGKILL.
I use version 1.5.6. I tested with my normal shutdown script and with a script that just does an "exit 0". No difference there. Can the script itself have any influence on the return value?
Since 1.5.4, VDR forks into a child that does nothing but fork off a second child, and do an exit(0). The second child exec's the shutdown script:
VDR +-- Child 1 -> exits +-- Child 2 -> runs shutdown
That way the waitpid quickly returns, the second child becomes orphaned, and no process ends up being a zombie.
Since the shutdown script runs orphaned, the return value of the script is not evaluated, nor does anyone wait for the script to terminate.
Cheers,
Udo
On 8/9/07, Udo Richter udo_richter@gmx.de wrote:
Petri Helin wrote:
Udo Richter wrote:
I *think* that these are kill signals received by the child process. Which is strange, as the child does an exit immediately. (Unless you're somewhere between 1.5.1 and 1.5.3 - this changed in 1.5.4)
6 is SIGABRT, 11 is SIGSEGV and 9 is SIGKILL.
I use version 1.5.6. I tested with my normal shutdown script and with a script that just does an "exit 0". No difference there. Can the script itself have any influence on the return value?
Since 1.5.4, VDR forks into a child that does nothing but fork off a second child, and do an exit(0). The second child exec's the shutdown script:
VDR +-- Child 1 -> exits +-- Child 2 -> runs shutdown
That way the waitpid quickly returns, the second child becomes orphaned, and no process ends up being a zombie.
Since the shutdown script runs orphaned, the return value of the script is not evaluated, nor does anyone wait for the script to terminate.
I noticed my vdr-1.5.6 did an emergency shutdown when I had poor signal reception during a recording, which in itself is a fine thing to do... but my "runvdr" script didn't seem to catch the bad exit code (exit 1) when vdr did that, so VDR never restarted. This code snippet used to work for me, but doesn't now:
eval "screen -D -m -S vdr $VDRCMD &" # Remember PID of VDR process PID=$! # Wait for VDR to end or signal to arrive wait $PID # Remember return value of VDR RET=$? if test $RET -eq 0 -o $RET -eq 2; then exit; fi TIMEOFDEATH=$(date +%s)
so, something is wrong with RET. Does this have to do with the children spawning the exit code?
Groeten.
Stone wrote:
I noticed my vdr-1.5.6 did an emergency shutdown when I had poor signal reception during a recording, which in itself is a fine thing to do... but my "runvdr" script didn't seem to catch the bad exit code (exit 1) when vdr did that, so VDR never restarted.
Does this have to do with the children spawning the exit code?
No, the exit code of VDR is independent of what the script returns. You're sure that it did not exit with 0 or 2? Maybe some other problem prevented the next restart?
Cheers,
Udo
On 8/10/07, Udo Richter udo_richter@gmx.de wrote:
Stone wrote:
I noticed my vdr-1.5.6 did an emergency shutdown when I had poor signal reception during a recording, which in itself is a fine thing to do... but my "runvdr" script didn't seem to catch the bad exit code (exit 1) when vdr did that, so VDR never restarted.
Does this have to do with the children spawning the exit code?
No, the exit code of VDR is independent of what the script returns. You're sure that it did not exit with 0 or 2? Maybe some other problem prevented the next restart?
It is possible there was something else going on. I will investigate further, but thanks for the info. Perhaps I will skip my runvdr script and just use your "extreme" runvdr version instead :)
Groeten.