Following on from my previous work on fixing the way in which the UDP socket is opened for receiving notification messages, I have been looking at why the LCFG client component just hangs when the rdxprof process fails to daemonise.
It turns out that the LCFG client component uses an obscure feature of the ngeneric Start function: as its final step, it calls a StartWait function if one has been defined. In the client component this StartWait function sits waiting forever for a client context change, even when the rdxprof process failed to start…
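As a rough illustration of the "call it if it has been defined" pattern, here is a minimal sketch in plain shell. This is not the ngeneric code: call_if_defined and NoSuchHook are hypothetical names, and the StartWait body is a stand-in that just sets a flag.

```shell
#!/bin/sh

# Hypothetical dispatcher: run an optional hook function only if it exists.
# Note that "command -v" also finds ordinary commands on PATH, so a real
# implementation might want a stricter test.
call_if_defined() {
    hook=$1
    shift
    if command -v "$hook" >/dev/null 2>&1; then
        "$hook" "$@"
    fi
}

# Stand-in hook; the real StartWait waits for a client context change.
StartWait() {
    hook_ran=yes
}

hook_ran=no
call_if_defined StartWait     # defined, so it is called
call_if_defined NoSuchHook    # not defined, so it is silently skipped

echo "hook_ran=$hook_ran"
```

The point to take from this is that nothing in such a dispatcher knows whether the earlier steps actually succeeded, so a hook that blocks will block regardless.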
I think the problem comes from an expectation that the call to the Daemon function, which starts and backgrounds the rdxprof process, will fail if rdxprof fails to run. It does not fail ($? is zero), and the PID of the rdxprof process is always available through the $! variable, even if the process was very short-lived.
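The behaviour is easy to reproduce in any POSIX shell. The sketch below uses a hypothetical failing_daemon function as a stand-in for rdxprof failing at startup; it is not the component's Daemon function, just a plain background job.

```shell
#!/bin/sh

# Hypothetical stand-in for a process that dies immediately at startup.
failing_daemon() {
    exit 3
}

failing_daemon &
launch_status=$?    # status of launching the background job, not of the job
child_pid=$!        # always set, even if the child has already exited

# The child's real exit status is only available by waiting for it.
wait "$child_pid"
child_status=$?

echo "launch_status=$launch_status child_status=$child_status"
# prints "launch_status=0 child_status=3"
```

Using wait like this only works for a direct child of the shell, so it is not necessarily an option for a process which is expected to detach itself and carry on running.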
There is, thankfully, a very simple solution here. The client component already has an IsProcessRunning function which can be used to check whether the process associated with a PID is still active. This has to be used carefully: I have put a short sleep after the daemonisation stage to give the process time to start fully (or to fail) before doing the check. The check is also fairly naive, so there is a slight risk that, if the system is under resource pressure, the rdxprof process could fail and its PID could be immediately reused by another process. For now I think it’s reasonable to accept that risk and revisit the issue later if it causes us problems. Related to this, the StartWait function clearly ought to give up eventually rather than wait forever.
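To make the idea concrete, here is a minimal sketch of both pieces: the "sleep, then check the PID" step and a wait which eventually gives up. The function names is_process_running and wait_with_timeout are hypothetical, the real IsProcessRunning may well be implemented differently, and the failing process is just a stand-in for rdxprof.

```shell
#!/bin/sh

# Hypothetical liveness check: "kill -0" sends no signal, it only tests
# whether a process with this PID exists. It is as naive as described
# above: a reused PID, or an exited child that has not been reaped yet,
# would still look alive.
is_process_running() {
    kill -0 "$1" 2>/dev/null
}

# Hypothetical bounded wait: retry a command once a second and give up
# after roughly the given number of seconds.
wait_with_timeout() {
    limit=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$limit" ]; do
        if "$@"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Stand-in for a daemon that fails during startup.
sh -c 'exit 1' &
daemon_pid=$!

sleep 1    # give the process time to start fully, or to fail

if is_process_running "$daemon_pid"; then
    start_result=running
else
    start_result=failed
fi
echo "start_result=$start_result"

# A condition that never becomes true: the wait gives up instead of hanging.
if wait_with_timeout 2 false; then
    wait_result=ok
else
    wait_result=timed_out
fi
echo "wait_result=$wait_result"
```

The one-second sleep and two-second limit are arbitrary values chosen for the example; sensible values for a real component would depend on how long the daemon normally takes to start.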