[dev.icinga.com #4427] Persistent ido2db process after an ido2db service restart #1312
Comments
Updated by mfriedrich on 2013-07-27 17:37:40 +00:00
I had that once, but I cannot reproduce it easily. |
Updated by bigon on 2013-12-17 11:23:56 +00:00
Hi, I'm experiencing this issue quite often on my infrastructure when the database is busy. This can lead to a situation where the new ido2db process gets stuck and then blocks everything in the core.
I looked at the code and the problem is IMHO in the ido2db_parent_sighandler() function, which is racy. If the child is busy writing to the database, it might miss the kill signal, which in turn means that the parent never receives the SIGCHLD and thus never waits for the child to die. In this condition, the rest of the function is called, and in ido2db_cleanup_socket() both the socket and the pidfile are removed.
Most init scripts rely on the pidfile to see whether the process has exited properly and otherwise try to kill -9 the processes. This does not work here because the pidfile is already gone.
IMHO, wait()/waitpid() should be called right after calling kill(), and the parent should wait until all the children have died.
Edit: The same code seems to be present in the Nagios codebase. |
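The kill-then-wait ordering proposed above can be sketched in C as follows. This is only a minimal illustration, not the code from the ido2db changeset: `terminate_and_reap()` and `spawn_sleeping_child()` are hypothetical names introduced here, and the sketch assumes the parent keeps track of its children's PIDs in an array.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the actual ido2db function): send signo to every
 * child in pids[], then block until each child has been reaped.
 * Returns the number of children reaped, or -1 if kill() fails unexpectedly. */
static int terminate_and_reap(const pid_t *pids, size_t count, int signo)
{
    int reaped = 0;

    assert(pids != NULL || count == 0);

    /* Step 1: ask every child to exit. ESRCH means it is already gone. */
    for (size_t i = 0; i < count; i++) {
        if (kill(pids[i], signo) == -1 && errno != ESRCH)
            return -1;
    }

    /* Step 2: wait for each child before doing any cleanup
     * (removing the socket or the pidfile would come after this loop). */
    for (size_t i = 0; i < count; i++) {
        int status;
        pid_t r;

        do {
            r = waitpid(pids[i], &status, 0);
        } while (r == -1 && errno == EINTR); /* retry if interrupted */

        if (r == pids[i])
            reaped++;
        /* r == -1 with ECHILD: the child was already reaped elsewhere. */
    }

    return reaped;
}

/* Hypothetical demo helper: fork a child that idles until it is signalled. */
static pid_t spawn_sleeping_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid; /* child PID in the parent, -1 on fork failure */
}
```

With no children (`count == 0`) the function returns 0 immediately. Note that a blocking `waitpid()` waits for as long as a child takes to exit, so a real daemon might combine it with a timeout and a follow-up SIGKILL; that part is left out of this sketch.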
Updated by mfriedrich on 2014-01-03 20:03:18 +00:00
Regarding waitpid(), you're right: the parent process should make sure to wait for all child processes to exit properly before terminating itself (and return early if there are no children). I cannot reproduce it easily, but I've pushed your proposed fix to the current development tree. |
Updated by Anonymous on 2014-01-03 20:03:52 +00:00
Applied in changeset icinga-core:238aa46023953de0e16c197a83851e317b97aaa6. |
Updated by bigon on 2014-01-09 14:10:04 +00:00 Would it be possible to backport this for the next 1.10 point release? |
Updated by mfriedrich on 2014-01-09 22:29:31 +00:00 cherry picked into support/1.10 |
Updated by mfriedrich on 2014-01-27 19:30:45 +00:00
|
Updated by mfriedrich on 2014-12-08 14:38:12 +00:00
|
This issue has been migrated from Redmine: https://dev.icinga.com/issues/4427
Created by tontonitch on 2013-07-18 07:52:09 +00:00
Assignee: mfriedrich
Status: Resolved (closed on 2014-01-03 20:03:52 +00:00)
Target Version: 1.10.3
Last Update: 2014-12-08 14:38:12 +00:00 (in Redmine)
Hi,
Since I upgraded from Icinga 1.8 to 1.9 (currently 1.9.3), it appears that one ido2db process is not stopped correctly after an ido2db restart. The problem started to occur with Icinga version 1.9.0.
Consequently, sometimes there are 3 ido2db processes running:
# ps -ef | grep ido2db
icinga 52898 1 0 Jul17 ? 00:03:10 /Monitoring/icinga/bin/ido2db -c /Monitoring/icinga/etc/ido2db.cfg
icinga 79274 1 0 01:00 ? 00:00:07 /Monitoring/icinga/bin/ido2db -c /Monitoring/icinga/etc/ido2db.cfg
icinga 79298 79274 0 01:00 ? 00:01:53 /Monitoring/icinga/bin/ido2db -c /Monitoring/icinga/etc/ido2db.cfg
Even if I stop the ido2db service, one process remains and I need to kill it (kill -9 52898).
This does not happen on every ido2db restart. I have tried to reproduce the problem with debugging enabled, but without success so far.
Regards,
Yannick
Changesets
2014-01-03 19:46:41 +00:00 by (unknown) 9516b8c
2014-01-03 19:59:04 +00:00 by (unknown) 238aa46
2014-01-03 20:01:38 +00:00 by (unknown) b1ed17b
2014-01-09 22:28:36 +00:00 by (unknown) 5164ca6
2014-01-23 15:15:33 +00:00 by (unknown) 144a0b7