[dev.icinga.com #4427] Persistent ido2db process after an ido2db service restart #1312

icinga-migration · 2013-07-18T07:52:09Z

This issue has been migrated from Redmine: https://dev.icinga.com/issues/4427

Created by tontonitch on 2013-07-18 07:52:09 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2014-01-03 20:03:52 +00:00)
Target Version: 1.10.3
Last Update: 2014-12-08 14:38:12 +00:00 (in Redmine)

Icinga Version: 1.10.0
OS Version: any

Hi,

Since I upgraded from Icinga 1.8 to 1.9 (currently 1.9.3), one ido2db process is sometimes not stopped correctly after an ido2db restart. The problem first appeared with Icinga 1.9.0.

Consequently, sometimes there are 3 ido2db processes running:
# ps -ef | grep ido2db
icinga 52898     1 0 Jul17 ? 00:03:10 /Monitoring/icinga/bin/ido2db -c /Monitoring/icinga/etc/ido2db.cfg
icinga 79274     1 0 01:00 ? 00:00:07 /Monitoring/icinga/bin/ido2db -c /Monitoring/icinga/etc/ido2db.cfg
icinga 79298 79274 0 01:00 ? 00:01:53 /Monitoring/icinga/bin/ido2db -c /Monitoring/icinga/etc/ido2db.cfg

Even if I stop the ido2db service, one process remains and I need to kill it (kill -9 52898).

This does not happen at every ido2db restart. I have tried to reproduce the problem with debugging enabled, but without success so far.

Regards,
Yannick

Changesets

2014-01-03 19:46:41 +00:00 by (unknown) 9516b8c

idoutils: wait for child processes on exit preventing zombies

Refs #4427

2014-01-03 19:59:04 +00:00 by (unknown) 238aa46

Merge branch 'fix/ido2db-kill-waitpid-4427' into next

Fixes #4427

2014-01-03 20:01:38 +00:00 by (unknown) b1ed17b

Update Changelog/THANKS.

Refs #4427

2014-01-09 22:28:36 +00:00 by (unknown) 5164ca6

idoutils: wait for child processes on exit preventing zombies

Refs #4427

Conflicts:
	Changelog

2014-01-23 15:15:33 +00:00 by (unknown) 144a0b7

Update Changelog.

Refs #4968
Refs #5434
Refs #4427
Refs #4825
Refs #5263
Refs #5545

icinga-migration · 2013-07-27T17:37:40Z

Updated by mfriedrich on 2013-07-27 17:37:40 +00:00

I had that once, but I cannot reproduce it easily.

icinga-migration · 2013-12-17T11:23:56Z

Updated by bigon on 2013-12-17 11:23:56 +00:00

Hi,

I'm experiencing this issue quite often on my infrastructure when the database is busy. It can actually lead to a situation where the new ido2db process gets stuck and then blocks everything in the core.

I looked at the code and IMHO the problem is in the ido2db_parent_sighandler() function, which is racy. If the child is busy writing to the database, it might miss the kill signal. In that case the parent never receives SIGCHLD and so never waits for the child to die. The rest of the function still runs, and ido2db_cleanup_socket() removes both the socket and the pidfile. Most init scripts rely on the pidfile to check whether the process has exited properly and otherwise try to kill -9 it; that no longer works because the pidfile is already gone.

IMHO, wait()/waitpid() should be called right after kill() and should block until all the children have died.
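
To illustrate the proposal, here is a minimal sketch of the kill-then-wait pattern. It is not the patch that was eventually committed for this issue. The helper names spawn_idle_child() and terminate_and_reap() are hypothetical and do not exist in ido2db, and the idle child only stands in for a real worker process.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that idles until it is signalled.
 * Returns the child's pid in the parent, or -1 if fork() fails. */
pid_t spawn_idle_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: the default SIGTERM action terminates the process. */
        for (;;)
            pause();
    }
    return pid;
}

/* Hypothetical helper: send SIGTERM to every child in pids[], then
 * block in waitpid() until each one has actually been reaped.
 * Returns the number of children reaped; with count == 0 it returns
 * immediately. */
int terminate_and_reap(const pid_t *pids, size_t count)
{
    int reaped = 0;
    size_t i;

    /* Step 1: signal all children. */
    for (i = 0; i < count; i++) {
        if (pids[i] > 0)
            kill(pids[i], SIGTERM);
    }

    /* Step 2: wait for each child instead of relying on SIGCHLD. */
    for (i = 0; i < count; i++) {
        int status;
        pid_t rc;

        if (pids[i] <= 0)
            continue;

        /* Retry if waitpid() is interrupted by another signal. */
        do {
            rc = waitpid(pids[i], &status, 0);
        } while (rc == -1 && errno == EINTR);

        if (rc == pids[i])
            reaped++;
    }
    return reaped;
}
```

In this sketch the parent would call terminate_and_reap() before removing the socket and the pidfile, so the pidfile only disappears once no child is left. A child that ignores or blocks SIGTERM would make the blocking waitpid() hang, so a real daemon would probably need a timeout and an escalation to SIGKILL; that part is left out here.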

Edit: The same code seems to be present in the Nagios codebase.

icinga-migration · 2014-01-03T20:03:18Z

Updated by mfriedrich on 2014-01-03 20:03:18 +00:00

Status changed from New to Assigned
Assigned to set to mfriedrich
Target Version set to 1.11

Regarding waitpid(), you're right: the parent process should wait for all child processes to exit properly before terminating itself (and return early if there are no children). I cannot reproduce the issue easily, but I've pushed your proposed fix to the current development tree.

icinga-migration · 2014-01-03T20:03:52Z

Updated by Anonymous on 2014-01-03 20:03:52 +00:00

Status changed from Assigned to Resolved
Done % changed from 0 to 100

Applied in changeset icinga-core:238aa46023953de0e16c197a83851e317b97aaa6.

icinga-migration · 2014-01-09T14:10:04Z

Updated by bigon on 2014-01-09 14:10:04 +00:00

Would it be possible to backport this for the next 1.10 point release?

icinga-migration · 2014-01-09T22:29:31Z

Updated by mfriedrich on 2014-01-09 22:29:31 +00:00

Cherry-picked into support/1.10.

icinga-migration · 2014-01-27T19:30:45Z

Updated by mfriedrich on 2014-01-27 19:30:45 +00:00

Target Version changed from 1.11 to 1.10.3

icinga-migration · 2014-12-08T14:38:12Z

Updated by mfriedrich on 2014-12-08 14:38:12 +00:00

Project changed from 18 to Core, Classic UI, IDOUtils
Category changed from 79 to IDOUtils
Icinga Version changed from 1 to 1
OS Version set to any

icinga-migration closed this as completed Jan 3, 2014

icinga-migration added bug IDOUtils labels Jan 17, 2017

icinga-migration added this to the 1.10.3 milestone Jan 17, 2017
