This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #2688] triggered downtimes for child hosts are missing after icinga restart #1006

Closed
icinga-migration opened this issue Jun 14, 2012 · 19 comments

This issue has been migrated from Redmine: https://dev.icinga.com/issues/2688

Created by mlucka on 2012-06-14 15:39:21 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2013-04-10 18:48:09 +00:00)
Target Version: 1.9
Last Update: 2013-04-10 18:48:09 +00:00 (in Redmine)

Icinga Version: 1.6.0
OS Version: Debian

Hi,

there's an issue with the triggered downtime feature, seen on icinga 1.7.0 and nagios 3.2.3 and above...

triggered downtimes for child hosts (type: fixed, child hosts: schedule triggered downtime for all child hosts) are deleted during an icinga restart. the downtime on the master host (parent) is not affected.
This should be easy to reproduce with just 2 hosts. If you need further information on this subject, don't hesitate to get in touch with me.

Best regards

Michael

Attachments

Changesets

2012-10-30 20:12:47 +00:00 by mfriedrich 2b671f4

add test case refs #2688

Updated by mlucka on 2012-06-14 15:42:00 +00:00

I did not test this on earlier icinga versions, but nagios 3.0.6 and 3.2.1 don't have this issue. Maybe this information helps you a bit while investigating...


Updated by mfriedrich on 2012-06-14 15:45:43 +00:00

please provide some sample configs, as well as the logs generated from this.


Updated by mlucka on 2012-06-14 15:48:58 +00:00

This could be the reason/solution: http://tracker.nagios.org/view.php?id=338

Found it just a few seconds ago...


Updated by mfriedrich on 2012-06-14 16:07:35 +00:00

  • Target Version deleted 1.7

no. nagios 3.4.x took an icinga patch from 2 years ago, which has since been rewritten in icinga upstream.

icinga handles a restart (and therefore a downtime that is not yet in effect) differently, see common/downtime.c starting with

        /* else we are just starting the scheduled downtime */
        else {
...
                /* this happens after restart of icinga */
                if (temp_downtime->is_in_effect != TRUE) {

that patch addresses hosts in downtime no longer being persistent after a restart.

you are talking about child hosts triggered by the parent, which is a different story. so please provide your configs and logs (plus debug logs in this special case) so we can see whether your bug report is valid and reproducible.
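
To make the restart handling described above a bit more concrete, here is a small, self-contained toy model (not the upstream code) of the decision the quoted snippet guards: after a restart, a downtime restored from retention.dat is only started again (notification sent, downtime depth raised) if it was not already flagged as in effect before the restart. All names below are illustrative assumptions.

#include <stdio.h>

/* illustrative subset of the downtime fields kept in retention.dat */
typedef struct {
        unsigned long downtime_id;
        unsigned long triggered_by;   /* downtime_id of the parent, 0 if none */
        int fixed;
        int is_in_effect;             /* 1 if the downtime was already active */
} toy_downtime;

/* returns 1 if the downtime has to be (re)started after a restart,
 * 0 if its effects were already restored from retention data */
static int needs_start_after_restart(const toy_downtime *dt) {
        if (dt->is_in_effect != 1)
                return 1;   /* not yet active before the restart: start it */
        return 0;           /* already active: do not notify/increment again */
}

int main(void) {
        toy_downtime parent = { 63, 0,  1, 1 };
        toy_downtime child  = { 64, 63, 1, 1 };

        printf("parent (id 63) needs start: %d\n", needs_start_after_restart(&parent));
        printf("child  (id 64) needs start: %d\n", needs_start_after_restart(&child));
        return 0;
}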


Updated by mlucka on 2012-06-15 14:54:43 +00:00

  • File added icinga-bug-2688.tgz

Hi,

please find attached the sample config, logs and screenshots...

I reproduced the behavior as follows (debian squeeze, up2date, 32bit):

  • setup fresh icinga 1.7.0 installation from backports-squeeze
  • adjusted the icinga config for the test setup, forced checks, stopped icinga, cleaned up the logs, started icinga again
  • saved status.dat into status.dat_after_first_start
  • scheduled a fixed downtime for localhost, triggered for all child hosts (test in this case)
  • saved status.dat into status.dat_before_first_stop
  • stopped icinga
  • saved retention.dat into retention.dat_after_first_stop
  • started icinga
  • the triggered host downtime (for child host test) was missing on the downtime page (icinga-bug-2688-02.jpg) but not on the host itself (icinga-bug-2688-01.jpg)
  • triggered downtime was also included in the current status file (status.dat_after_first_start)
  • stopped icinga again and saved retention.dat into retention.dat_after_second_stop
  • the triggered downtime child host test was still included in retention.dat_after_second_stop
  • started up icinga again
  • triggered downtime for child host test was missing on downtime page (icinga-bug-2688-04.jpg) and the host itself (icinga-bug-2688-03.jpg)
  • saved status.dat into status.dat_after_second_start, triggered downtime for child host test was missing here as well
  • stopped icinga
  • saved retention.dat into retention.dat_after_third_stop, just one downtime from the parent included

I think you can easily reproduce it yourself. An icinga 1.6.2 installation was tested as well, showing the same results.

Best regards

Michael
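
For reference, the downtime scheduled above for localhost (fixed, propagated as triggered downtimes to all child hosts) can also be submitted through the external command pipe instead of the classic UI. The sketch below uses the standard SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME external command; the command file path and the author/comment strings are assumptions and have to match your icinga.cfg.

#include <stdio.h>
#include <time.h>

int main(void) {
        /* assumption: default Debian path, see command_file in icinga.cfg */
        const char *cmd_file = "/var/lib/icinga/rw/icinga.cmd";

        time_t now   = time(NULL);
        time_t start = now;
        time_t end   = now + 3600;

        FILE *fp = fopen(cmd_file, "w");
        if (fp == NULL) {
                perror("fopen command file");
                return 1;
        }

        /* fixed 1h downtime on localhost, propagated as triggered downtimes
         * to all of its child hosts (trigger id 0 for the parent downtime) */
        fprintf(fp, "[%lu] SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME;"
                    "localhost;%lu;%lu;1;0;3600;icingaadmin;test2688\n",
                (unsigned long)now, (unsigned long)start, (unsigned long)end);

        fclose(fp);
        return 0;
}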


Updated by mfriedrich on 2012-06-15 15:33:31 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich

thanks for the detailed report, i'll put it on my todo list after 1.7.1 is out, plus when i am a puppet master.


Updated by mlucka on 2012-10-01 17:20:59 +00:00

Hi,

is there any schedule for when this issue might be fixed?

Best regards

Michael


Updated by mfriedrich on 2012-10-02 08:06:32 +00:00

  • Icinga Version set to 1
  • OS Version set to Debian

haven't had the time yet. hopefully others have - otherwise it will remain a todo.


Updated by mfriedrich on 2012-10-24 18:37:05 +00:00

  • Target Version set to 1.9


Updated by mfriedrich on 2012-10-30 20:10:35 +00:00

  • File added 2688.cfg
  • File added status_retention_dat_2688.zip

testing with f78e443 as latest commit.

status_dat_before_first_stop

hoststatus {
        host_name=2688localhost-test

        scheduled_downtime_depth=1
        }

hoststatus {
        host_name=2688localhost-test-p1

        scheduled_downtime_depth=1
        }

hoststatus {
        host_name=2688localhost-test-p1-2


        scheduled_downtime_depth=1
        }

hoststatus {
        host_name=2688localhost-test-p2

        scheduled_downtime_depth=1
        }

retention_dat_after_first_stop

hostdowntime {
host_name=2688localhost-test
downtime_id=63
entry_time=1351624983
start_time=1351624960
end_time=1351628560
triggered_by=0
fixed=1
duration=3600
is_in_effect=1
author=icingademo
comment=test2688
trigger_time=1351624983
}
hostdowntime {
host_name=2688localhost-test-p2
downtime_id=64
entry_time=1351624983
start_time=1351624960
end_time=1351628560
triggered_by=63
fixed=1
duration=3600
is_in_effect=1
author=icingademo
comment=test2688
trigger_time=1351624983
}
hostdowntime {
host_name=2688localhost-test-p1-2
downtime_id=65
entry_time=1351624983
start_time=1351624960
end_time=1351628560
triggered_by=63
fixed=1
duration=3600
is_in_effect=1
author=icingademo
comment=test2688
trigger_time=1351624983
}
hostdowntime {
host_name=2688localhost-test-p1
downtime_id=66
entry_time=1351624983
start_time=1351624960
end_time=1351628560
triggered_by=63
fixed=1
duration=3600
is_in_effect=1
author=icingademo
comment=test2688
trigger_time=1351624983
}

status_dat_after_first_start

hoststatus {
        host_name=2688localhost-test

        scheduled_downtime_depth=1
        }

hoststatus {
        host_name=2688localhost-test-p1

        scheduled_downtime_depth=1
        }

hoststatus {
        host_name=2688localhost-test-p1-2


        scheduled_downtime_depth=1
        }

hoststatus {
        host_name=2688localhost-test-p2

        scheduled_downtime_depth=1
        }

retention_dat_after_second_stop

hostdowntime {
host_name=2688localhost-test-p1
downtime_id=66
entry_time=1351624983
start_time=1351624960
end_time=1351628560
triggered_by=63
fixed=1
duration=3600
is_in_effect=1
author=icingademo
comment=test2688
trigger_time=1351624983
}
hostdowntime {
host_name=2688localhost-test-p1-2
downtime_id=65
entry_time=1351624983
start_time=1351624960
end_time=1351628560
triggered_by=63
fixed=1
duration=3600
is_in_effect=1
author=icingademo
comment=test2688
trigger_time=1351624983
}
hostdowntime {
host_name=2688localhost-test-p2
downtime_id=64
entry_time=1351624983
start_time=1351624960
end_time=1351628560
triggered_by=63
fixed=1
duration=3600
is_in_effect=1
author=icingademo
comment=test2688
trigger_time=1351624983
}
hostdowntime {
host_name=2688localhost-test
downtime_id=63
entry_time=1351624983
start_time=1351624960
end_time=1351628560
triggered_by=0
fixed=1
duration=3600
is_in_effect=1
author=icingademo
comment=test2688
trigger_time=1351624983
}

status_dat_after_second_start

hoststatus {
        host_name=2688localhost-test

        scheduled_downtime_depth=1
        }

hoststatus {
        host_name=2688localhost-test-p1

        scheduled_downtime_depth=0
        }

hoststatus {
        host_name=2688localhost-test-p1-2

        scheduled_downtime_depth=0
        }


hoststatus {
        host_name=2688localhost-test-p2

        scheduled_downtime_depth=0
        }

retention_dat_after_third_stop

hostdowntime {
host_name=2688localhost-test
downtime_id=63
entry_time=1351624983
start_time=1351624960
end_time=1351628560
triggered_by=0
fixed=1
duration=3600
is_in_effect=1
author=icingademo
comment=test2688
trigger_time=1351624983
}


Updated by mfriedrich on 2012-10-30 20:23:19 +00:00

so, as it's a bit late today - i can reproduce and see it, but i am not sure where exactly this is being hit or ignored. might need some deep debug sessions.


Updated by mlucka on 2013-02-28 15:44:21 +00:00

  • File added 99_fix_triggered_downtimes.dpatch

Hello,

attached are the collected works of a colleague who is trying to migrate from Nagios to Icinga 1.7.1 (Debian 7), with the request to review and integrate the patch.

Regards, Micha.

Attached is a patch that fixes the downtime problem in Icinga
(against icinga 1.7.1, but the relevant code is identical in
current Git).

The problem is:

  • A child downtime is not accepted when reading retention.dat/status.dat
    if its parent downtime (the "Trigger ID" in the classic GUI) does not
    exist yet.

Quoting common/downtime.c:add_downtime:
/* don't add triggered downtimes that don't have a valid parent */

  • The downtimes are sorted ascending by start time and otherwise
    "unfavorably", and no longer -- as in Nagios so far -- ascending by
    downtime_id.
    Because of that, the child downtimes are written to status.dat before
    their parent downtimes (this can also be seen in the classic UI).

retention.dat/status.dat, however, is only read and processed
sequentially, so icinga tries to create the child downtimes first.

Assumptions (made by the patch):

  • parent and child downtimes have the same start time
  • downtime_id is assigned in ascending numeric order, with the
    parent/trigger ID before the child

Side note: why the sort function was written the way it was is somewhat
puzzling. Instead of
(d1->start_time < d2->start_time) ? -1 : (d1->start_time - d2->start_time);
the following would also have been sufficient:
(d1->start_time < d2->start_time)
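
Since the dpatch itself is only attached, here is a minimal, compilable sketch of the sorting idea described above: start_time stays the primary sort key, but ties are broken on ascending downtime_id, so a parent downtime (lower id, triggered_by = 0) always ends up in front of the children that reference it and survives the sequential read of retention.dat/status.dat. Struct and function names are illustrative assumptions, not the literal upstream code or the attached patch.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* illustrative subset of scheduled_downtime */
typedef struct {
        unsigned long downtime_id;
        unsigned long triggered_by;   /* downtime_id of the parent, 0 if none */
        time_t start_time;
} toy_downtime;

/* qsort() comparator: ascending start_time, ties broken by downtime_id,
 * so parents are written to (and re-added from) the file before children */
static int downtime_compar(const void *p1, const void *p2) {
        const toy_downtime *d1 = *(const toy_downtime * const *)p1;
        const toy_downtime *d2 = *(const toy_downtime * const *)p2;

        if (d1->start_time != d2->start_time)
                return (d1->start_time < d2->start_time) ? -1 : 1;
        if (d1->downtime_id != d2->downtime_id)
                return (d1->downtime_id < d2->downtime_id) ? -1 : 1;
        return 0;
}

int main(void) {
        /* the parent/child pair from the retention.dat excerpts above */
        toy_downtime parent = { 63, 0,  (time_t)1351624960 };
        toy_downtime child  = { 64, 63, (time_t)1351624960 };
        toy_downtime *list[] = { &child, &parent };

        qsort(list, 2, sizeof(list[0]), downtime_compar);

        for (int i = 0; i < 2; i++)
                printf("downtime_id=%lu triggered_by=%lu\n",
                       list[i]->downtime_id, list[i]->triggered_by);
        return 0;
}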


Updated by mlucka on 2013-02-28 16:41:49 +00:00

CORRECTION

Instead of
(d1->start_time < d2->start_time) ? -1 : (d1->start_time - d2->start_time);
the following would also have been sufficient:
(d1->start_time - d2->start_time)


Updated by mfriedrich on 2013-03-04 19:58:27 +00:00

that looks like a hell of an idea, thanks.

once i get a little more dev time, i'll try to re-think and test it.


Updated by mfriedrich on 2013-03-10 14:51:59 +00:00

  • File added icinga_1.9_fix_child_downtimes_2688.png

currently exists in my mfriedrich/core dev branch.

a final test after committing shows that the child triggered downtimes are still there.

icinga_1.9_fix_child_downtimes_2688.png


Updated by mfriedrich on 2013-03-13 23:18:34 +00:00

  • Status changed from Assigned to 7
  • Done % changed from 0 to 70


Updated by mfriedrich on 2013-03-13 23:20:41 +00:00

used the wrong commit id
https://dev.icinga.org/projects/icinga-core/repository/revisions/161d5117fa585e6cbbbef27b51a6701dfa2a8eeb


Updated by mfriedrich on 2013-04-06 22:02:32 +00:00

@MLucka

are you able to test current git master/next?


Updated by mfriedrich on 2013-04-10 18:48:09 +00:00

  • Status changed from 7 to Resolved
  • Done % changed from 70 to 100
