[dev.icinga.com #1978] read last_program_stop from retention.dat and use that for freshness calculations on startup instead of event_time #755

icinga-migration · 2011-10-06T11:23:50Z

This issue has been migrated from Redmine: https://dev.icinga.com/issues/1978

Created by mfriedrich on 2011-10-06 11:23:50 +00:00

Assignee: mfriedrich
Status: Closed (closed on 2012-08-22 16:31:06 +00:00)
Target Version: (none)
Last Update: 2012-08-22 16:31:06 +00:00 (in Redmine)

this is a pretty epic idea, because icinga cores that were shut down for a long time have the problem that freshness checks on startup depend on the expiration time.

in is_service_result_fresh():

if(temp_service->has_been_checked == FALSE)
      expiration_time = (time_t)(event_start + freshness_threshold);

which then results in

/* the results for the last check of this service are stale */
if(expiration_time < current_time) {

the main problem with this approach: if there is no retention.dat, the logic would fail once it is changed in this way. an imprecise workaround would be to always write retention.dat, since we currently need it anyway. alternatively, introduce a token that indicates the program stop either way. in any case, it should be added to the docs that retained state information now also contains the last program stop and is therefore mandatory for freshness checks on passive checks (i.e. on passive slaves in distributed setups).
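
A minimal sketch of the fallback concern described above, in C. It assumes the stop time is stored as a line of the form `last_program_stop=<epoch>`; that key name, the helper `read_last_program_stop()`, and its `fallback` parameter are hypothetical and only illustrate the idea. The point is that a missing file or a missing key has to yield a defined value instead of breaking the freshness calculation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: scan a retention file for a line of the form
 * "last_program_stop=<epoch seconds>". The key name is an assumption
 * made for illustration, not a documented retention.dat entry.
 * Returns `fallback` if the file cannot be opened or the key is absent.
 */
static time_t read_last_program_stop(const char *path, time_t fallback)
{
    static const char key[] = "last_program_stop=";
    const size_t key_len = sizeof key - 1;
    char line[256];
    time_t result = fallback;
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return fallback;               /* no retention.dat at all */

    while (fgets(line, sizeof line, fp) != NULL) {
        /* skip leading indentation inside a block */
        const char *p = line + strspn(line, " \t");

        if (strncmp(p, key, key_len) == 0) {
            char *end;
            unsigned long value = strtoul(p + key_len, &end, 10);

            if (end != p + key_len)    /* at least one digit was parsed */
                result = (time_t)value;
            break;
        }
    }

    fclose(fp);
    return result;
}
```

The caller decides what the fallback means. Passing the current start time, for example, makes the computed downtime zero, so a "long shutdown" rule would simply not apply when no retention data exists.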

https://github.com/dnsmichi/nagios-fixed/commit/472d92ac81218f85c81571e31963545ebec7a988
https://github.com/dnsmichi/nagios-fixed/commit/8a8238f37a46f2ca73bebcf728a610385d49acd4

Relations:

relates #2136

icinga-migration · 2011-11-11T15:10:51Z

Updated by mfriedrich on 2011-11-11 15:10:51 +00:00

Category set to Passive Checks
Status changed from New to Resolved
Assigned to set to mfriedrich
Target Version set to 1.6
Done % changed from 0 to 100

icinga-migration · 2011-12-02T15:47:02Z

Updated by mfriedrich on 2011-12-02 15:47:02 +00:00

Status changed from Resolved to Feedback
Target Version deleted ~~1.6~~
Done % changed from 100 to 0

this is the cause of #2136; it needs a proper rework and a tested version.

icinga-migration · 2011-12-08T12:26:15Z

Updated by mfriedrich on 2011-12-08 12:26:15 +00:00

as a first look in #2136 showed, the program_stop+60 assumption remains wrong in that case.

possible fix below, needs deeper testing.

Revision: 1848
          http://nagios.svn.sourceforge.net/nagios/?rev=1848&view=rev
Author:   ageric
Date:     2011-12-08 12:12:02 +0000 (Thu, 08 Dec 2011)
Log Message:
-----------
core: Fix passive check result freshness test after restart

The last version of the code to avoid sending notifications about stale
checks on start confused event_start and last_check - it would trigger
whenever nagios took less than 60 seconds to start, and it had been
turned off for some time before, and would override the last check
timestamp with the nagios start time.

Signed-off-by: Robin Sonefors 

Modified Paths:
--------------
    nagioscore/trunk/base/checks.c

Modified: nagioscore/trunk/base/checks.c
===================================================================
--- nagioscore/trunk/base/checks.c  2011-12-08 11:39:34 UTC (rev 1847)
+++ nagioscore/trunk/base/checks.c  2011-12-08 12:12:02 UTC (rev 1848)
@@ -2093,15 +2093,15 @@
     * If the check was last done passively, we assume it's going
     * to continue that way and we need to handle the fact that
     * Nagios might have been shut off for quite a long time. If so,
-    * we mustn't spam freshness notifications but use program_start_time
+    * we mustn't spam freshness notifications but use event_start
     * instead of last_check to determine freshness expiration time.
     * The threshold for "long time" is determined as 61.8% of the normal
     * freshness threshold based on vast heuristical research (ie, "some
     * guy once told me the golden ratio is good for loads of stuff").
     */
    if (temp_service->check_type == SERVICE_CHECK_PASSIVE) {
-       if (event_start < program_start + 60 &&
-           event_start - last_program_stop < (freshness_threshold * 0.618))
+       if (temp_service->last_check < event_start &&
+           event_start - last_program_stop > freshness_threshold * 0.618)
        {
            expiration_time = event_start + freshness_threshold;
        }
@@ -2521,15 +2521,15 @@
     * If the check was last done passively, we assume it's going
     * to continue that way and we need to handle the fact that
     * Nagios might have been shut off for quite a long time. If so,
-    * we mustn't spam freshness notifications but use program_start_time
+    * we mustn't spam freshness notifications but use event_start
     * instead of last_check to determine freshness expiration time.
     * The threshold for "long time" is determined as 61.8% of the normal
     * freshness threshold based on vast heuristical research (ie, "some
     * guy once told me the golden ratio is good for loads of stuff").
     */
    if (temp_host->check_type == HOST_CHECK_PASSIVE) {
-       if (event_start < program_start + 60 &&
-           event_start - last_program_stop < (freshness_threshold * 0.618))
+       if (temp_host->last_check < event_start &&
+           event_start - last_program_stop > freshness_threshold * 0.618)
        {
            expiration_time = event_start + freshness_threshold;
        }
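
The rule in the patch can be sketched as a standalone function. This is a simplified illustration, not the code from checks.c: the struct `freshness_input` and the functions `expiration_time()` and `is_stale()` are hypothetical stand-ins, the other branches of the real freshness check are left out, and the comparison follows the patch comment, which treats a downtime of more than 61.8% of the freshness threshold as a "long time".

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the values the patch reads; not the Nagios structs. */
struct freshness_input {
    bool   passive;             /* last check was passive */
    time_t last_check;          /* time of the last check result */
    time_t event_start;         /* start of the event loop after the restart */
    time_t last_program_stop;   /* time the previous process stopped */
    int    freshness_threshold; /* seconds */
};

/* Returns the point in time after which the last result counts as stale. */
static time_t expiration_time(const struct freshness_input *in)
{
    time_t expires;

    assert(in->freshness_threshold >= 0);

    /* Simplified default: measure freshness from the last check result. */
    expires = in->last_check + in->freshness_threshold;

    /*
     * Passive result from before this run, and the core was down for more
     * than 61.8% of the threshold: measure from event_start instead, so the
     * downtime itself does not make the result stale.
     */
    if (in->passive &&
        in->last_check < in->event_start &&
        difftime(in->event_start, in->last_program_stop) >
            in->freshness_threshold * 0.618) {
        expires = in->event_start + in->freshness_threshold;
    }

    return expires;
}

/* Mirrors the "expiration_time < current_time" test from the issue text. */
static bool is_stale(const struct freshness_input *in, time_t now)
{
    return expiration_time(in) < now;
}
```

With a threshold of 600 seconds, a passive result from before a shutdown of more than about 371 seconds gets a full threshold again, counted from `event_start`. After a shorter shutdown it keeps expiring relative to `last_check`.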

icinga-migration · 2012-08-22T16:31:06Z

Updated by mfriedrich on 2012-08-22 16:31:06 +00:00

Status changed from Feedback to Closed

i don't see the need for that.

icinga-migration closed this as completed Aug 22, 2012

icinga-migration added enhancement Passive Checks labels Jan 17, 2017
