This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #4809] no broker event is created when old/stale downtimes are wiped from the core data #1353

Closed
icinga-migration opened this issue Oct 7, 2013 · 7 comments

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/4809

Created by mfrosch on 2013-10-07 11:01:09 +00:00

Assignee: (none)
Status: Rejected (closed on 2015-02-15 01:08:29 +00:00)
Target Version: (none)
Last Update: 2015-02-15 01:08:29 +00:00 (in Redmine)

Icinga Version: 1.9.3
OS Version: all

Sometimes Icinga doesn't catch the end of a downtime, e.g. when Icinga is down at that moment.

The downtime itself stays inside the core and status.dat for a while, along with its info comment.

After some time the core seems to clean up that data, but no broker event is generated, so idomod doesn't know the downtime is gone.

Considering this for 1.10, if there is enough time to track it down.

The bug was initially opened against Icinga Web (#3822); see also #4808 about not clearing that data on startup.



@icinga-migration

Updated by mfriedrich on 2013-10-07 20:01:21 +00:00

likely related but not a fix imho: https://github.com/dnsmichi/nagioscore/commit/b81d8280c801ac18e49838a541d049b0c201b736

@icinga-migration

Updated by mfriedrich on 2013-10-07 20:49:33 +00:00

The event for expiring a downtime (EVENT_EXPIRE_DOWNTIME) is only scheduled for flexible downtimes, which may never trigger and therefore never become active within their start-end window. For fixed downtimes it does not make much sense, as they get removed at their end_time anyway.
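
As a rough illustration of that scheduling decision, here is a hypothetical, simplified model (the real core works on its scheduled_downtime list and event queue; the struct and function names below are made up for this sketch):

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, simplified model of the decision described above: an
 * EVENT_EXPIRE_DOWNTIME timed event only makes sense for flexible
 * downtimes, which may never trigger and so can outlive their
 * start-end window; fixed downtimes are removed at end_time anyway. */
typedef struct {
    int fixed;          /* 1 = fixed downtime, 0 = flexible */
    time_t start_time;
    time_t end_time;
    int is_in_effect;   /* has the downtime actually triggered? */
} downtime_model;

/* return 1 if an expire event should be scheduled for this downtime */
static int needs_expire_event(const downtime_model *dt) {
    /* a flexible downtime that never triggered must be expired
     * explicitly once its window has passed */
    return dt->fixed == 0;
}
```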

Deleting a downtime happens in delete_service_downtime(), which removes the downtime from the in-memory list and also triggers a NEB callback via broker_downtime_data() with NEBTYPE_DOWNTIME_DELETE. The status.dat update isn't done immediately but is left to the aggregated status update.

An error may of course happen there, and its return code is not respected in delete_{host,service}_downtime() ... the status update calls are pretty useless after all.

/******************************************************************/
/********************** DELETION FUNCTIONS ************************/
/******************************************************************/

/* deletes a scheduled host downtime entry */
int xdddefault_delete_host_downtime(unsigned long downtime_id) {
        int result;

        result = xdddefault_delete_downtime(HOST_DOWNTIME, downtime_id);

        return result;
}


/* deletes a scheduled service downtime entry */
int xdddefault_delete_service_downtime(unsigned long downtime_id) {
        int result;

        result = xdddefault_delete_downtime(SERVICE_DOWNTIME, downtime_id);

        return result;
}


/* deletes a scheduled host or service downtime entry */
int xdddefault_delete_downtime(int type, unsigned long downtime_id) {

        /* rewrite the downtime file (downtime was already removed from memory) */
        xdddefault_save_downtime_data();

        return OK;
}



/******************************************************************/
/****************** DOWNTIME OUTPUT FUNCTIONS *********************/
/******************************************************************/

/* writes downtime data to file */
int xdddefault_save_downtime_data(void) {

        /* don't update the status file now (too inefficient), let aggregated status updates do it */
        return OK;
}
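
For context, a minimal sketch of the deletion path described above. The function names mirror the core, but the bodies are illustrative stand-ins, and the NEBTYPE constant value here is made up (the real one lives in the core's broker headers):

```c
#include <assert.h>

/* Illustrative stand-in values; not the real header definitions. */
#define NEBTYPE_DOWNTIME_DELETE 1
#define OK 0

static int last_broker_event = -1;

/* stand-in for the core's broker_downtime_data(): in the real core this
 * forwards the event to loaded NEB modules such as idomod */
static void broker_downtime_data(int type, unsigned long downtime_id) {
    (void)downtime_id;
    last_broker_event = type;
}

/* sketch of delete_service_downtime(): remove the downtime from the
 * in-memory list (elided here), notify broker modules so idomod learns
 * the downtime is gone, and leave the status.dat rewrite to the
 * aggregated status update */
static int delete_service_downtime(unsigned long downtime_id) {
    broker_downtime_data(NEBTYPE_DOWNTIME_DELETE, downtime_id);
    return OK;
}
```

The point of the sketch is the ordering: the broker callback is the only immediate side effect of a deletion, so if a downtime vanishes without passing through this path, idomod never hears about it.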

would be interesting to get a reproducible sample downtime, as well as (debug) logs for that.

@icinga-migration

Updated by mfriedrich on 2013-10-08 08:38:50 +00:00

  • Target Version changed from 1.10 to 1.11

@icinga-migration

Updated by mfriedrich on 2013-10-08 17:40:06 +00:00

IMHO the other issue should take care of the general wipe/insert problem. This one here is special, and I am not sure whether those events can be triggered accurately given the information available after reading retention.dat.

@icinga-migration

Updated by mfriedrich on 2014-01-25 16:23:24 +00:00

  • Status changed from Assigned to Feedback
  • Target Version deleted 1.11

@icinga-migration

Updated by mfriedrich on 2014-01-27 19:24:00 +00:00

maybe helps

naemon/naemon-core@4fa9af5

@icinga-migration

Updated by mfriedrich on 2015-02-15 01:08:29 +00:00

  • Status changed from Feedback to Rejected
  • Assigned to deleted mfrosch

I'm unable to reproduce this. During startup, old stale downtimes are deleted, and the event broker is triggered for every deletion.
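
The behavior described here can be sketched roughly as follows (a hypothetical, simplified model of the startup cleanup; the real core walks its scheduled_downtime list and fires the NEB delete callback shown earlier in this thread):

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model of the startup cleanup described above: every
 * stale downtime (end_time in the past) is deleted, and exactly one
 * broker notification is fired per deletion. */
typedef struct dt_entry {
    unsigned long id;
    time_t end_time;
    struct dt_entry *next;
} dt_entry;

static int broker_events_fired = 0;

static void notify_broker_delete(unsigned long id) {
    (void)id;
    broker_events_fired++;   /* stand-in for the NEB delete callback */
}

/* delete all entries whose end_time lies before 'now'; returns the new
 * list head (no free(): entries are caller-owned in this sketch) */
static dt_entry *wipe_stale_downtimes(dt_entry *head, time_t now) {
    dt_entry **pp = &head;
    while (*pp != NULL) {
        if ((*pp)->end_time < now) {
            notify_broker_delete((*pp)->id);
            *pp = (*pp)->next;   /* unlink the stale entry */
        } else {
            pp = &(*pp)->next;
        }
    }
    return head;
}
```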
