[dev.icinga.com #2993] duplicate events when scheduling forced host|service check #1079
Comments
Updated by imriz on 2012-08-18 13:24:04 +00:00 I am referring to the schedule_service_check function in check.c, of course. Also, options & CHECK_OPTION_FORCE_EXECUTION never evaluates as true - should be options == CHECK_OPTION_FORCE_EXECUTION ? |
Updated by imriz on 2012-08-18 14:21:27 +00:00 Because temp_event is NULL, the code never reaches the point where it assigns new_event to svc->next_check_event. svc->next_check_event = new_event; should be moved to the place where new_event is populated with values (further down the function). |
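To illustrate the failure mode described in the comment above, here is a minimal, self-contained sketch. It is not the Icinga source: `toy_event`, `toy_service`, `toy_schedule_check()` and the `store_pointer` switch are hypothetical names invented for this example, and the real function contains additional logic (time comparisons, check options) that is left out. The sketch only shows why a back-pointer that is never stored leaves the old event in the queue.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the core's event and service structs. */
typedef struct toy_event {
    long run_time;
    struct toy_event *next;
} toy_event;

typedef struct toy_service {
    toy_event *next_check_event; /* pending check event, NULL if none is tracked */
} toy_service;

static toy_event *queue_head = NULL;

/* Number of events currently in the queue. */
static int queue_length(void) {
    int n = 0;
    for (toy_event *e = queue_head; e != NULL; e = e->next)
        n++;
    return n;
}

/* Unlink and free one event, if it is in the queue. */
static void queue_remove(toy_event *ev) {
    toy_event **link = &queue_head;
    while (*link != NULL && *link != ev)
        link = &(*link)->next;
    if (*link != NULL) {
        *link = ev->next;
        free(ev);
    }
}

/* store_pointer == 0 models the reported bug, store_pointer == 1 the fix. */
static void toy_schedule_check(toy_service *svc, long run_time, int store_pointer) {
    toy_event *temp_event = svc->next_check_event;

    /* With the bug, temp_event is always NULL, so this branch never runs. */
    if (temp_event != NULL) {
        queue_remove(temp_event);
        svc->next_check_event = NULL;
    }

    toy_event *new_event = malloc(sizeof(*new_event));
    if (new_event == NULL)
        return;
    new_event->run_time = run_time;
    new_event->next = queue_head;
    queue_head = new_event;

    /* The fix: remember the event at the point where it is populated. */
    if (store_pointer)
        svc->next_check_event = new_event;
}
```

Scheduling the same service twice with `store_pointer` set to 0 leaves two events in the queue; with 1, the second call removes the first event before adding its own.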
Updated by imriz on 2012-08-18 14:30:29 +00:00 Just saw https://dev.icinga.org/issues/2676 This fix should probably be released before 1.8, as with busy configurations, with many people and commands from the UI, this issue could be a real problem. |
Updated by mfriedrich on 2012-08-18 14:50:10 +00:00 well i've seen some reports on the portal as well, but i still haven't been able to reproduce the bug itself; only the fix, which stems from another problem, remains logical. it might be related to #2964 as well, so i'd be happy if you could test the fix under your conditions and report back. i've prepared the possible fix on top of 1.7.1 in git r1.7 |
Updated by imriz on 2012-08-18 14:59:16 +00:00 Hi, What about the second issue? shouldn't the bitwise AND be something like
The current if statement never evaluates as true (easy to reproduce - just enable debug logs, and see that it is always logged as non-forced). |
Updated by mfriedrich on 2012-08-18 15:36:32 +00:00 that's a bitwise comparison against those options, yes.
why would
never result in true, if the bit was set accordingly? rather than checking the debug log, i would check if the gui (cmd.cgi logging, external commands) did send the correct options. btw
is the long version of
and
is wrong, as it compares only direct values. what if options contain other than the forced one, i.e. orphaned? |
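The difference between the two tests discussed above can be shown with a small sketch. The numeric flag values below are assumptions chosen for illustration (distinct single bits); the authoritative definitions are in the core headers. The helper names `is_forced_bitwise()` and `is_forced_equality()` are hypothetical and do not exist in Icinga.

```c
#include <stdbool.h>

/* Assumed flag values for illustration only: each option is a distinct bit. */
#define CHECK_OPTION_NONE            0
#define CHECK_OPTION_FORCE_EXECUTION 1
#define CHECK_OPTION_FRESHNESS_CHECK 2
#define CHECK_OPTION_ORPHAN_CHECK    4

/* Bitwise test: true whenever the force bit is set, whatever else is set. */
static bool is_forced_bitwise(int options) {
    return (options & CHECK_OPTION_FORCE_EXECUTION) != 0;
}

/* Equality test: true only when force is the sole option set. */
static bool is_forced_equality(int options) {
    return options == CHECK_OPTION_FORCE_EXECUTION;
}
```

With `options` set to `CHECK_OPTION_FORCE_EXECUTION | CHECK_OPTION_ORPHAN_CHECK`, the bitwise test still reports a forced check, while the equality test does not. That is the case raised in the comment above.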
Updated by imriz on 2012-08-18 21:50:55 +00:00
Hi, I've just tested the patch, and it doesn't solve the bug (maybe svc->next_check_event = new_event; is too early?). As for the bitwise AND, you're of course right - I mixed some debug output I've added :) |
Updated by imriz on 2012-08-18 22:22:33 +00:00
When Icinga initially starts, neither init_timing_loop() nor the schedule_new_event() logic populates the next_check_event field. |
Updated by imriz on 2012-08-19 07:20:15 +00:00 After re-testing, the patch at https://git.icinga.org/?p=icinga-core.git;a=commit;h=379b71295b4846195590350ccb309b3ec79212da works well, but we still need to deal with schedule_new_event() in events.c |
Updated by mfriedrich on 2012-08-19 10:12:37 +00:00
|
Updated by mfriedrich on 2012-08-19 14:58:28 +00:00 ok, to summarize that a bit.
that problem is somewhat complicated. |
Updated by imriz on 2012-08-19 15:08:02 +00:00 dnsmichi wrote:
After applying the change to events.c I can successfully force a service check before its first run. |
Updated by imriz on 2012-08-19 15:22:10 +00:00 [1345389657.812786] [2048.1] [pid=20073] Processing: 'INITIAL SERVICE STATE: isp-radius-1;CPU Load;$SERVICESTATE$;$SERVICESTATETYPE$;$SERVICEATTEMPT$;OK - load average: 0.00, 0.00, 0.00 The last line is my edited version of the log_debug_info in schedule_service_check() just after the if (temp_event != NULL) line. |
Updated by mfriedrich on 2012-08-19 16:31:33 +00:00 yep, i fell into the pointer to event_data, which is only cast to a service when setting an attribute. lemme see how this can be cleaned up. though you're right about getting the value back into the event list, the problem to understand is the reverse logic - put the created new_event ptr onto (service *)event_data->next_check_event. i need to do a little more testing on that, will post later on. thanks for the help! |
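The "reverse logic" mentioned above, where a generic scheduling function writes the new event back onto the object it was created for, might look roughly like the following sketch. All `demo_`-prefixed names and the `DEMO_EVENT_*` constants are hypothetical and are not the actual events.c code. Note that in real C the cast needs its own parentheses, `((service *)event_data)->next_check_event`, because `->` binds more tightly than a cast.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event types, not the core's real constants. */
enum demo_event_type {
    DEMO_EVENT_SERVICE_CHECK,
    DEMO_EVENT_HOST_CHECK,
    DEMO_EVENT_OTHER
};

typedef struct demo_event {
    enum demo_event_type type;
    long run_time;
    void *event_data; /* object the event belongs to, type depends on 'type' */
} demo_event;

typedef struct demo_service { demo_event *next_check_event; } demo_service;
typedef struct demo_host    { demo_event *next_check_event; } demo_host;

/* Create an event and store a back-pointer on the owning service or host. */
static demo_event *demo_schedule_new_event(enum demo_event_type type,
                                           long run_time, void *event_data) {
    demo_event *new_event = malloc(sizeof(*new_event));
    if (new_event == NULL)
        return NULL;

    new_event->type = type;
    new_event->run_time = run_time;
    new_event->event_data = event_data;

    if (event_data != NULL) {
        switch (type) {
        case DEMO_EVENT_SERVICE_CHECK:
            ((demo_service *)event_data)->next_check_event = new_event;
            break;
        case DEMO_EVENT_HOST_CHECK:
            ((demo_host *)event_data)->next_check_event = new_event;
            break;
        default:
            break; /* other event types carry no back-pointer */
        }
    }
    return new_event;
}
```

With the back-pointer set at creation time, a later forced check can find and remove the event that was scheduled at startup, even if the check has not yet run once.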
Updated by mfriedrich on 2012-08-19 16:32:58 +00:00
|
Updated by mfriedrich on 2012-08-19 17:00:45 +00:00 patch as is works like a charm, also for hosts. only the debug output is pretty annoying. will include an enhanced version then. |
Updated by mfriedrich on 2012-08-19 17:02:42 +00:00
|
Updated by mfriedrich on 2012-08-19 17:07:51 +00:00
|
Updated by mfriedrich on 2012-08-19 17:27:29 +00:00 it's in my dev branch for 1.8 but i will cherry-pick it into r1.7 then. 1 week testing for 1.7.2 should be sufficient imho. https://git.icinga.org/?p=icinga-core.git;a=commit;h=ec9c5e3 |
Updated by mfriedrich on 2012-08-19 17:54:48 +00:00
|
Updated by mfriedrich on 2012-08-25 13:36:10 +00:00
|
Updated by mfriedrich on 2012-08-27 12:49:54 +00:00
thanks for your awesome help, release 1.7.2 will be out soon. |
Updated by mfriedrich on 2012-08-27 12:50:34 +00:00 http://www.monitoring-portal.org/wbb/index.php?page=Thread&postID=176407#post176407 |
This issue has been migrated from Redmine: https://dev.icinga.com/issues/2993
Created by imriz on 2012-08-18 13:21:14 +00:00
Assignee: mfriedrich
Status: Resolved (closed on 2012-08-27 12:49:54 +00:00)
Target Version: 1.7.2
Last Update: 2012-08-27 12:50:34 +00:00 (in Redmine)
Hi,
When you submit a forced service check, temp_event is always NULL, and therefore the previous event is not removed, resulting in multiple events for the same service, running on different schedules.
Attachments
Changesets
2012-08-18 14:37:52 +00:00 by mfriedrich 379b712
2012-08-19 17:09:21 +00:00 by mfriedrich ec9c5e3
2012-08-19 17:29:57 +00:00 by mfriedrich f32fbf8
Relations: