
[dev.icinga.com #13215] Newly configured already running ScheduledDowntime not put into effect #4790

Closed
icinga-migration opened this issue Nov 16, 2016 · 6 comments
Labels: area/notifications (Notification events), bug (Something isn't working)
Milestone

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/13215

Created by edgar_fuss on 2016-11-16 14:41:29 +00:00

Assignee: (none)
Status: New
Target Version: (none)
Last Update: 2017-01-10 13:22:05 +00:00 (in Redmine)

Icinga Version: 2.5.4
Backport?: Not yet backported
Include in Changelog: 1

When you start Icinga2 while a newly configured ScheduledDowntime should already be in effect, it is not put into effect.
Basically, the same goes for already existing downtimes, but there the problem is normally hidden by caching.

The problem is that scheduled downtimes seem to be sort-of piggy-backed on top of other downtimes. There's a timer-controlled procedure running once a minute that queues the next scheduled downtime, and it only considers segments in the future.

I must admit I neither fully understand the underlying code nor all of its intentions. In particular, it's not clear to me whether overlapping downtimes (on the same object) are possible/intended and, if so, whether the behaviour is defined. I would expect them all to be put into effect; after reading the code, I guess that if you have three scheduled downtimes, where 1 and 3 overlap and 2 overlaps that overlapping interval, 3 is never put into effect.

Nevertheless, I can think of two ways to work around/solve the problem, both of which I have implemented and somewhat tested; patches are attached:

  1. In both LegacyTimePeriod::FindNextSegment() and ScheduledDowntime::FindNextSegment(), first try to find an already-running segment before looking for segments in the future. Add a minEnd parameter to the latter routine so that ScheduledDowntime::CreateNextDowntime() can prevent an already-active downtime (or another one ending earlier) from being queued over and over again (a rough sketch of this idea follows after this list).
     This seems to work, but I'm not sure about the implications for other downtimes. However, ScheduledDowntime::FindNextSegment() seems to be the only caller of LegacyTimePeriod::FindNextSegment().

  2. After loading the configuration, explicitly look for a downtime that should already be running and queue it before the timer queues a segment to come. This has the advantage of being explicit about what to do, but (as implemented by me) leads to an already-running downtime being added twice if it is already cached. It also needs a near-duplicate of the two FindNextSegment() routines (as FindRunningSegment()).
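
Here is a minimal, self-contained sketch of the Variant 1 idea (this is not the actual Icinga 2 API; the Segment type and the function name are hypothetical stand-ins for the FindNextSegment()/FindRunningSegment() pair discussed above):

    // Prefer a segment that is already running over one that only starts in
    // the future; skip segments that end at or before minEnd so that an
    // already-active downtime is not queued over and over again.
    #include <utility>
    #include <vector>

    struct Segment {
        double begin = 0; // Unix timestamp
        double end = 0;   // Unix timestamp
    };

    static std::pair<bool, Segment> FindSegmentToQueue(const std::vector<Segment>& segments,
                                                       double now, double minEnd)
    {
        bool found = false;
        Segment best;

        for (const Segment& s : segments) {
            if (s.end <= minEnd)
                continue; // already covered by an active downtime

            bool running = s.begin <= now && now < s.end;
            bool future = s.begin > now;

            if (!running && !future)
                continue; // segment lies entirely in the past

            // A running segment always has an earlier begin than any future
            // segment, so it wins the comparison below.
            if (!found || s.begin < best.begin) {
                best = s;
                found = true;
            }
        }

        return { found, best };
    }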

Various remarks on things I stumbled over while trying to fix this:

  1. The code seems to be inconsistent about "exactly now", i.e. time comparisons evaluating to equal (illustrated by the sketch after this list).

  2. In ScheduledDowntime::FindNextSegment(), the following contains dead code:

     if (segment.first == 0 && segment.second == 0) {
         tm reference = Utility::LocalTime(Utility::GetTime());
         reference.tm_mday++;
         reference.tm_hour = 0;
         reference.tm_min = 0;
         reference.tm_sec = 0;

         return;
     }

I don't get how reference can ever be used.

  3. I don't understand what the two-pass code in LegacyTimePeriod::FindNextSegment() is supposed to do. Unless stride is non-zero, the first IsInTimeRange() should always evaluate to true (iter = begin), and I can't come up with a documented case where stride is non-zero (you would need to put something like "friday/2" in a ScheduledDowntime object's dictionary key, I guess).

  4. ScheduledDowntime::FindNextSegment() disregards segments which are already running, but my impression is that LegacyTimePeriod::FindNextSegment() doesn't deliver them.

  5. In ScheduledDowntime::FindNextSegment(), there's a debug message

         Log(LogDebug, "ScheduledDowntime")
             << "Evaluating segment: " << kv.first << ": " << kv.second << " at ";

where I don't understand what the " at " is supposed to mean.

  6. In ScheduledDowntime::CreateNextDowntime(),

     BOOST_FOREACH(const Downtime::Ptr& downtime, GetCheckable()->GetDowntimes()) {
         if (downtime->GetScheduledBy() != GetName() ||
             downtime->GetStartTime() < Utility::GetTime())
             continue;

         /* We've found a downtime that is owned by us and that hasn't started yet - we're done. */
         return;
     }

seems a rather involved way of writing

    BOOST_FOREACH(const Downtime::Ptr& downtime, GetCheckable()->GetDowntimes()) {
        if (downtime->GetScheduledBy() == GetName() &&
            downtime->GetStartTime() >= Utility::GetTime())
            /* We've found a downtime that is owned by us and that hasn't started yet - we're done. */
            return;
    }
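
Regarding remark 1 above, a minimal illustration of the "exactly now" ambiguity (both helper names are made up for illustration and are not taken from the Icinga 2 source):

    // Whether a segment whose boundary coincides with the current timestamp
    // is treated as running or not depends on whether the comparison is
    // strict or inclusive.
    static bool IsRunningInclusive(double begin, double end, double now)
    {
        return begin <= now && now <= end; // "exactly now" counts at both ends
    }

    static bool IsRunningHalfOpen(double begin, double end, double now)
    {
        return begin <= now && now < end;  // a segment ending exactly now is over
    }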

Attachments

  • NextRunning edgar_fuss - 2016-11-16 14:29:54 +00:00 - Variant 1 (also consider running segments in FindNextSegment)
  • RunningSegment edgar_fuss - 2016-11-16 14:30:01 +00:00 - Variant 2 (explicitly queue running segments after configuration load)
@icinga-migration
Author

Updated by tombstone on 2017-01-10 13:22:05 +00:00

I am experiencing the same issue: a newly configured ScheduledDowntime with a start time already in the past (and an end time in the future) is not put into effect... very annoying :-/ and actually hard to work around.

@icinga-migration icinga-migration added bug Something isn't working libicinga labels Jan 17, 2017
@marcofl

marcofl commented Feb 1, 2017

Same here: We have some test (not production) servers which we automatically put into downtime via a downtime configured (through Puppet) with a 24x7 range:

      'monday' => '00:00-24:00',
      'tuesday' => '00:00-24:00',
      'wednesday' => '00:00-24:00',
      'thursday' => '00:00-24:00',
      'friday' => '00:00-24:00',
      'saturday' => '00:00-24:00',
      'sunday' => '00:00-24:00',

On the first icinga2 reload, one downtime gets created for tomorrow, but no downtime is set for today / now (even though "now" is clearly included in the range). On the next day there is a second downtime starting tomorrow, and so on, which means there are always two downtimes per host and for all of the host's services.

Is there really no way to have a configured downtime span more than one day? I think this is somehow related to how the range is specified, but I could not find a way to specify something like a monday-sunday range. You can create a downtime in Icinga Web 2 which spans multiple days, weeks, etc. and starts "now", so this is a bit strange to me.

@efuss
Contributor

efuss commented Feb 7, 2018

I've refactored the Variant 1 (NextRunning) patch mentioned above, moving the "consider an already running segment" code into separate FindRunningSegment() functions.
Additionally, I've added code that subsequently merges upcoming adjacent downtime segments into an already running one. This prevents downtimes spanning midnight from ending and immediately re-beginning, and stops the resulting gratuitous notifications.
Note that the patch's fourth hunk looks strange because it first turns the existing ScheduledDowntime::FindNextSegment() into the new FindRunningSegment() and then adds the original FindNextSegment() again.
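
A rough sketch of the "merge adjacent upcoming segments" idea (the Segment type and helper name are hypothetical, not the patch's actual code):

    // An upcoming segment that starts no later than the running segment ends
    // is folded into it, so the downtime is extended instead of ending and
    // being re-created at the boundary (e.g. at midnight), and no
    // DowntimeEnd/DowntimeStart notification pair is sent there.
    #include <vector>

    struct Segment {
        double begin = 0; // Unix timestamp
        double end = 0;   // Unix timestamp
    };

    static Segment MergeAdjacentSegments(Segment running, const std::vector<Segment>& upcoming)
    {
        bool merged = true;
        while (merged) {
            merged = false;
            for (const Segment& s : upcoming) {
                if (s.begin <= running.end && s.end > running.end) {
                    running.end = s.end; // extend the running segment
                    merged = true;
                }
            }
        }
        return running;
    }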

@dnsmichi
Contributor

@efuss

Can you turn that into a PR, please? The patch is really hard to find when attached to an issue comment. Furthermore, a PR allows for fine-grained review and assignments.

@dnsmichi dnsmichi added the area/notifications Notification events label Jul 3, 2018
@efuss
Contributor

efuss commented Aug 29, 2018

I just created pull request #6579, which fixes this issue plus the problem that downtimes spanning midnight are technically two downtimes, leading to spurious DowntimeEnd/DowntimeStart notifications.

@mcktr
Member

mcktr commented Oct 23, 2018

This is fixed in #6704.

@mcktr mcktr closed this as completed Oct 23, 2018
@mcktr mcktr added this to the 2.11.0 milestone Oct 24, 2018
@dnsmichi dnsmichi modified the milestones: 2.11.0, 2.10.3 Dec 5, 2018