
[dev.icinga.com #13215] Newly configured already running ScheduledDowntime not put into effect #4790

Closed
icinga-migration opened this issue Nov 16, 2016 · 6 comments
Labels: area/notifications (Notification events), bug (Something isn't working)
Milestone

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/13215

Created by edgar_fuss on 2016-11-16 14:41:29 +00:00

Assignee: (none)
Status: New
Target Version: (none)
Last Update: 2017-01-10 13:22:05 +00:00 (in Redmine)

Icinga Version: 2.5.4
Backport?: Not yet backported
Include in Changelog: 1

When you start Icinga2 while a newly configured ScheduledDowntime should already be in effect, it is not put into effect.
Basically, the same goes for already existing downtimes, but there the problem is normally hidden by caching.

The problem is that scheduled downtimes seem to be sort-of piggy-backed on top of other downtimes. There's a timer-controlled procedure running once a minute that queues the next scheduled downtime, and it only considers segments in the future.

I must admit I neither fully understand the underlying code nor all of its intentions. In particular, it's not clear to me whether overlapping downtimes (on the same object) are possible/intended and, if so, whether the behaviour is defined. I would expect them all to be put into effect; after reading the code, I guess that if you have three scheduled downtimes, where 1 and 3 overlap and 2 overlaps that overlapping interval, 3 is never put into effect.

Nevertheless, I can think of two ways to work around/solve the problem, both of which I have implemented and somewhat tested; patches are attached:

  1. In both LegacyTimePeriod::FindNextSegment() and ScheduledDowntime::FindNextSegment(), first try to find an already-running segment before looking for segments in the future. Add a minEnd parameter to the latter routine so that ScheduledDowntime::CreateNextDowntime() can prevent an already-active downtime (or another one ending earlier) from being queued over and over again (a rough sketch of this idea follows after this list).
     This seems to work, but I'm not sure about the implications for other downtimes. However, ScheduledDowntime::FindNextSegment() seems to be the only caller of LegacyTimePeriod::FindNextSegment().

  2. After loading the configuration, explicitly look for a downtime that should already be running and queue it before the timer queues a segment to come. This has the advantage of being explicit about what to do, but (as implemented by me) leads to an already-running downtime being added twice if it is already cached. It also needs a near-duplicate of the two FindNextSegment() routines (as FindRunningSegment()).
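
Here is a minimal, self-contained sketch of the Variant 1 idea (this is not the actual Icinga 2 API; the Segment type and the function name are hypothetical stand-ins for the FindNextSegment()/FindRunningSegment() pair discussed above):

    // Prefer a segment that is already running over one that only starts in
    // the future; skip segments that end at or before minEnd so that an
    // already-active downtime is not queued over and over again.
    #include <utility>
    #include <vector>

    struct Segment {
        double begin = 0; // Unix timestamp
        double end = 0;   // Unix timestamp
    };

    static std::pair<bool, Segment> FindSegmentToQueue(const std::vector<Segment>& segments,
                                                       double now, double minEnd)
    {
        bool found = false;
        Segment best;

        for (const Segment& s : segments) {
            if (s.end <= minEnd)
                continue; // already covered by an active downtime

            bool running = s.begin <= now && now < s.end;
            bool future = s.begin > now;

            if (!running && !future)
                continue; // segment lies entirely in the past

            // A running segment always has an earlier begin than any future
            // segment, so it wins the comparison below.
            if (!found || s.begin < best.begin) {
                best = s;
                found = true;
            }
        }

        return { found, best };
    }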

Various remarks on things I stumbled over while trying to fix this:

  1. The code seems to be inconsistent about "exactly now", i.e. time comparisons evaluating to equal (illustrated by the sketch after this list).

  2. In ScheduledDowntime::FindNextSegment(), the following contains dead code:

     if (segment.first == 0 && segment.second == 0) {
         tm reference = Utility::LocalTime(Utility::GetTime());
         reference.tm_mday++;
         reference.tm_hour = 0;
         reference.tm_min = 0;
         reference.tm_sec = 0;

         return;
     }

I don't get how reference can ever be used.

  3. I don't understand what the two-pass code in LegacyTimePeriod::FindNextSegment() is supposed to do. Unless stride is non-zero, the first IsInTimeRange() should always evaluate to true (iter = begin), and I can't come up with a documented case where stride is non-zero (you would need to put something like "friday/2" in a ScheduledDowntime object's dictionary key, I guess).

  4. ScheduledDowntime::FindNextSegment() disregards segments which are already running, but my impression is that LegacyTimePeriod::FindNextSegment() doesn't deliver them.

  5. In ScheduledDowntime::FindNextSegment(), there's a debug message

         Log(LogDebug, "ScheduledDowntime")
             << "Evaluating segment: " << kv.first << ": " << kv.second << " at ";

where I don't understand what the " at " is supposed to mean.

  6. In ScheduledDowntime::CreateNextDowntime(),

     BOOST_FOREACH(const Downtime::Ptr& downtime, GetCheckable()->GetDowntimes()) {
         if (downtime->GetScheduledBy() != GetName() ||
             downtime->GetStartTime() < Utility::GetTime())
             continue;

         /* We've found a downtime that is owned by us and that hasn't started yet - we're done. */
         return;
     }

seems a rather involved way of writing

    BOOST_FOREACH(const Downtime::Ptr& downtime, GetCheckable()->GetDowntimes()) {
        if (downtime->GetScheduledBy() == GetName() &&
            downtime->GetStartTime() >= Utility::GetTime())
            /* We've found a downtime that is owned by us and that hasn't started yet - we're done. */
            return;
    }
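
Regarding remark 1 above, a minimal illustration of the "exactly now" ambiguity (both helper names are made up for illustration and are not taken from the Icinga 2 source):

    // Whether a segment whose boundary coincides with the current timestamp
    // is treated as running or not depends on whether the comparison is
    // strict or inclusive.
    static bool IsRunningInclusive(double begin, double end, double now)
    {
        return begin <= now && now <= end; // "exactly now" counts at both ends
    }

    static bool IsRunningHalfOpen(double begin, double end, double now)
    {
        return begin <= now && now < end;  // a segment ending exactly now is over
    }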

Attachments

  • NextRunning edgar_fuss - 2016-11-16 14:29:54 +00:00 - Variant 1 (also consider running segments in FindNextSegment)
  • RunningSegment edgar_fuss - 2016-11-16 14:30:01 +00:00 - Variant 2 (explicitly queue running segments after configuration load)
@icinga-migration
Author

Updated by tombstone on 2017-01-10 13:22:05 +00:00

I am experiencing the same issue: a newly configured ScheduledDowntime with a start time already in the past (and an end time in the future) is not put into effect... very annoying :-/ and actually hard to work around.

@icinga-migration icinga-migration added bug Something isn't working libicinga labels Jan 17, 2017
@marcofl

marcofl commented Feb 1, 2017

Same here: We have some test (not production) servers which we automatically put into downtime via a downtime configured (through Puppet) with a 24x7 range:

      'monday' => '00:00-24:00',
      'tuesday' => '00:00-24:00',
      'wednesday' => '00:00-24:00',
      'thursday' => '00:00-24:00',
      'friday' => '00:00-24:00',
      'saturday' => '00:00-24:00',
      'sunday' => '00:00-24:00',

On the first icinga2 reload, one downtime gets created for tomorrow, but no downtime is set for today / now (even though "now" is clearly included in the range). On the next day there is a second downtime starting tomorrow, and so on, which means there are always two downtimes per host and for all of the host's services.

Is there really no way to have a configured downtime span more than one day? I think this is somehow related to how the range is specified, but I could not find a way to specify something like a monday-sunday range. You can create a downtime in Icinga Web 2 which spans multiple days, weeks, etc. and starts "now", so this is a bit strange to me.

@efuss
Contributor

efuss commented Feb 7, 2018

I've refactored the Variant 1 (NextRunning) patch mentioned above, moving the "consider an already running segment" code into separate FindRunningSegment() functions.
Additionally, I've added code that subsequently merges upcoming adjacent downtime segments into an already running one. This prevents downtimes spanning midnight from ending and immediately re-beginning, and stops the resulting gratuitous notifications.
Note that the patch's fourth hunk looks strange because it first turns the existing ScheduledDowntime::FindNextSegment() into the new FindRunningSegment() and then adds the original FindNextSegment() again.
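
A rough sketch of the "merge adjacent upcoming segments" idea (the Segment type and helper name are hypothetical, not the patch's actual code):

    // An upcoming segment that starts no later than the running segment ends
    // is folded into it, so the downtime is extended instead of ending and
    // being re-created at the boundary (e.g. at midnight), and no
    // DowntimeEnd/DowntimeStart notification pair is sent there.
    #include <vector>

    struct Segment {
        double begin = 0; // Unix timestamp
        double end = 0;   // Unix timestamp
    };

    static Segment MergeAdjacentSegments(Segment running, const std::vector<Segment>& upcoming)
    {
        bool merged = true;
        while (merged) {
            merged = false;
            for (const Segment& s : upcoming) {
                if (s.begin <= running.end && s.end > running.end) {
                    running.end = s.end; // extend the running segment
                    merged = true;
                }
            }
        }
        return running;
    }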

@dnsmichi
Contributor

@efuss

Can you turn that into a PR, please? The patch is really hard to find when attached to an issue comment. Furthermore, a PR allows for fine-grained review and assignments.

@dnsmichi dnsmichi added the area/notifications Notification events label Jul 3, 2018
@efuss
Contributor

efuss commented Aug 29, 2018

I just created pull request #6579, which fixes this issue plus the problem that downtimes spanning midnight are technically two downtimes, leading to spurious DowntimeEnd/DowntimeStart notifications.

@mcktr
Member

mcktr commented Oct 23, 2018

This is fixed in #6704.

@mcktr mcktr closed this as completed Oct 23, 2018
@mcktr mcktr added this to the 2.11.0 milestone Oct 24, 2018
@dnsmichi dnsmichi modified the milestones: 2.11.0, 2.10.3 Dec 5, 2018