This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #306] state-based escalation ranges #130

Closed
icinga-migration opened this issue Feb 26, 2010 · 7 comments

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/306

Created by mfriedrich on 2010-02-26 22:59:03 +00:00

Assignee: magellanic
Status: Resolved (closed on 2010-05-21 10:25:06 +00:00)
Target Version: 1.0.2
Last Update: 2010-05-21 10:25:06 +00:00 (in Redmine)


from nagios-devel

This patch is to address the issue I asked about in this thread:

http://article.gmane.org/gmane.network.nagios.user/65141

Currently, Nagios does not distinguish between WARNING and CRITICAL notifications when it evaluates service escalation ranges.  This can cause problems with escalation chains, as the following example shows.

   define serviceescalation {
      host_name           hostname
      service_description servicename
      first_notification 3
      last_notification 0
      escalation_options c,u,r
   }

Currently, a service that is in WARNING for 3 notifications and then enters CRITICAL will match this service escalation.  The behavior I am looking for (and was expecting) is that this escalation matches only after the 3rd critical or unknown notification.
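
To make the problem concrete, here is a minimal sketch of the range check as described above. It is not the Nagios source; the function names and the single-letter state codes are assumptions chosen to mirror escalation_options, and a last value of 0 is treated as "no upper bound".

```python
def in_range(count, first, last):
    """True when count lies in [first, last]; last == 0 means no upper bound."""
    return count >= first and (last == 0 or count <= last)


def escalation_matches(state, total_sent, first, last, options):
    """Match on the current state and the TOTAL number of notifications sent."""
    return state in options and in_range(total_sent, first, last)


# Assumed history: three WARNING notifications, then the first CRITICAL one.
history = ["w", "w", "w", "c"]
options = {"c", "u", "r"}  # escalation_options c,u,r

for sent, state in enumerate(history, start=1):
    # first_notification 3, last_notification 0
    print(sent, state, escalation_matches(state, sent, 3, 0, options))
```

In this sketch the three WARNING notifications do not match because "w" is not in the options, but the very first CRITICAL notification does, since the total count has already passed 3.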

The attached patch (applies cleanly to 3.0.6 and HEAD as of yesterday) adds the ability to specify service escalations that match after a specified number of critical or warning notifications.  For example:

   define serviceescalation {
      host_name           hostname
      service_description servicename
      first_critical_notification 3
      last_critical_notification 0
      escalation_options c,u,r
   }

The patch adds 4 configuration directives to service escalation definitions:

   first_warning_notification #
   last_warning_notification #
   first_critical_notification #
   last_critical_notification #

Behavior is identical to (first|last)_notification, except that they check against the count of warning/critical notifications instead of the number of total notifications.

The behavior of the current directives is unchanged.  Existing deployments should not need to be modified with this patch applied.
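
As a rough illustration of the difference, the sketch below applies the same range check to the count of CRITICAL notifications only. It is an assumption-laden model, not the patch itself: the helper names are hypothetical, the history list stands in for whatever per-state counters the patch keeps, and how those counters are reset is not covered here.

```python
def in_range(count, first, last):
    """True when count lies in [first, last]; last == 0 means no upper bound."""
    return count >= first and (last == 0 or count <= last)


def critical_escalation_matches(history, first_critical, last_critical, options):
    """Match on the current state and the number of CRITICAL notifications sent."""
    state = history[-1]
    critical_sent = history.count("c")  # count only critical notifications
    return state in options and in_range(critical_sent, first_critical, last_critical)


options = {"c", "u", "r"}  # escalation_options c,u,r

early = ["w", "w", "w", "c"]            # first critical notification
later = ["w", "w", "w", "c", "c", "c"]  # third critical notification

# first_critical_notification 3, last_critical_notification 0
print(critical_escalation_matches(early, 3, 0, options))
print(critical_escalation_matches(later, 3, 0, options))
```

With these inputs the first call does not match and the second does, which is the behavior asked for in the original post.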

I've run some tests for this patch off the 3.0.6 stable release, and it seems to be working fine.  Ran overnight without any complaints from the logs, and the behavior is as I expect it to be.

Suggestions for improvements welcome.

-Gius 

-----------

Hi,

I really like this but what about unknown state notifications which
might be used?

What about hostescalations? Would you patch them too?

--
Hendrik

-----------

I thought of unknowns right after I sent the patch.  We don't really use 
them in our deployment, so I wasn't thinking about them during 
implementation.  It's easy enough to add.

> What about hostescalations? Would you patch them too?

I was going to question the usefulness of this, but they can be both 
"down" and "unreachable."  I'll get an updated patch up sometime next week.

I'd like some feedback on how I modified the CGIs to display the new 
variables.  I was hesitant to add a bunch more columns to that table, 
considering it's already pretty large, so I just put all 4 of the new 
thresholds in the same table entry (all, warn, crit, unknown).  Putting 
all of the thresholds in the same column is unclear without reading the 
source.  I can add new table columns if that's the "proper" thing to 
do.  Or if somebody has a solution I'm not thinking of I'd love for the 
config output to be very clear.

-Gius

-----------

And now I'm done.  I've done some sanity checks on this (retention, cgi behavior, notification behavior, objects.cache).  I'm running this code on our testing instance now, and I'll report back if something awful happens.

The patch adds the following directives to host escalations:

   first_down_notification #
   last_down_notification #
   first_unreachable_notification #
   last_unreachable_notification #

Behavior is identical to (first|last)_notification, except that they check against the count of down/unreachable notifications instead of the total.

I've also added directives to service escalations to handle unknown states:

   first_unknown_notification #
   last_unknown_notification #
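
The directive names for hosts and services follow one pattern, first_<state>_notification and last_<state>_notification, so a single lookup can cover all of them. The sketch below is only a model of that idea: the dictionaries, the state letters, and the assumption that a missing last_ directive behaves like 0 (no upper bound) are mine, not taken from the patch.

```python
# Hypothetical mapping from state letter to the name used in the directives.
SERVICE_STATES = {"w": "warning", "c": "critical", "u": "unknown"}
HOST_STATES = {"d": "down", "u": "unreachable"}


def in_range(count, first, last):
    """True when count lies in [first, last]; last == 0 means no upper bound."""
    return count >= first and (last == 0 or count <= last)


def state_range_matches(escalation, state_names, state, counts):
    """Check the per-state notification count against the matching directives."""
    name = state_names.get(state)
    if name is None:
        return False  # state has no state-based directive
    first = escalation.get(f"first_{name}_notification")
    if first is None:
        return False  # directive not configured for this escalation
    last = escalation.get(f"last_{name}_notification", 0)  # assumed default
    return in_range(counts.get(state, 0), first, last)


host_escalation = {
    "first_down_notification": 2,
    "last_down_notification": 0,
    "first_unreachable_notification": 1,
    "last_unreachable_notification": 3,
}

print(state_range_matches(host_escalation, HOST_STATES, "d", {"d": 2}))
print(state_range_matches(host_escalation, HOST_STATES, "u", {"u": 4}))
```

Here the second DOWN notification falls inside its range, while the fourth UNREACHABLE notification is past last_unreachable_notification 3.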

-Gius 

-----------

I've now tested the new directives, and they behave as I expect.  I've also tested a patched 3.0.6 against unmodified original config files, and the existing directives retain their original behavior.  This new patch also contains updates to the html documentation (which for some reason don't apply cleanly against 3.0.6; the code changes still apply cleanly).

I haven't heard anything from the devs since the last time I posted.  If this is in a queue somewhere slated to be looked at that's fine, I just want to make sure that there isn't something I should fix up to make the patch eligible for integration.

-Gius 

Attachments

Changesets

2010-05-25 11:06:36 +00:00 by (unknown) 59eeccb

add state-based escalation ranges (Mark Gius)

fixes #306

2010-06-28 22:16:05 +00:00 by mfriedrich 53b0014

make state based escalation ranges optional by configure

currently, the object definitions used by mk_livestatus
are directly copied from nagios 3.2.0 which leads to the
problem that different exported symbols and variables are
expected.

the state based escalation ranges change that, and this
will lead into mk_livestatus throwing a segfault and
producing a core dump.

in order to give the mk_livestatus developer more time to
resolve this issue, the original patch for #306 has been
reworked into optional selection through configure.

this will be changed when mk_livestatus becomes ready
to fully support icinga core.

refs #306
refs #531
refs #535

Relations:

@icinga-migration

Updated by mfriedrich on 2010-02-26 22:59:55 +00:00

  • File added nagios.patch

@icinga-migration

Updated by mfriedrich on 2010-02-26 23:21:11 +00:00

-------- Original Message --------
Subject:    [Nagios-devel] [PATCH] Distinguish between warning and critical notifications
Date:   Tue, 17 Nov 2009 16:02:34 -0800
From:   Mark Gius 
Reply-To:   Nagios Developers List 
To:     nagios-devel@lists.sourceforge.net
CC:     neil.ramsay@market-source.com

@icinga-migration

Updated by mfriedrich on 2010-03-08 18:03:16 +00:00

  • Assigned to set to mfriedrich
  • Target Version set to 1.0.2
  • Done % changed from 0 to 90

@icinga-migration

Updated by mfriedrich on 2010-05-16 17:42:54 +00:00

  • Assigned to changed from mfriedrich to magellanic

@icinga-migration

Updated by mfriedrich on 2010-05-19 14:50:52 +00:00

  • Subject changed from Distinguish between warning and critical notifications to state-based escalation ranges

@icinga-migration

Updated by mfriedrich on 2010-05-19 17:30:11 +00:00

taken from my previous commit msg:

        The directives first_notification and last_notification apply to the
        total count of notifications on a particular service or host.  It is
        sometimes desirable to escalate after the Nth critical notification,
        rather than after a total number of N notifications have been sent.
        Service Escalation:
                first/last_warning_notification
                first/last_critical_notification
                first/last_unknown_notification
        Host Escalation:
                first/last_down_notification
                first/last_unreachable_notification

@icinga-migration

Updated by Anonymous on 2010-05-21 10:25:06 +00:00

  • Status changed from New to Resolved
  • Done % changed from 90 to 100

Applied in changeset commit:"8eb53674188acec4ea77cd6ba41e5b6c9a7b35d9".
