[dev.icinga.com #3441] wrong escalation notification due to state based escalation range behaviour changes #1165

icinga-migration · 2012-11-13T17:18:07Z

This issue has been migrated from Redmine: https://dev.icinga.com/issues/3441

Created by fmbiete on 2012-11-13 17:18:07 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2012-11-28 15:11:20 +00:00)
Target Version: 1.8.2
Last Update: 2012-11-28 15:11:20 +00:00 (in Redmine)

Icinga Version: 1.8.1
OS Version: Debian Squeeze

Hi,

I am running some tests with notification escalation.

1st notification goes to a dummy contact
3rd notification goes to a level 1 contact
10th notification goes to a level 2 contact
15th notification goes to a level 3 contact

We are seeing a warning notification sent to level 1.
The problem is resolved before the level 2 threshold is reached.
Yet a recovery notification is sent to level 1, level 2 and level 3.

Why?

Icinga 1.7.2 didn't have that problem.

Specs:
Debian Squeeze (32-bit)
Icinga 1.8.1 + IDOUtils
MySQL 5.5

Changesets

2012-11-28 14:37:56 +00:00 by mfriedrich a881643

core: fix wrong escalation notification due to state based escalation range behaviour changes

Re-enabling the state-based escalation ranges led to an unexpected
behavioural change: the general "is the escalation valid for a
notification" condition was met, but an additional filter (the state
checks and their counters) was applied on top of it.
Since most users do not use state-based escalation ranges, the only
way to revert that behaviour change is to make the feature fully
optional and restore the previous default behaviour by introducing a
new config option, which is disabled by default.

enable_state_based_escalation_ranges=0

This may not be the best idea within a bugfix release, but it allows
those who actually want to use state-based escalation ranges to do so
without recompiling, as was already requested in #2878.

Reverting to the old behaviour will probably fix #3441 as well, since
it appears to be the likely root cause of the faulty checks that decide
whether an escalation is valid for a notification.

refs #2878
refs #3441
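
To make the distinction in the commit message concrete, here is a minimal Python sketch of an escalation validity check. It is an illustration under stated assumptions, not the Icinga core code (which is C). It applies the classic range check on first_notification/last_notification, where last_notification 0 means "no upper bound". It also assumes, following the classic Nagios-style logic, that a recovery is evaluated against the number of the previous notification. On top of that it adds an optional per-state filter that stands in for the state-based escalation ranges. `Escalation`, `is_valid_escalation`, `state_ranges` and `state_counter` are hypothetical names. The `state_based` flag only mirrors the role of `enable_state_based_escalation_ranges`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class Escalation:
    """Hypothetical model of a serviceescalation definition."""
    name: str
    first_notification: int
    last_notification: int  # 0 means "no upper bound"
    # Hypothetical per-state ranges (state -> (first, last)), consulted
    # only when the state-based filter is enabled.
    state_ranges: Dict[int, Tuple[int, int]] = field(default_factory=dict)


def in_range(number: int, first: int, last: int) -> bool:
    """Classic range check; last == 0 leaves the range open-ended."""
    return number >= first and (last == 0 or number <= last)


def is_valid_escalation(esc: Escalation, current_state: int,
                        notification_number: int,
                        state_counter: Optional[int] = None,
                        state_based: bool = False) -> bool:
    # Assumption (Nagios-style): a recovery is checked against the number
    # of the last problem notification, not its own number.
    number = notification_number - 1 if current_state == OK else notification_number
    if not in_range(number, esc.first_notification, esc.last_notification):
        return False
    # Optional extra filter, analogous to enable_state_based_escalation_ranges=1.
    if state_based and state_counter is not None and current_state in esc.state_ranges:
        first, last = esc.state_ranges[current_state]
        if not in_range(state_counter, first, last):
            return False
    return True
```

With the filter disabled, this reduces to the plain range check. Take the reporter's three escalations (first_notification 3, 10 and 20, all with last_notification 0). A CRITICAL notification numbered 12 matches level 1 and level 2. A recovery sent as notification 3, after two problem notifications as in the log, is evaluated as 2 and matches none. Under these assumptions, that recovery would go only to the service's regular contact group.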

Relations:

relates #2878


icinga-migration · 2012-11-13T17:21:18Z

Updated by fmbiete on 2012-11-13 17:21:18 +00:00

host-name service-name OK 13-11-2012 18:04:27 fmbiete-n1 service-notify-email 80.54
host-name service-name OK 13-11-2012 18:04:25 fmbiete-n2 service-notify-xmpp 80.54
host-name service-name OK 13-11-2012 18:04:19 fmbiete-n3 service-notify-gtalk 80.54
host-name service-name CRITICAL 13-11-2012 18:03:24 contacto-dummy service-notify-dummy 100.32
host-name service-name WARNING 13-11-2012 17:58:09 contacto-dummy service-notify-dummy 99.57

icinga-migration · 2012-11-25T12:45:34Z

Updated by mfriedrich on 2012-11-25 12:45:34 +00:00

Status changed from New to Feedback

Do you have any test configs and/or debug logs for that? It may require more debugging, so don't expect it to be fixed in 1.8.2, as that release is already in its release cycle.

#2878 might be related to this one. I will try to debug it myself next week once I have a better connection.

icinga-migration · 2012-11-25T17:19:55Z

Updated by fmbiete on 2012-11-25 17:19:55 +00:00

Config: add some hosts to the hostgroup domain-routers-cpds, and contacts to the contact groups.

# Service Templates
define service {
        name       pnp-svc
        register   0
        action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
}

define service {
        name                            domain-service
        check_interval                  5
        retry_interval                  1
        max_check_attempts              3
        notification_interval           5
        notification_options            w,c,r,s
        contact_groups                  dummy-group ;notification: dummy echo
        process_perf_data               1
        register                        0
}

define service {
        name                            domain-service-pnp
        use                             pnp-svc
        process_perf_data               1
        register                        0
}
define service {
        name                            domain-service-24x7
        use                             domain-service
        check_period                    24x7
        notification_period             24x7
        register                        0
}

define service {
        name                            domain-service-passive-24x7
        use                             domain-service-24x7
        active_checks_enabled           0
        passive_checks_enabled          1
        initial_state                   o
        max_check_attempts              1
        check_command                   check_dummy!0 ;always return ok
        check_freshness                 0
        register                        0
}

define service {
        name                            domain-service-24x7-5
        use                             domain-service-24x7
        check_interval                  5
        notification_interval           5
        register                        0
}
define service {
        name                            domain-service-passive-24x7-5
        use                             domain-service-24x7-5,domain-service-passive-24x7
        register                        0
}

define service {
        name                            domain-service-passive-pnp-24x7-5
        use                             domain-service-passive-24x7-5,domain-service-pnp
        register                        0
}


# Service itself
define service {
        use                             domain-service-passive-pnp-24x7-5
        hostgroup_name                  domain-routers-cpds
        servicegroups                   domain-escalation
        service_description             Service Name
}


# Service group for escalation
define servicegroup {
        servicegroup_name       domain-escalation
        alias                   ServiceGroup Name
        register                0
}


# Escalation steps
define serviceescalation {
        servicegroup_name       domain-escalation
        first_notification      3
        last_notification       0
        contact_groups          domain-level1
}

define serviceescalation {
        servicegroup_name       domain-escalation
        first_notification      10
        last_notification       0
        contact_groups          domain-level2
}

define serviceescalation {
        servicegroup_name       domain-escalation
        first_notification      20
        last_notification       0
        contact_groups          domain-level3
}

icinga-migration · 2012-11-25T17:29:21Z

Updated by fmbiete on 2012-11-25 17:29:21 +00:00

Log (warning threshold 95, critical threshold 100):

[1353690732] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;0;93.32|mbps=93.32;95;100
[1353690805] PASSIVE SERVICE CHECK: hostname.domain;Bandwidth;0;93.32
[1353690846] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;1;97.76|mbps=97.76;95;100
[1353690853] PASSIVE SERVICE CHECK: hostname.domain;Bandwidth;1;97.76
[1353690853] SERVICE ALERT: hostname.domain;Bandwidth;WARNING;SOFT;1;97.76
[1353690966] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;1;98.86|mbps=98.86;95;100
[1353690966] PASSIVE SERVICE CHECK: hostname.domain;Bandwidth;1;98.86
[1353690966] SERVICE ALERT: hostname.domain;Bandwidth;WARNING;SOFT;2;98.86
[1353691086] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;2;102.40|mbps=102.40;95;100
[1353691086] PASSIVE SERVICE CHECK: hostname.domain;Bandwidth;2;102.40
[1353691086] SERVICE ALERT: hostname.domain;Bandwidth;CRITICAL;HARD;3;102.40
[1353691087] SERVICE NOTIFICATION: contacto-dummy;hostname.domain;Bandwidth;CRITICAL;domain-service-notify-dummy;102.40
[1353691219] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;2;101.09|mbps=101.09;95;100
[1353691281] PASSIVE SERVICE CHECK: hostname.domain;Bandwidth;2;101.09
[1353691339] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;2;102.29|mbps=102.29;95;100
[1353691398] PASSIVE SERVICE CHECK: hostname.domain;Bandwidth;2;102.29
[1353691398] SERVICE NOTIFICATION: contacto-dummy;hostname.domain;Bandwidth;CRITICAL;domain-service-notify-dummy;102.29
[1353691448] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;2;100.29|mbps=100.29;95;100
[1353691448] PASSIVE SERVICE CHECK: hostname.domain;Bandwidth;2;100.29
[1353691567] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;0;91.25|mbps=91.25;95;100
[1353691567] PASSIVE SERVICE CHECK: hostname.domain;Bandwidth;0;91.25
[1353691567] SERVICE ALERT: hostname.domain;Bandwidth;OK;HARD;3;91.25
[1353691567] SERVICE NOTIFICATION: domain-contact1-n1;hostname.domain;Bandwidth;OK;domain-service-notify-email;91.25
[1353691567] SERVICE NOTIFICATION: domain-contact2-n1;hostname.domain;Bandwidth;OK;domain-service-notify-email;91.25
[1353691567] SERVICE NOTIFICATION: domain-contact3-n1;hostname.domain;Bandwidth;OK;domain-service-notify-email;91.25
[1353691567] SERVICE NOTIFICATION: domain-contact4-n1;hostname.domain;Bandwidth;OK;domain-service-notify-email;91.25
[1353691567] SERVICE NOTIFICATION: domain-contact5-n1;hostname.domain;Bandwidth;OK;domain-service-notify-email;91.25
[1353691567] SERVICE NOTIFICATION: domain-contact1-n2;hostname.domain;Bandwidth;OK;domain-service-notify-gtalk;91.25
[1353691569] SERVICE NOTIFICATION: domain-contact2-n2;hostname.domain;Bandwidth;OK;domain-service-notify-gtalk;91.25
[1353691570] SERVICE NOTIFICATION: domain-contact3-n2;hostname.domain;Bandwidth;OK;domain-service-notify-xmpp;91.25
[1353691572] SERVICE NOTIFICATION: domain-contact1-n3;hostname.domain;Bandwidth;OK;domain-service-notify-gtalk;91.25
[1353691683] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;0;92.91|mbps=92.91;95;100
[1353691683] PASSIVE SERVICE CHECK: hostname.domain;Bandwidth;0;92.91
[1353691819] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;hostname.domain;Bandwidth;0;84.85|mbps=84.85;95;100
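
When reading a log like the one above, it helps to group the SERVICE NOTIFICATION lines by state and list who was notified. That makes a recovery reaching escalation contacts easy to spot. Below is a minimal Python sketch. It assumes the `[epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output` layout visible in this log, and `recipients_by_state` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches lines such as:
# [1353691567] SERVICE NOTIFICATION: contact;host;service;OK;command;output
NOTIFICATION_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE NOTIFICATION: "
    r"(?P<contact>[^;]*);(?P<host>[^;]*);(?P<service>[^;]*);"
    r"(?P<state>[^;]*);(?P<command>[^;]*);(?P<output>.*)$"
)


def recipients_by_state(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect notified contacts per notification state, in log order.

    Lines that are not SERVICE NOTIFICATION entries are ignored.
    """
    result: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = NOTIFICATION_RE.match(line.strip())
        if match:
            result[match.group("state")].append(match.group("contact"))
    return dict(result)
```

Applied to the excerpt above, the CRITICAL entries list only contacto-dummy, while the OK entries list the escalation contacts, which is exactly the mismatch reported in this issue.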

icinga-migration · 2012-11-25T23:16:14Z

Updated by mfriedrich on 2012-11-25 23:16:14 +00:00

File added 0001-add-enable_state_based_escalation_ranges-and-disable.patch

Can you test the attached git patch? It applies on top of 'mfriedrich/core' and likely 'next' as well. It reverts some changes and turns the state-based escalation ranges into an optional filter within the "is the escalation valid for this notification" checks. I haven't tested it yet, as I am short on time.

icinga-migration · 2012-11-25T23:25:11Z

Updated by mfriedrich on 2012-11-25 23:25:11 +00:00

Status changed from Feedback to Assigned
Assigned to set to mfriedrich

icinga-migration · 2012-11-26T09:52:27Z

Updated by fmbiete on 2012-11-26 09:52:27 +00:00

I have applied the patch without enabling the new parameter

enable_state_based_escalation_ranges=0

I will post the results.

Thank you very much

icinga-migration · 2012-11-26T13:08:15Z

Updated by mfriedrich on 2012-11-26 13:08:15 +00:00

Subject changed from Wrong escalation notification to wrong escalation notification due to state based escalation range behaviour changes
Category set to Escalations
Target Version set to 1.8.2

It is likely the state-based behaviour change, so reverting to the disabled default should fix it. As usual, please test it by Tuesday night so it can be added to 1.8.2.

icinga-migration · 2012-11-26T14:39:22Z

Updated by alexbrueckel on 2012-11-26 14:39:22 +00:00

I've applied the patch to a dev system and now the original problem seems to be solved.

But as far as I can see, when notifications escalate, only the last escalation group gets the recovery; the normal contacts and the escalation groups in between get nothing after the critical notification.

icinga-migration · 2012-11-26T17:47:06Z

Updated by fmbiete on 2012-11-26 17:47:06 +00:00

That seems to fix the problem on my system.

The recovery is sent only to the last level in the escalation.

Thank you very much

icinga-migration · 2012-11-28T14:25:31Z

Updated by mfriedrich on 2012-11-28 14:25:31 +00:00

File deleted 0001-add-enable_state_based_escalation_ranges-and-disable.patch

icinga-migration · 2012-11-28T14:35:41Z

Updated by mfriedrich on 2012-11-28 14:35:41 +00:00

alexbrueckel wrote:

But as far as I can see, when notifications escalate, only the last escalation group gets the recovery; the normal contacts and the escalation groups in between get nothing after the critical notification.

That should be tracked in a separate issue. Please report it there and attach all relevant test configs, logs, tests, etc.

I've added 3441.cfg to the issue. It contains part of the reporter's config, adapted to work with the default 'make install-testconfig'.

Thanks for the tests; the fix will go into 1.8.2.

icinga-migration · 2012-11-28T14:41:37Z

Updated by mfriedrich on 2012-11-28 14:41:37 +00:00

Status changed from Assigned to 7
Done % changed from 0 to 100

icinga-migration · 2012-11-28T15:11:20Z

Updated by mfriedrich on 2012-11-28 15:11:20 +00:00

Status changed from 7 to Resolved

icinga-migration closed this as completed Nov 28, 2012

icinga-migration added High bug Escalations labels Jan 17, 2017

icinga-migration added this to the 1.8.2 milestone Jan 17, 2017
