Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev.icinga.com #12892] Flapping notifications sent for soft state changes #4726

Closed
icinga-migration opened this issue Oct 12, 2016 · 8 comments
Labels
area/notifications Notification events bug Something isn't working
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/12892

Created by xert on 2016-10-12 08:22:41 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2016-11-10 13:16:37 +00:00)
Target Version: 2.6.0
Last Update: 2017-01-02 18:26:42 +00:00 (in Redmine)

Icinga Version: 2.5.4
Backport?: Not yet backported
Include in Changelog: 1

Flappingstart / Flappingend Notifications are sent after soft state change:

Log from icingaweb2:

NOTIFICATION 08:50:29 [rk] PING OK - Packet loss = 0%, RTA = 11.84 ms
OK 08:20:31 [ 1/5 ] PING OK - Packet loss = 0%, RTA = 15.51 ms
NOTIFICATION 08:20:31 [rk] PING OK - Packet loss = 0%, RTA = 15.51 ms
CRITICAL 08:19:29 [ 1/5 ] PING CRITICAL - Packet loss = 20%, RTA = 15.43 ms

Service is defined as:

Object 'host.example.org!ping4' of type 'Service':
  * __name = "host.example.org!ping4"
  * action_url = ""
  * check_command = "ping4"
  * check_interval = 600
  * check_period = ""
  * check_timeout = null
  * command_endpoint = ""
  * display_name = "ping4"
  * enable_active_checks = true
  * enable_event_handler = true
  * enable_flapping = true
  * enable_notifications = true
  * enable_passive_checks = true
  * enable_perfdata = true
  * event_command = ""
  * flapping_threshold = 30
  * groups = [ ]
  * host_name = "host.example.org"
  * icon_image = ""
  * icon_image_alt = ""
  * max_check_attempts = 5
  * name = "ping4"
  * notes = ""
  * notes_url = ""
  * package = "_etc"
  * retry_interval = 60
  * templates = [ "ping4", "generic-service" ]
  * type = "Service"
  * vars = null
  * volatile = false
  * zone = ""

debug.log:

[2016-10-12 08:19:23 +0200] debug/CheckerComponent: Executing check for 'host.example.org!ping4'
[2016-10-12 08:19:23 +0200] notice/Process: Running command '/usr/local/libexec/nagios/check_ping' '-4' '-H' '10.11.12.13' '-c' '200,15%' '-w' '100,5%': PID 11341
[2016-10-12 08:19:23 +0200] debug/CheckerComponent: Check finished for object 'host.example.org!ping4'
[2016-10-12 08:19:29 +0200] notice/Process: PID 11341 ('/usr/local/libexec/nagios/check_ping' '-4' '-H' '10.11.12.13' '-c' '200,15%' '-w' '100,5%') terminated with exit code 2
[2016-10-12 08:19:29 +0200] debug/DbEvents: add log entry history for 'host.example.org!ping4'
[2016-10-12 08:19:29 +0200] debug/DbEvents: add checkable check history for 'host.example.org!ping4'
[2016-10-12 08:19:29 +0200] debug/DbEvents: add state change history for 'host.example.org!ping4'
[2016-10-12 08:19:29 +0200] notice/Checkable: State Change: Checkable host.example.org!ping4 soft state change from OK to CRITICAL detected.
[2016-10-12 08:19:29 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = '0',  active_checks_enabled = '1',  check_command = 'ping4',  check_source = 'icinga.example.org',  check_type = '0',  current_check_attempt = '1',  current_notification_number = '0',  current_state = '2',  endpoint_object_id = 312,  event_handler = '',  event_handler_enabled = '1',  execution_time = '6.008021',  flap_detection_enabled = '1',  has_been_checked = '1',  instance_id = 1,  is_flapping = '0',  is_reachable = '1',  last_check = FROM_UNIXTIME(1476253169),  last_hard_state = '0',  last_hard_state_change = FROM_UNIXTIME(1476207732),  last_notification = FROM_UNIXTIME(1476208887),  last_state_change = FROM_UNIXTIME(1476253169),  last_time_critical = FROM_UNIXTIME(1476253169),  last_time_ok = FROM_UNIXTIME(1476252567),  last_time_warning = FROM_UNIXTIME(1474118199),  latency = '0.000307',  long_output = '',  max_check_attempts = '5',  next_check = FROM_UNIXTIME(1476253227),  next_notification = FROM_UNIXTIME(1476255296),  normal_check_interval = '10',  notifications_enabled = '1',  original_attributes = 'null',  output = 'PING CRITICAL - Packet loss = 20%, RTA = 15.43 ms',  passive_checks_enabled = '1',  percent_state_change = '0',  perfdata = 'rta=15.425000ms;100.000000;200.000000;0.000000 pl=20%;5;15;0',  problem_has_been_acknowledged = '0',  process_performance_data = '1',  retry_check_interval = '1',  scheduled_downtime_depth = '0',  service_object_id = 1056,  should_be_scheduled = '1',  state_type = '0',  status_update_time = FROM_UNIXTIME(1476253169) WHERE service_object_id = 1056
[2016-10-12 08:19:29 +0200] debug/IdoMysqlConnection: Query: INSERT INTO icinga_statehistory (check_source, current_check_attempt, endpoint_object_id, instance_id, last_hard_state, last_state, long_output, max_check_attempts, object_id, output, state, state_change, state_time, state_time_usec, state_type) VALUES ('icinga.example.org', '1', 312, 1, '0', '0', '', '5', 1056, 'PING CRITICAL - Packet loss = 20%, RTA = 15.43 ms', '2', '1', FROM_UNIXTIME(1476253169), '131369', '0')
[2016-10-12 08:20:31 +0200] debug/DbEvents: add state change history for 'host.example.org!ping4'
[2016-10-12 08:20:31 +0200] notice/Checkable: State Change: Checkable host.example.org!ping4 hard state change from CRITICAL to OK detected.
[2016-10-12 08:20:31 +0200] information/Checkable: Checking for configured notifications for object 'host.example.org!ping4'
[2016-10-12 08:20:31 +0200] debug/Checkable: Checkable 'host.example.org!ping4' has 1 notification(s).
[2016-10-12 08:20:31 +0200] notice/Notification: Attempting to send  notifications for notification object 'host.example.org!ping4!mail-vanguard'.
[2016-10-12 08:20:31 +0200] debug/Notification: Type 'FlappingStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=128, TypeFilter=511)
[2016-10-12 08:20:31 +0200] debug/Notification: State 'OK', StateFilter: Critical, OK, Unknown and Warning (FState=1, StateFilter=15)
[2016-10-12 08:20:31 +0200] debug/Notification: User notification, Type 'FlappingStart', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=128, TypeFilter=511)
[2016-10-12 08:20:31 +0200] debug/Notification: User notification, State 'OK', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=1, StateFilter=-1)
[2016-10-12 08:20:31 +0200] information/Notification: Sending 'FlappingStart' notification 'host.example.org!ping4!mail-vanguard for user 'rk'
[2016-10-12 08:20:31 +0200] debug/DbEvents: add notification history for 'host.example.org!ping4'
[2016-10-12 08:20:31 +0200] debug/DbEvents: add contact notification history for service 'host.example.org!ping4' and user 'rk'.
[2016-10-12 08:20:31 +0200] notice/Checkable: Flapping: Checkable host.example.org!ping4 started flapping (30% < 100%).
[2016-10-12 08:20:31 +0200] debug/DbEvents: add log entry history for 'host.example.org!ping4'
[2016-10-12 08:20:31 +0200] debug/DbEvents: add flapping history for 'host.example.org!ping4'
[2016-10-12 08:20:31 +0200] notice/Process: Running command '/usr/local/etc/icinga2/scripts/mail-service-notification.sh': PID 11439
[2016-10-12 08:20:31 +0200] debug/DbEvents: add log entry history for 'host.example.org!ping4'
[2016-10-12 08:20:31 +0200] information/Notification: Completed sending 'FlappingStart' notification 'host.example.org!ping4!mail-vanguard' for checkable 'host.example.org!ping4' and user 'rk'.

...

[2016-10-12 08:30:26 +0200] debug/CheckerComponent: Executing check for 'host.example.org!ping4'
[2016-10-12 08:30:26 +0200] notice/Process: Running command '/usr/local/libexec/nagios/check_ping' '-4' '-H' '10.11.12.13' '-c' '200,15%' '-w' '100,5%': PID 12250
[2016-10-12 08:30:26 +0200] debug/CheckerComponent: Check finished for object 'host.example.org!ping4'
[2016-10-12 08:30:30 +0200] notice/Process: PID 12250 ('/usr/local/libexec/nagios/check_ping' '-4' '-H' '10.11.12.13' '-c' '200,15%' '-w' '100,5%') terminated with exit code 0
[2016-10-12 08:30:30 +0200] debug/DbEvents: add checkable check history for 'host.example.org!ping4'
[2016-10-12 08:30:31 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = '0',  active_checks_enabled = '1',  check_command = 'ping4',  check_source = 'icinga.example.org',  check_type = '0',  current_check_attempt = '1',  current_notification_number = '0',  current_state = '0',  endpoint_object_id = 312,  event_handler = '',  event_handler_enabled = '1',  execution_time = '4.021782',  flap_detection_enabled = '1',  has_been_checked = '1',  instance_id = 1,  is_flapping = '1',  is_reachable = '1',  last_check = FROM_UNIXTIME(1476253830),  last_hard_state = '0',  last_hard_state_change = FROM_UNIXTIME(1476253231),  last_notification = FROM_UNIXTIME(1476253231),  last_state_change = FROM_UNIXTIME(1476253231),  last_time_critical = FROM_UNIXTIME(1476253169),  last_time_ok = FROM_UNIXTIME(1476253830),  last_time_warning = FROM_UNIXTIME(1474118199),  latency = '0.000488',  long_output = '',  max_check_attempts = '5',  next_check = FROM_UNIXTIME(1476254426),  next_notification = FROM_UNIXTIME(1476255296),  normal_check_interval = '10',  notifications_enabled = '1',  original_attributes = 'null',  output = 'PING OK - Packet loss = 0%, RTA = 15.57 ms',  passive_checks_enabled = '1',  percent_state_change = '52',  perfdata = 'rta=15.572000ms;100.000000;200.000000;0.000000 pl=0%;5;15;0',  problem_has_been_acknowledged = '0',  process_performance_data = '1',  retry_check_interval = '1',  scheduled_downtime_depth = '0',  service_object_id = 1056,  should_be_scheduled = '1',  state_type = '1',  status_update_time = FROM_UNIXTIME(1476253830) WHERE service_object_id = 1056

...

[2016-10-12 08:40:26 +0200] debug/CheckerComponent: Executing check for 'host.example.org!ping4'
[2016-10-12 08:40:26 +0200] notice/Process: Running command '/usr/local/libexec/nagios/check_ping' '-4' '-H' '10.11.12.13' '-c' '200,15%' '-w' '100,5%': PID 13097
[2016-10-12 08:40:26 +0200] debug/CheckerComponent: Check finished for object 'host.example.org!ping4'
[2016-10-12 08:40:30 +0200] notice/Process: PID 13097 ('/usr/local/libexec/nagios/check_ping' '-4' '-H' '10.11.12.13' '-c' '200,15%' '-w' '100,5%') terminated with exit code 0
[2016-10-12 08:40:30 +0200] debug/DbEvents: add checkable check history for 'host.example.org!ping4'
[2016-10-12 08:40:30 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = '0',  active_checks_enabled = '1',  check_command = 'ping4',  check_source = 'icinga.example.org',  check_type = '0',  current_check_attempt = '1',  current_notification_number = '0',  current_state = '0',  endpoint_object_id = 312,  event_handler = '',  event_handler_enabled = '1',  execution_time = '4.017304',  flap_detection_enabled = '1',  has_been_checked = '1',  instance_id = 1,  is_flapping = '1',  is_reachable = '1',  last_check = FROM_UNIXTIME(1476254430),  last_hard_state = '0',  last_hard_state_change = FROM_UNIXTIME(1476253231),  last_notification = FROM_UNIXTIME(1476253231),  last_state_change = FROM_UNIXTIME(1476253231),  last_time_critical = FROM_UNIXTIME(1476253169),  last_time_ok = FROM_UNIXTIME(1476254430),  last_time_warning = FROM_UNIXTIME(1474118199),  latency = '0.000359',  long_output = '',  max_check_attempts = '5',  next_check = FROM_UNIXTIME(1476255025),  next_notification = FROM_UNIXTIME(1476255296),  normal_check_interval = '10',  notifications_enabled = '1',  original_attributes = 'null',  output = 'PING OK - Packet loss = 0%, RTA = 11.46 ms',  passive_checks_enabled = '1',  percent_state_change = '35',  perfdata = 'rta=11.462000ms;100.000000;200.000000;0.000000 pl=0%;5;15;0',  problem_has_been_acknowledged = '0',  process_performance_data = '1',  retry_check_interval = '1',  scheduled_downtime_depth = '0',  service_object_id = 1056,  should_be_scheduled = '1',  state_type = '1',  status_update_time = FROM_UNIXTIME(1476254430) WHERE service_object_id = 1056

...

[2016-10-12 08:50:25 +0200] debug/CheckerComponent: Executing check for 'host.example.org!ping4'
[2016-10-12 08:50:25 +0200] notice/Process: Running command '/usr/local/libexec/nagios/check_ping' '-4' '-H' '10.11.12.13' '-c' '200,15%' '-w' '100,5%': PID 13915
[2016-10-12 08:50:25 +0200] debug/CheckerComponent: Check finished for object 'host.example.org!ping4'
[2016-10-12 08:50:29 +0200] notice/Process: PID 13915 ('/usr/local/libexec/nagios/check_ping' '-4' '-H' '10.11.12.13' '-c' '200,15%' '-w' '100,5%') terminated with exit code 0
[2016-10-12 08:50:29 +0200] debug/DbEvents: add checkable check history for 'host.example.org!ping4'
[2016-10-12 08:50:29 +0200] information/Checkable: Checking for configured notifications for object 'host.example.org!ping4'
[2016-10-12 08:50:29 +0200] debug/Checkable: Checkable 'host.example.org!ping4' has 1 notification(s).
[2016-10-12 08:50:29 +0200] notice/Notification: Attempting to send  notifications for notification object 'host.example.org!ping4!mail-vanguard'.
[2016-10-12 08:50:29 +0200] debug/Notification: Type 'FlappingEnd', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=256, TypeFilter=511)
[2016-10-12 08:50:29 +0200] debug/Notification: State 'OK', StateFilter: Critical, OK, Unknown and Warning (FState=1, StateFilter=15)
[2016-10-12 08:50:29 +0200] debug/Notification: User notification, Type 'FlappingEnd', TypeFilter: Acknowledgement, Custom, DowntimeEnd, DowntimeRemoved, DowntimeStart, FlappingEnd, FlappingStart, Problem and Recovery (FType=256, TypeFilter=511)
[2016-10-12 08:50:29 +0200] debug/Notification: User notification, State 'OK', StateFilter: Critical, Down, OK, Unknown, Up and Warning (FState=1, StateFilter=-1)
[2016-10-12 08:50:29 +0200] information/Notification: Sending 'FlappingEnd' notification 'host.example.org!ping4!mail-vanguard for user 'rk'
[2016-10-12 08:50:29 +0200] debug/DbEvents: add notification history for 'host.example.org!ping4'
[2016-10-12 08:50:29 +0200] debug/DbEvents: add contact notification history for service 'host.example.org!ping4' and user 'rk'.
[2016-10-12 08:50:29 +0200] notice/Checkable: Flapping: Checkable host.example.org!ping4 stopped flapping (30% >= 26%).
[2016-10-12 08:50:29 +0200] debug/DbEvents: add log entry history for 'host.example.org!ping4'
[2016-10-12 08:50:29 +0200] debug/DbEvents: add flapping history for 'host.example.org!ping4'
[2016-10-12 08:50:29 +0200] notice/Process: Running command '/usr/local/etc/icinga2/scripts/mail-service-notification.sh': PID 13918
[2016-10-12 08:50:29 +0200] debug/DbEvents: add log entry history for 'host.example.org!ping4'
[2016-10-12 08:50:29 +0200] information/Notification: Completed sending 'FlappingEnd' notification 'host.example.org!ping4!mail-vanguard' for checkable 'host.example.org!ping4' and user 'rk'.
[2016-10-12 08:50:29 +0200] notice/Process: PID 13918 ('/usr/local/etc/icinga2/scripts/mail-service-notification.sh') terminated with exit code 0
[2016-10-12 08:50:29 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET acknowledgement_type = '0',  active_checks_enabled = '1',  check_command = 'ping4',  check_source = 'icinga.example.org',  check_type = '0',  current_check_attempt = '1',  current_notification_number = '0',  current_state = '0',  endpoint_object_id = 312,  event_handler = '',  event_handler_enabled = '1',  execution_time = '4.018656',  flap_detection_enabled = '1',  has_been_checked = '1',  instance_id = 1,  is_flapping = '0',  is_reachable = '1',  last_check = FROM_UNIXTIME(1476255029),  last_hard_state = '0',  last_hard_state_change = FROM_UNIXTIME(1476253231),  last_notification = FROM_UNIXTIME(1476253231),  last_state_change = FROM_UNIXTIME(1476253231),  last_time_critical = FROM_UNIXTIME(1476253169),  last_time_ok = FROM_UNIXTIME(1476255029),  last_time_warning = FROM_UNIXTIME(1474118199),  latency = '0.000472',  long_output = '',  max_check_attempts = '5',  next_check = FROM_UNIXTIME(1476255624),  next_notification = FROM_UNIXTIME(1476255296),  normal_check_interval = '10',  notifications_enabled = '1',  original_attributes = 'null',  output = 'PING OK - Packet loss = 0%, RTA = 11.84 ms',  passive_checks_enabled = '1',  percent_state_change = '26',  perfdata = 'rta=11.843000ms;100.000000;200.000000;0.000000 pl=0%;5;15;0',  problem_has_been_acknowledged = '0',  process_performance_data = '1',  retry_check_interval = '1',  scheduled_downtime_depth = '0',  service_object_id = 1056,  should_be_scheduled = '1',  state_type = '1',  status_update_time = FROM_UNIXTIME(1476255029) WHERE service_object_id = 1056
[2016-10-12 08:50:29 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET current_notification_number = '1',  last_notification = FROM_UNIXTIME(1476255029),  next_notification = FROM_UNIXTIME(1476255296) WHERE instance_id = 1 AND service_object_id = 1056
[2016-10-12 08:50:29 +0200] debug/IdoMysqlConnection: Query: INSERT INTO icinga_notifications (contacts_notified, end_time, end_time_usec, endpoint_object_id, escalated, instance_id, long_output, notification_reason, notification_type, object_id, output, start_time, start_time_usec, state) VALUES ('1', FROM_UNIXTIME(1476255029), '533262', 312, '0', 1, '', '3', '1', 1056, 'PING OK - Packet loss = 0%, RTA = 11.84 ms', FROM_UNIXTIME(1476255029), '533262', '0')
[2016-10-12 08:50:29 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_servicestatus SET is_flapping = '0',  percent_state_change = '26' WHERE instance_id = 1 AND service_object_id = 1056
[2016-10-12 08:50:29 +0200] debug/IdoMysqlConnection: Query: INSERT INTO icinga_flappinghistory (endpoint_object_id, event_time, event_time_usec, event_type, flapping_type, high_threshold, instance_id, low_threshold, object_id, percent_state_change, reason_type) VALUES (312, FROM_UNIXTIME(1476255029), '533421', '1001', '1', '30', 1, '30', 1056, '26', '1')

https://monitoring-portal.org/index.php?thread/37581-flapping-notifications-starting-from-icinga-2-5/

Changesets

2016-11-10 13:16:02 +00:00 by mfriedrich 7e0c486

Fix Flapping{Start,End} notifications in SOFT states or downtimes

fixes #12560
fixes #12892

2016-11-17 12:29:27 +00:00 by mfriedrich 0610652

Fix Flapping{Start,End} notifications in SOFT states or downtimes

fixes #12560
fixes #12892

Relations:

@icinga-migration
Copy link
Author

Updated by xert on 2016-10-12 08:24:17 +00:00

Notifications were sent as:

(8:20) FLAPPINGSTART - host.example.org - ping4 is OK
(8:50) FLAPPINGEND - host.example.org - ping4 is OK

@icinga-migration
Copy link
Author

Updated by DevMusic on 2016-10-25 08:14:03 +00:00

Flappingstart / Flappingend Notifications are sent after soft state change

We have the same behaviour since we upgraded our Icinga version to 2.5.4 two weeks ago.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2016-11-07 15:52:57 +00:00

  • Relates set to 11899

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2016-11-07 16:01:30 +00:00

TODO:

  • Add a unit test
  • Check whether GetStateType() == StateTypeHard fixes the problem (previously hidden with "send_notification")

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2016-11-09 15:32:21 +00:00

  • Relates set to 12560

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2016-11-10 13:16:18 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Target Version set to 2.6.0

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2016-11-10 13:16:37 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 7e0c486.

@icinga-migration
Copy link
Author

Updated by sru on 2017-01-02 18:26:42 +00:00

For my understanding, FlappingStart notifications should be send in soft states because:

  • Flapping detection needs multiple check results, so the problem is not a "one-timer".
  • The fact that the check is flapping my hinder it to reach a hard state
  • A flapping check is something abnormal, so some person should get a note on that.

@icinga-migration icinga-migration added bug Something isn't working area/notifications Notification events labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.6.0 milestone Jan 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/notifications Notification events bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant