[dev.icinga.com #12518] getting spurious recovery notifications #4545

Closed
icinga-migration opened this issue Aug 23, 2016 · 2 comments
Labels
bug Something isn't working

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/12518

Created by phil_fry on 2016-08-23 12:42:19 +00:00

Assignee: (none)
Status: Rejected (closed on 2016-08-23 13:23:53 +00:00)
Target Version: (none)
Last Update: 2016-08-23 13:23:53 +00:00 (in Redmine)

Icinga Version: 2.5.0
Backport?: Not yet backported
Include in Changelog: 1

Hi,

After upgrading to Icinga 2.5.0-3.el7, I get recovery notifications that, as I understand it, should not be sent, because the critical state never became "hard". This happens randomly.

[2016-08-23 13:58:51] SERVICE NOTIFICATION: phil;xxx;ping6;RECOVERY;mail-service-notification;FPING OK - fc00::1 (loss=0%, rta=5.530000 ms);
[2016-08-23 13:58:51] SERVICE ALERT: xxx;ping6;OK;HARD;1;FPING OK - fc00::1 (loss=0%, rta=5.530000 ms)
[2016-08-23 13:58:20] SERVICE ALERT: xxx;ping6;CRITICAL;SOFT;1;FPING CRITICAL - fc00::1 (loss=20%, rta=5.420000 ms)
[...]
[2016-08-23 12:07:57] SERVICE NOTIFICATION: phil;yyy;ping6;RECOVERY;mail-service-notification;FPING OK - fc00::2 (loss=0%, rta=5.340000 ms);
[2016-08-23 12:07:57] SERVICE ALERT: yyy;ping6;OK;HARD;1;FPING OK - fc00::2 (loss=0%, rta=5.340000 ms)
[2016-08-23 12:07:27] SERVICE ALERT: yyy;ping6;CRITICAL;SOFT;1;FPING CRITICAL - fc00::2 (loss=20%, rta=5.300000 ms)

It is also reproducible:

[2016-08-23 14:39:43] SERVICE NOTIFICATION: phil;zzz;ping6;RECOVERY;mail-service-notification;FPING OK - fc00::3 (loss=0%, rta=5.810000 ms);
[2016-08-23 14:39:43] SERVICE ALERT: zzz;ping6;OK;HARD;1;FPING OK - fc00::3 (loss=0%, rta=5.810000 ms)
[2016-08-23 14:39:13] SERVICE ALERT: zzz;ping6;CRITICAL;SOFT;1;FPING CRITICAL - fc00::3 (loss=100% )
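
For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that flags RECOVERY notifications not preceded by a HARD problem alert for the same service. It is only an illustration: the function name `find_spurious_recoveries` is made up for this example, the field layout is inferred from the lines quoted above, and the input is assumed to be in chronological order (the excerpts above are newest first, so they would need to be reversed) and to cover the whole problem episode.

```python
import re

# Matches lines such as
# "[2016-08-23 13:58:20] SERVICE ALERT: xxx;ping6;CRITICAL;SOFT;1;..."
# The layout is inferred from the excerpts in this report.
LINE_RE = re.compile(r"\[([^\]]+)\] SERVICE (ALERT|NOTIFICATION): (.*)")


def find_spurious_recoveries(lines):
    """Return (timestamp, host, service) for each RECOVERY notification
    that was not preceded by a HARD non-OK alert in the same problem episode.

    `lines` must be ordered oldest first.
    """
    hard_problem = {}  # (host, service) -> True once a HARD problem was seen
    last_state = {}    # (host, service) -> state of the most recent alert
    spurious = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        timestamp, kind, rest = match.groups()
        fields = rest.split(";", 5)
        if len(fields) < 4:
            continue
        if kind == "ALERT":
            host, service, state, state_type = fields[:4]
            key = (host, service)
            if state != "OK":
                if last_state.get(key, "OK") == "OK":
                    # A new problem episode starts; forget older HARD problems.
                    hard_problem[key] = False
                if state_type == "HARD":
                    hard_problem[key] = True
            last_state[key] = state
        else:
            # Notification fields: contact;host;service;type;command;output
            _contact, host, service, ntype = fields[:4]
            if ntype == "RECOVERY" and not hard_problem.get((host, service), False):
                spurious.append((timestamp, host, service))
    return spurious
```

Feeding it the three `xxx;ping6` lines from the first excerpt in oldest-first order should report the 13:58:51 notification, since the only problem alert before it is SOFT.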

Here's one of the objects (irrelevant parts stripped):

{
        __name = "xxx!ping6"
        acknowledgement = 0.000000
        acknowledgement_expiry = 0.000000
        active = true
        check_attempt = 1.000000
        check_command = "fping6"
        check_interval = 60.000000
        check_period = ""
        check_timeout = null
        command_endpoint = ""
        display_name = "fping over ipv6"
        downtime_depth = 0.000000
        enable_active_checks = true
        enable_event_handler = true
        enable_flapping = true
        enable_notifications = true
        enable_passive_checks = true
        enable_perfdata = true
        event_command = ""
        extensions = {
                DbObject = {
                        type = "Object"
                }
        }
        flapping = false
        flapping_last_change = 1471954886.452189
        flapping_negative = 2286.000000
        flapping_positive = 90.000000
        flapping_threshold = 30.000000
        force_next_check = false
        force_next_notification = false
        groups = [ "ping" ]
        ha_mode = 0.000000
        host = {
                __name = "xxx"
                acknowledgement = 0.000000
                acknowledgement_expiry = 0.000000
                active = true
                address = "172.16.1.1"
                address6 = "fc00::1"
                check_attempt = 1.000000
                check_command = "hostalive"
                check_interval = 60.000000
                check_period = ""
                check_timeout = null
                command_endpoint = ""
                display_name = "xxx"
                downtime_depth = 0.000000
                enable_active_checks = true
                enable_event_handler = true
                enable_flapping = true
                enable_notifications = true
                enable_passive_checks = true
                enable_perfdata = true
                event_command = ""
                extensions = {
                        DbObject = {
                                type = "Object"
                        }
                }
                flapping = false
                flapping_last_change = 1471954893.411340
                flapping_negative = 2383.000000
                flapping_positive = 0.000000
                flapping_threshold = 30.000000
                force_next_check = false
                force_next_notification = false
                ha_mode = 0.000000
                last_hard_state = 0.000000
                last_hard_state_change = 1464393580.551641
                last_hard_state_raw = 0.000000
                last_in_downtime = false
                last_reachable = true
                last_state = 0.000000
                last_state_change = 1464393575.269301
                last_state_down = 1464393545.699872
                last_state_raw = 0.000000
                last_state_type = 1.000000
                last_state_unreachable = 0.000000
                last_state_up = 1471954893.410912
                max_check_attempts = 3.000000
                name = "xxx"
                next_check = 1471954951.111343
                notes = ""
                notes_url = ""
                original_attributes = null
                package = "_etc"
                pause_called = false
                paused = false
                resume_called = true
                retry_interval = 30.000000
                start_called = true
                state = 0.000000
                state_loaded = true
                state_raw = 0.000000
                state_type = 1.000000
                stop_called = false
                type = "Host"
                version = 0.000000
                volatile = false
                zone = "master"
        }
        host_name = "xxx"
        last_hard_state = 0.000000
        last_hard_state_change = 1471953531.355629
        last_hard_state_raw = 0.000000
        last_in_downtime = false
        last_reachable = true
        last_state = 0.000000
        last_state_change = 1471953531.355629
        last_state_critical = 1471953500.938814
        last_state_ok = 1471954886.452176
        last_state_raw = 0.000000
        last_state_type = 1.000000
        last_state_unknown = 0.000000
        last_state_unreachable = 1464335231.366201
        last_state_warning = 1460311451.163178
        max_check_attempts = 3.000000
        name = "ping6"
        next_check = 1471954943.352192
        original_attributes = {
                enable_notifications = true
        }
        package = "_etc"
        pause_called = false
        paused = false
        resume_called = true
        retry_interval = 30.000000
        start_called = true
        state = 0.000000
        state_loaded = true
        state_raw = 0.000000
        state_type = 1.000000
        stop_called = false
        type = "Service"
        vars = {
                pnp_check_arg1 = ""
        }
        version = 1469383465.025078
        volatile = false
        zone = "master"
}
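
To spell out the behaviour I expected, here is a deliberately simplified Python model of soft and hard states, using `max_check_attempts = 3` from the object above. It is a sketch of my understanding only, not Icinga's actual implementation; the class `ServiceModel` and its method `process` are hypothetical names for this example.

```python
from dataclasses import dataclass

OK, CRITICAL = 0, 2  # conventional plugin exit codes


@dataclass
class ServiceModel:
    max_check_attempts: int = 3  # value taken from the object dump
    state: int = OK
    hard: bool = True
    attempt: int = 1
    last_hard_state: int = OK

    def process(self, result):
        """Feed one check result.

        Returns True if a recovery notification would be expected.
        """
        if result == OK:
            # A recovery is only worth notifying if a HARD problem preceded it.
            recovered_from_hard = self.last_hard_state != OK
            self.state, self.hard, self.attempt = OK, True, 1
            self.last_hard_state = OK
            return recovered_from_hard
        if self.state == OK:
            self.attempt = 1       # first failing check of a new episode
        elif not self.hard:
            self.attempt += 1      # another failing retry while still SOFT
        self.state = result
        if self.attempt >= self.max_check_attempts:
            self.hard = True
            self.last_hard_state = result
        else:
            self.hard = False
        return False
```

In this model a single CRITICAL result followed by OK (the sequence in the logs above) never reaches a hard state, so `process` returns False for both results and no recovery notification is expected. Three consecutive CRITICAL results followed by OK do produce one.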

Kind regards

Philippe


Relations:

@icinga-migration

Updated by gbeutner on 2016-08-23 12:56:57 +00:00

  • Duplicated set to 12517

@icinga-migration

Updated by gbeutner on 2016-08-23 13:23:53 +00:00

  • Status changed from New to Rejected

This will be fixed in 2.5.1 shortly, sorry. :(

@icinga-migration added the bug and Checker labels on Jan 17, 2017