
[dev.icinga.com #11825] Problems with check scheduling for HARD state changes (standalone/command_endpoint) #4232

Closed
icinga-migration opened this issue May 21, 2016 · 5 comments
Labels: bug (Something isn't working)
Milestone: 2.5.0

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/11825

Created by mfriedrich on 2016-05-21 16:51:25 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2016-05-24 09:10:05 +00:00)
Target Version: 2.5.0
Last Update: 2016-08-08 11:14:28 +00:00 (in Redmine)

Icinga Version: 2.4.10
Backport?: Not yet backported
Include in Changelog: 1

* SOFT state -> retry_interval

  1. HARD state -> retry_interval, +1m (ProcessCheckResult() does not call UpdateNextCheck())

  2. HARD state -> ??? interval, +2m (seems to have been corrected to the 3m check_interval window of the previous check; apparently via ExecuteCheck()/UpdateNextCheck())

  3. HARD state -> check_interval, +3m

    2016-05-21 18:14:08 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463847248.8582880497,"execution_start":1463847248.8462469578,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463847248.8587419987,"schedule_start":1463847248.8419499397,"state":2.0,"type":"CheckResult","vars_after":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":0.0},"vars_before":{"attempt":1.0,"reachable":true,"state":0.0,"state_type":1.0}},"host":"hard-interval","timestamp":1463847248.8593370914,"type":"CheckResult"}

    2016-05-21 18:14:55 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463847295.3668069839,"execution_start":1463847295.3532509804,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463847295.3672609329,"schedule_start":1463847295.3500001431,"state":2.0,"type":"CheckResult","vars_after":{"attempt":2.0,"reachable":true,"state":2.0,"state_type":0.0},"vars_before":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":0.0}},"host":"hard-interval","timestamp":1463847295.3680989742,"type":"CheckResult"}

    2016-05-21 18:15:55 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463847355.3658659458,"execution_start":1463847355.3562619686,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463847355.3662919998,"schedule_start":1463847355.3500001431,"state":2.0,"type":"CheckResult","vars_after":{"attempt":3.0,"reachable":true,"state":2.0,"state_type":0.0},"vars_before":{"attempt":2.0,"reachable":true,"state":2.0,"state_type":0.0}},"host":"hard-interval","timestamp":1463847355.3669650555,"type":"CheckResult"}

    2016-05-21 18:16:55 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463847415.3653900623,"execution_start":1463847415.3531889915,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463847415.3656980991,"schedule_start":1463847415.3500001431,"state":2.0,"type":"CheckResult","vars_after":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0},"vars_before":{"attempt":3.0,"reachable":true,"state":2.0,"state_type":0.0}},"host":"hard-interval","timestamp":1463847415.3662559986,"type":"CheckResult"}

    (1) HARD state, +1m

    2016-05-21 18:17:55 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463847475.372895956,"execution_start":1463847475.3587040901,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463847475.3730199337,"schedule_start":1463847475.3500001431,"state":2.0,"type":"CheckResult","vars_after":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0},"vars_before":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0}},"host":"hard-interval","timestamp":1463847475.3733570576,"type":"CheckResult"}

    (2) HARD state, +2m

    2016-05-21 18:19:55 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463847595.3624830246,"execution_start":1463847595.352396965,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463847595.362817049,"schedule_start":1463847595.3500001431,"state":2.0,"type":"CheckResult","vars_after":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0},"vars_before":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0}},"host":"hard-interval","timestamp":1463847595.3629999161,"type":"CheckResult"}

    (3) HARD state, +3m

    2016-05-21 18:22:55 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463847775.3754639626,"execution_start":1463847775.3603029251,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463847775.3755888939,"schedule_start":1463847775.3500001431,"state":2.0,"type":"CheckResult","vars_after":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0},"vars_before":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0}},"host":"hard-interval","timestamp":1463847775.3759551048,"type":"CheckResult"}
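The sketch below makes the events above easier to read. For each CheckResult event it decodes `vars_after.state_type` and `vars_after.attempt`, and it computes the delay until the next check. Here 0 = SOFT and 1 = HARD, which matches the OK (HARD) -> CRITICAL (SOFT) change in the first event's `vars_before`/`vars_after`. The event strings are trimmed copies of the log lines above and keep only the fields the sketch uses. The helper name `timeline` is hypothetical.

```python
import json

# Trimmed copies of the seven CheckResult events above: only "timestamp" and
# "vars_after" are kept, with the values copied from the original lines.
EVENTS = [
    '{"check_result":{"vars_after":{"attempt":1.0,"state_type":0.0}},"timestamp":1463847248.8593370914}',
    '{"check_result":{"vars_after":{"attempt":2.0,"state_type":0.0}},"timestamp":1463847295.3680989742}',
    '{"check_result":{"vars_after":{"attempt":3.0,"state_type":0.0}},"timestamp":1463847355.3669650555}',
    '{"check_result":{"vars_after":{"attempt":1.0,"state_type":1.0}},"timestamp":1463847415.3662559986}',
    '{"check_result":{"vars_after":{"attempt":1.0,"state_type":1.0}},"timestamp":1463847475.3733570576}',
    '{"check_result":{"vars_after":{"attempt":1.0,"state_type":1.0}},"timestamp":1463847595.3629999161}',
    '{"check_result":{"vars_after":{"attempt":1.0,"state_type":1.0}},"timestamp":1463847775.3759551048}',
]

# state_type values as they appear in the log: 0 = SOFT, 1 = HARD
STATE_TYPES = {0: "SOFT", 1: "HARD"}


def timeline(raw_events):
    """Return (state type, attempt, seconds until next check) for every event but the last."""
    events = [json.loads(e) for e in raw_events]
    rows = []
    for cur, nxt in zip(events, events[1:]):
        after = cur["check_result"]["vars_after"]
        gap = nxt["timestamp"] - cur["timestamp"]
        rows.append((STATE_TYPES[int(after["state_type"])], int(after["attempt"]), round(gap)))
    return rows


for row in timeline(EVENTS):
    print(row)
```

The HARD rows show gaps of 60s, 120s and 180s. With a 3m check_interval, a constant 180s would be expected.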

A similar log was posted in this community forum thread: https://monitoring-portal.org/index.php?thread/36174-wrong-retry-interval-check-interval-switching/

2016-05-20 00:22:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - OK/HARD/1.0
2016-05-20 00:25:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/SOFT/1.0
2016-05-20 00:26:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/SOFT/2.0
2016-05-20 00:27:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/SOFT/3.0
2016-05-20 00:28:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0 --> up to here everything is fine; now the timer must switch to check_interval (3min)
2016-05-20 00:29:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0 --> but the timer was not changed, it is still 1min
2016-05-20 00:31:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0 --> 3min after the first HARD result; from here on the timer is correct
2016-05-20 00:34:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0
2016-05-20 00:37:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0
2016-05-20 00:40:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0
2016-05-20 00:43:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - OK/HARD/1.0
2016-05-20 00:46:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - OK/HARD/1.0
2016-05-20 00:49:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - OK/HARD/1.0
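One quick way to spot this symptom in a log is to compare each gap with the interval expected for the result's state type: check_interval after a HARD result and retry_interval after a SOFT one. The following is a minimal sketch. It assumes the 3min/1min intervals named in the annotations above and the `date time - object - STATE/TYPE/ATTEMPT` line format shown. `wrong_intervals` is a hypothetical helper, not part of Icinga 2.

```python
from datetime import datetime

# Copy of the forum log above, with the "-->" annotations dropped.
FORUM_LOG = """\
2016-05-20 00:22:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - OK/HARD/1.0
2016-05-20 00:25:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/SOFT/1.0
2016-05-20 00:26:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/SOFT/2.0
2016-05-20 00:27:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/SOFT/3.0
2016-05-20 00:28:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0
2016-05-20 00:29:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0
2016-05-20 00:31:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0
2016-05-20 00:34:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0
2016-05-20 00:37:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0
2016-05-20 00:40:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - CRITICAL/HARD/1.0
2016-05-20 00:43:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - OK/HARD/1.0
2016-05-20 00:46:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - OK/HARD/1.0
2016-05-20 00:49:37 - icinga2-lab-02/art-pc-slave.labnet/tcp - OK/HARD/1.0
"""


def parse(line):
    """Split one log line into (timestamp, state, state type)."""
    ts, _name, status = line.split(" - ", 2)
    state, state_type, _attempt = status.split("/")
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), state, state_type


def wrong_intervals(text, check_interval=180, retry_interval=60):
    """Return (time, actual gap, expected gap) for every check followed by an unexpected gap."""
    rows = [parse(line) for line in text.splitlines() if line.strip()]
    flagged = []
    for (t, _state, state_type), (t_next, _, _) in zip(rows, rows[1:]):
        expected = check_interval if state_type == "HARD" else retry_interval
        gap = int((t_next - t).total_seconds())
        if gap != expected:
            flagged.append((t.strftime("%H:%M:%S"), gap, expected))
    return flagged


print(wrong_intervals(FORUM_LOG))
```

Only the two checks right after the SOFT->HARD transition (00:28:37 and 00:29:37) are flagged. This matches the annotations.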

Changesets

2016-05-21 16:58:19 +00:00 by mfriedrich d49b63d

Fix: First HARD state does not change retry_interval to check_interval

refs #11825

2016-05-24 09:05:29 +00:00 by gbeutner aeb7a4a

Fix incorrect check interval for SOFT->HARD transitions

fixes #11825

2016-05-24 10:42:02 +00:00 by gbeutner 7b371f2

Add note about check intervals

refs #11825

Relations:

@icinga-migration

Updated by mfriedrich on 2016-05-21 16:58:09 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Target Version set to 2.5.0
  • Parent Id set to 11310

We need to check whether a hardChange occurred when updating the next check. Using UpdateNextCheck() does not help here as it will include the scheduling offset.
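The Python model below is one way to reason about this. It is not Icinga 2 code and rests on three assumptions:

(a) The next check is pre-scheduled from the state type before the result is processed.
(b) SOFT results re-schedule with retry_interval.
(c) The scheduler aligns each next check to a grid of `interval` seconds shifted by a per-object offset. This is how "scheduling offset" is interpreted here.

The 13s offset is hypothetical. It was picked so that the model reproduces the gaps of the first log (about 47s, 60s, 60s, 60s, 120s, 180s). The `fix_hard_change` branch sketches the idea described above: on a SOFT->HARD transition, re-schedule with check_interval relative to the current time, without the offset.

```python
# Hypothetical model of the scheduling behaviour discussed in this issue.
# Not Icinga 2 code: names and the offset value are assumptions.
CHECK_INTERVAL = 180.0    # 3m, as in the first report
RETRY_INTERVAL = 60.0     # 1m
MAX_CHECK_ATTEMPTS = 3
SCHEDULING_OFFSET = 13.0  # hypothetical per-object offset, picked to match the log


def aligned_next_check(now, interval, offset=SCHEDULING_OFFSET):
    """Next slot on a grid of `interval` seconds shifted by `offset` (assumed behaviour)."""
    adj = (now + offset) % interval
    return now - adj + interval


def simulate(checks, fix_hard_change):
    """Execution times of `checks` CRITICAL results, starting from OK/HARD at t=0."""
    state_ok, hard, attempt = True, True, 1
    now, times = 0.0, []
    for _ in range(checks):
        times.append(now)
        # Assumption (a): pre-schedule from the state type *before* this result.
        next_check = aligned_next_check(now, CHECK_INTERVAL if hard else RETRY_INTERVAL)

        was_hard = hard
        if state_ok:                                   # OK -> CRITICAL, first SOFT attempt
            state_ok, hard, attempt = False, False, 1
        elif not hard and attempt < MAX_CHECK_ATTEMPTS:
            attempt += 1                               # another SOFT attempt
        elif not hard:
            hard, attempt = True, 1                    # SOFT -> HARD (hardChange)

        if not hard:
            # Assumption (b): SOFT results re-schedule with retry_interval.
            next_check = aligned_next_check(now, RETRY_INTERVAL)
        elif fix_hard_change and not was_hard:
            # Sketched fix: on hardChange use check_interval, without the offset.
            next_check = now + CHECK_INTERVAL
        now = next_check
    return times


def gaps(times):
    """Seconds between consecutive checks."""
    return [b - a for a, b in zip(times, times[1:])]


print("before fix:", gaps(simulate(7, fix_hard_change=False)))
print("with fix:  ", gaps(simulate(7, fix_hard_change=True)))
```

Without the fix, the model reproduces the 60s/120s/180s gaps after the HARD transition. With the fix, every gap after the transition is 180s. The 60s/30s test below shows the analogous behaviour: one check_interval (1m) after the first HARD result.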

Testing a possible fix with a shorter interval range (check_interval = 60s, retry_interval = 30s).

object Host "hard-interval" {
  check_command = "tcp"
  check_interval = 60s
  retry_interval = 30s
  max_check_attempts = 3
  vars.tcp_port = 10101
}

2016-05-21 18:48:36 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463849316.6326351166,"execution_start":1463849316.6209499836,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463849316.6331150532,"schedule_start":1463849316.6185519695,"state":2.0,"type":"CheckResult","vars_after":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":0.0},"vars_before":{"attempt":1.0,"reachable":true,"state":0.0,"state_type":1.0}},"host":"hard-interval","timestamp":1463849316.6340050697,"type":"CheckResult"}

2016-05-21 18:48:55 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463849335.8468580246,"execution_start":1463849335.8350260258,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463849335.8469719887,"schedule_start":1463849335.8200001717,"state":2.0,"type":"CheckResult","vars_after":{"attempt":2.0,"reachable":true,"state":2.0,"state_type":0.0},"vars_before":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":0.0}},"host":"hard-interval","timestamp":1463849335.8472249508,"type":"CheckResult"}

2016-05-21 18:49:25 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463849365.8334469795,"execution_start":1463849365.8223469257,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463849365.8338980675,"schedule_start":1463849365.8199999332,"state":2.0,"type":"CheckResult","vars_after":{"attempt":3.0,"reachable":true,"state":2.0,"state_type":0.0},"vars_before":{"attempt":2.0,"reachable":true,"state":2.0,"state_type":0.0}},"host":"hard-interval","timestamp":1463849365.8353419304,"type":"CheckResult"}

2016-05-21 18:49:55 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463849395.8309910297,"execution_start":1463849395.8215589523,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463849395.8314950466,"schedule_start":1463849395.8199999332,"state":2.0,"type":"CheckResult","vars_after":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0},"vars_before":{"attempt":3.0,"reachable":true,"state":2.0,"state_type":0.0}},"host":"hard-interval","timestamp":1463849395.8319449425,"type":"CheckResult"}

HARD state, +1m (= check_interval)

2016-05-21 18:50:55 - {"check_result":{"active":true,"check_source":"mbmif.int.netways.de","command":["/usr/local/sbin/check_tcp","-H","","-M","warn","-p","10101","-r","crit","-t","10"],"execution_end":1463849455.8464729786,"execution_start":1463849455.8368179798,"exit_status":2.0,"output":"TCP CRITICAL - Invalid hostname, address or socket:","performance_data":[],"schedule_end":1463849455.8467869759,"schedule_start":1463849455.8317630291,"state":2.0,"type":"CheckResult","vars_after":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0},"vars_before":{"attempt":1.0,"reachable":true,"state":2.0,"state_type":1.0}},"host":"hard-interval","timestamp":1463849455.8474071026,"type":"CheckResult"}

@icinga-migration

Updated by gbeutner on 2016-05-23 08:38:36 +00:00

  • Relates set to 11363

@icinga-migration

Updated by gbeutner on 2016-05-24 09:10:07 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset aeb7a4a.

@icinga-migration

Updated by mfriedrich on 2016-05-24 11:21:00 +00:00

  • Subject changed from First HARD state does not change retry_interval to check_interval to Problems with check scheduling for HARD state changes (standalone/command_endpoint)
  • Assigned to changed from mfriedrich to gbeutner

@icinga-migration

Updated by mfriedrich on 2016-08-08 11:14:28 +00:00

  • Parent Id deleted 11310

@icinga-migration added the bug (Something isn't working) and libicinga labels on Jan 17, 2017
@icinga-migration added this to the 2.5.0 milestone on Jan 17, 2017