[dev.icinga.com #11020] Master reloads with agents generate false alarms #3871

Closed
icinga-migration opened this issue Jan 22, 2016 · 8 comments
Labels
  • area/distributed: Distributed monitoring (master, satellites, clients)
  • blocker: Blocks a release or needs immediate attention
  • bug: Something isn't working
Milestone: 2.4.2

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/11020

Created by tgelf on 2016-01-22 15:23:50 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2016-02-23 09:59:37 +00:00)
Target Version: 2.4.2
Last Update: 2016-02-23 09:59:54 +00:00 (in Redmine)

Icinga Version: 2.4.1
Backport?: Already backported
Include in Changelog: 1

The most convenient configuration variant for Icinga 2 agents is command endpoints. In such an environment, every master reload generates a lot of superfluous state changes (OK/UNKNOWN/OK). I haven't tried it out, but I suspect that slow reloads combined with typical retry_interval settings would let a check reach a hard state pretty fast, resulting in false alarms. Even if they don't, these state changes cause overhead in the IDO, might influence SLA reports, and so on. We need some kind of "reload awareness" or grace period to handle this.

Best,
Thomas
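
For reference, a minimal sketch of the command-endpoint setup described above (host, endpoint, and address names are made up for illustration, not taken from this report). The OK/UNKNOWN/OK flapping presumably comes from the connection to the agent being briefly unavailable while the master reloads:

    // Illustrative only: an agent host whose checks are executed via command_endpoint.
    object Endpoint "agent01.example.com" {
    }

    object Zone "agent01.example.com" {
      endpoints = [ "agent01.example.com" ]
      parent = "master"
    }

    object Host "agent01.example.com" {
      check_command = "hostalive"
      address = "192.0.2.10"
      vars.agent_endpoint = name    // mark this host as an agent
    }

    apply Service "disk" {
      check_command = "disk"
      // The master schedules the check, the agent executes it.
      command_endpoint = host.vars.agent_endpoint

      check_interval = 5m
      retry_interval = 30s          // a short retry_interval makes a hard state easy to reach during a slow reload
      max_check_attempts = 3

      assign where host.vars.agent_endpoint
    }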

Changesets

2016-02-08 08:46:01 +00:00 by (unknown) 6d5014b

Increase grace period for agent-based checks

refs #11020

2016-02-23 09:51:12 +00:00 by (unknown) b8195be

Increase grace period for agent-based checks

refs #11020
@icinga-migration

Updated by mfriedrich on 2016-01-25 09:59:19 +00:00

  • Category set to Cluster
  • Target Version set to 2.5.0

@icinga-migration

Updated by mfriedrich on 2016-01-25 10:30:57 +00:00

  • Target Version changed from 2.5.0 to 2.4.2

@icinga-migration

Updated by ziaunys on 2016-01-28 18:08:11 +00:00

tgelf wrote:

The most convenient configuration variant for Icinga 2 agents is command endpoints. In such an environment, every master reload generates a lot of superfluous state changes (OK/UNKNOWN/OK). I haven't tried it out, but I suspect that slow reloads combined with typical retry_interval settings would let a check reach a hard state pretty fast, resulting in false alarms. Even if they don't, these state changes cause overhead in the IDO, might influence SLA reports, and so on. We need some kind of "reload awareness" or grace period to handle this.

Best,
Thomas

I just started to encounter this issue; I'm not sure if it's because I now have a lot of agents (270 in total). In my environment, when Puppet runs and adds a new host, Icinga 2 reloads and most of the agent cluster-zone checks fail once. I have 2 check attempts set, so sometimes a handful of checks reach a hard state and page our on-call person, which is confusing because it's usually a random set of hosts and, from their perspective, it looks like a bunch of hosts have gone down.
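
A sketch of the kind of agent health check described here, following the common cluster-zone pattern from the distributed monitoring docs (the service name, intervals, and the assign rule are assumptions, not taken from this environment). Raising max_check_attempts only papers over the symptom; the actual fix referenced in the changesets above increases the grace period for agent-based checks:

    // Illustrative only: connection check for each agent zone.
    apply Service "agent-health" {
      check_command = "cluster-zone"

      // Check the connection to the agent's zone (zone name assumed to match the host name).
      vars.cluster_zone = host.name

      check_interval = 1m
      retry_interval = 30s
      // With max_check_attempts = 2, a single reload-induced failure is only one
      // retry away from a hard state and a page; a higher value is a possible stopgap.
      max_check_attempts = 5

      assign where host.vars.agent_endpoint
    }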

@icinga-migration

Updated by mfriedrich on 2016-02-05 13:59:51 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Priority changed from Normal to High

@icinga-migration

Updated by mfriedrich on 2016-02-05 14:11:06 +00:00

I'll look into it, as per a customer requirement.

@icinga-migration

Updated by mfriedrich on 2016-02-08 12:46:44 +00:00

  • Assigned to changed from mfriedrich to gbeutner

@icinga-migration

Updated by gbeutner on 2016-02-23 09:59:37 +00:00

  • Status changed from Assigned to Resolved

@icinga-migration

Updated by gbeutner on 2016-02-23 09:59:54 +00:00

  • Backport? changed from Not yet backported to Already backported

@icinga-migration added the blocker, bug, and area/distributed labels on Jan 17, 2017
@icinga-migration added this to the 2.4.2 milestone on Jan 17, 2017