[dev.icinga.com #11173] Notification for hosts/Services in downtime after config reload #3944
Comments
Updated by mfriedrich on 2016-02-24 19:51:08 +00:00
Do you happen to have more details, e.g. (debug) logs providing more insight into why the downtime is ignored for these services, allowing notifications to be sent?
Updated by Reavermaster on 2016-02-29 08:06:28 +00:00 Okay, here are some lines from the icinga2.log:
The problem is that the "checkhost" host is in downtime; here is the database export:
Do you need more information? Edit: the update to Icinga 2 version 2.4.3 did not fix this problem. The log and SQL output above is from the updated version, by the way.
Updated by essener61 on 2016-03-03 09:05:18 +00:00 We can confirm the problem. All non-acknowledged services are alerted on each reload.
Updated by mfriedrich on 2016-03-03 09:09:25 +00:00
Updated by mfriedrich on 2016-03-04 15:35:57 +00:00
Updated by phsc on 2016-03-09 15:30:15 +00:00 We have the same issue with reloading icinga2 services in a cluster setup consisting of 2 nodes and 4 satellite zones (each zone has 2 satellite servers). This is the procedure I used to start the cluster:
When I reload the icinga2 service on the first node, from where I distribute my config, the downtimes remain in effect as they should. But when I reload the icinga2 service on the second node, all downtimes turn ineffective. I can still see them under "Downtimes" in Icinga Web 2 and in the database, though. A few seconds after stopping the icinga2 service on the second node, the first node becomes the active endpoint. At that point all the previously configured downtimes turned effective again, and deleting them works again. Please let me know if you need more testing or log files. Thanks
Updated by mfriedrich on 2016-03-09 15:34:02 +00:00
Updated by mfriedrich on 2016-08-02 13:23:18 +00:00
A guess from reading the comments: the secondary node does not know anything about the runtime-created comment/downtime objects. Once it reloads and takes over the active/active enable_ha IDO connection, it flushes/removes the visible downtimes/comments (as that core thinks they do not exist). @phsc: changing the category to "cluster", as this seems to affect only HA setups. It is not reproducible in a local standalone setup.
Updated by mfriedrich on 2016-08-05 13:58:45 +00:00 I tried to reproduce the issue with the snapshot packages, but I am not able to trigger any notifications upon HA cluster node restart. A similar issue is here: https://dev.icinga.org/issues/11012#note-12 Can you please deploy the current snapshot packages in your environment and check whether your problem is solved?
Updated by phsc on 2016-08-29 12:47:40 +00:00 @dnsmichi I also suspect it's only an issue in a cluster scenario, since I don't encounter any problems with downtimes with only one active node.
Updated by mfriedrich on 2016-08-29 13:34:43 +00:00
Ah. So the culprit is that downtimes are not in sync between the two HA nodes. Anything else that causes trouble (notifications, etc.) is probably just a consequence of this problem. If you re-sync that directory on both nodes, does it work again?
Updated by phsc on 2016-08-29 14:52:49 +00:00 Ok, I copied the downtime files manually from the first to the second node, adjusted the permissions, and started icinga2.service on the second node. After Icinga 2 switched the active endpoint to the second node, I can see that the downtimes whose files I manually copied to the second host are still in effect. This seems to work.
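For reference, the manual workaround described above could be sketched roughly like this. Note this is a hedged sketch, not an official procedure: the node name `node2`, the exact runtime-state path, and the `icinga:icinga` owner are assumptions based on a typical Icinga 2 package install on CentOS; verify the paths and user on your own systems before running anything.

```shell
# On the first (active) node: copy the runtime-created object files
# (downtimes, comments) to the second node. On a default install,
# Icinga 2 keeps runtime objects under the _api package; the exact
# path may differ in your environment.
rsync -av /var/lib/icinga2/api/packages/_api/ \
    node2:/var/lib/icinga2/api/packages/_api/

# On the second node: make sure the copied files belong to the
# Icinga service user (assumed "icinga" here), then start the
# service again so it picks up the synced downtimes.
chown -R icinga:icinga /var/lib/icinga2/api/packages/_api
systemctl start icinga2
```

Later Icinga 2 releases sync runtime objects across HA zones automatically, so this manual copy should only be needed on the affected 2.4/2.5 versions discussed in this thread.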
Updated by mfriedrich on 2016-11-09 14:55:33 +00:00
Updated by phsc on 2016-12-23 07:34:05 +00:00 After upgrading all cluster members to 2.6 and cleaning up all downtime files and database entries manually, the problem seems to be solved. Thanks!
Updated by mfriedrich on 2017-01-09 15:06:11 +00:00
Ok, thanks for the feedback! Kind regards,
This issue has been migrated from Redmine: https://dev.icinga.com/issues/11173
Created by Reavermaster on 2016-02-17 10:45:40 +00:00
Assignee: (none)
Status: Closed (closed on 2017-01-09 15:06:11 +00:00)
Target Version: (none)
Last Update: 2017-01-09 15:06:11 +00:00 (in Redmine)
I've observed that Icinga 2 sends out notifications for hosts or services in downtime after a config reload was started.
Sometimes notifications were also re-sent from the other cluster node after a reload.
System:
CentOS 7.2.1511
icinga2 v2.4.1
Relations: