[dev.icinga.com #11273] Services status updated multiple times within check_interval even though no retry was triggered #3990
Comments
Updated by mfriedrich on 2016-03-02 16:14:12 +00:00
Are you using an Icinga 2 Cluster, or any nodes actually executing these checks? Please add the relevant zones.conf entries. |
Updated by ralph_b on 2016-03-02 20:53:31 +00:00 No, we don't actually use an Icinga 2 cluster. The troubleshooting file contains the whole master1 zone definition. An icinga2 agent is installed on almost all clients, but the services.conf on these clients is empty. At the moment all checks are triggered by the master. Communication between master and clients is a one-way street (admin network to customer network). |
Updated by mfriedrich on 2016-03-03 08:17:16 +00:00
Ok, thanks. I'll try to reproduce the issue. Cheers, |
Updated by mfriedrich on 2016-03-03 08:19:34 +00:00
|
Updated by ralph_b on 2016-03-03 08:44:41 +00:00 Additional information: I reduced the scenario to master -> one single client w/o icinga2 agent. In this scenario the master shows the same behavior. Cheers, |
Updated by rgrey on 2016-03-03 17:28:54 +00:00 I think I'm experiencing the same issue on Ubuntu. Single node reporting in hundreds of times a second. Let me know if/what further info I can provide to help. icinga2 - The Icinga 2 network monitoring daemon (version: r2.4.3-1) Copyright © 2012-2016 Icinga Development Team (https://www.icinga.org/) Application information: System information: |
Updated by rgrey on 2016-03-04 09:49:16 +00:00 So, some single node stats (aggregated through Graylog) for a node running for the last 5 minutes
|
Updated by mfriedrich on 2016-03-04 14:03:43 +00:00 https://monitoring-portal.org/index.php?thread/35412-services-checks-werden-mehrfach-ausgef%C3%BChrt/&postID=225805#post225805 (for reference) |
Updated by mfriedrich on 2016-03-04 15:27:26 +00:00
|
Updated by mfriedrich on 2016-03-04 15:27:37 +00:00
|
Updated by mfriedrich on 2016-03-04 15:27:49 +00:00
|
Updated by mfriedrich on 2016-03-04 15:31:42 +00:00
|
Updated by mfriedrich on 2016-03-04 15:33:36 +00:00
|
Updated by mfriedrich on 2016-03-04 15:33:39 +00:00
|
Updated by mfriedrich on 2016-03-04 15:33:46 +00:00
|
Updated by mfriedrich on 2016-03-05 17:40:18 +00:00 I've reverted 2 commits which might be causing trouble here. Can you please re-test the current git master? |
Updated by rgrey on 2016-03-07 11:48:01 +00:00
dnsmichi wrote:
I've downloaded and built the master from git and deployed that build to one node. Results: last 5 minutes: > 13,000 service check messages sent to my Graylog instance - see the attached image. |
Updated by mfriedrich on 2016-03-07 14:53:25 +00:00 Hm, that's fairly strange. I'm using a 3 node cluster (2 nodes in master zone, 1 satellite for command_endpoint checks using the latest icinga2 --version v2.4.3-232-gef532f2) and I don't see such behavior. @rgrey |
Updated by rgrey on 2016-03-07 15:06:57 +00:00 Hmm, I must have done something wrong, as my icinga2 --version on the node still says r2.4.3-1 rather than a git version. I'll do some more work ... sorry. Also, I only built and deployed this to my single remote node. I hadn't changed my master installation. Please advise. icinga2 - The Icinga 2 network monitoring daemon (version: r2.4.3-1) Copyright © 2012-2016 Icinga Development Team (https://www.icinga.org/) Application information: System information: |
Updated by mfriedrich on 2016-03-07 15:34:56 +00:00 Fixed the snapshot package repository for ubuntu trusty, you should see the latest packages available over there. Please update the affected node and the master. |
Updated by ralph_b on 2016-03-07 15:39:44 +00:00 Hi Michael, I tried to build from GitHub. Sorry, I have never installed it this way. I am searching for a HowTo/doc to test it on my box. |
Updated by mfriedrich on 2016-03-07 15:43:42 +00:00 @ralph_b Change the repository to use the snapshot package repository instead of stable. Then you are able to install the icinga2 snapshot packages just like normal. |
Updated by rgrey on 2016-03-07 16:09:25 +00:00 Initial results look promising! I've updated my master using the snapshot repository and it is now showing the expected number of service checks, rather than multiple results within the same immediate timeframe. I'm currently building (correctly!) from the git master branch on my remote node ... although that might now be moot. Great job. |
Updated by ralph_b on 2016-03-07 16:15:35 +00:00
Hi Michael, thank you for the hint. I got it. icinga2 - The Icinga 2 network monitoring daemon (version: v2.4.3-233-g7439633) Copyright © 2012-2016 Icinga Development Team (https://www.icinga.org/) Application information: System information: Locally triggered checks are working fine now, but the checks started remotely on icinga clients still show strange behavior: |
Updated by rgrey on 2016-03-08 15:16:20 +00:00 FYI - this seems resolved by running the latest snapshot on my master node. Client nodes are still running stock latest Ubuntu stable release 2.4.3-1. Master: icinga2 - The Icinga 2 network monitoring daemon (version: v2.4.3-236-g19cb781) Copyright © 2012-2016 Icinga Development Team (https://www.icinga.org/) Application information: System information: Client node: icinga2 - The Icinga 2 network monitoring daemon (version: r2.4.3-1) Copyright © 2012-2016 Icinga Development Team (https://www.icinga.org/) Application information: System information: |
Updated by mfriedrich on 2016-03-09 10:40:02 +00:00
Ok, thanks for the tests. I suspect the problem lies in updating the next check time when receiving a new check result without passing the cluster message origin. Besides that, the reverted commits merely affect passive check results. A proper fix is discussed in #11336. I'll assign this issue to 2.4.4 - it would be great if you could do further tests with 1) the same snapshot version on all clients 2) ntp running on all nodes (there could be a time sync problem here as well). |
Updated by ralph_b on 2016-03-09 11:36:42 +00:00
Hi Michael, there are three client hosts in my small landscape with icinga2 agents (2 Linux boxes and 1 Windows box), which are now updated to the snapshot. Two of them had time differences because ntpd was not running (I have to talk with the server guys). One Linux box (host ID 97) still shows multiple checks within check_interval (please see attached screenshot). I am looking for the difference from the other hosts. Cheers, |
Updated by ralph_b on 2016-03-09 12:32:07 +00:00 Good news for the icinga2 team. Found the reason for host ID 97: services.conf was filled with the shipped default content but has to be empty, so the locally installed icinga2 agent fired checks by itself in addition to the master (my own fault). |
Updated by mfriedrich on 2016-03-11 08:36:17 +00:00
Ok thanks. |
Updated by mfriedrich on 2016-03-11 14:56:08 +00:00
|
Updated by mfriedrich on 2016-03-24 09:37:53 +00:00
|
This issue has been migrated from Redmine: https://dev.icinga.com/issues/11273
Created by ralph_b on 2016-03-02 09:37:02 +00:00
Assignee: mfriedrich
Status: Resolved (closed on 2016-03-11 08:36:17 +00:00)
Target Version: 2.4.4
Last Update: 2016-03-24 09:37:53 +00:00 (in Redmine)
Hi to all,
icinga2 fires new checks on a service multiple times before check_interval has elapsed.
icinga2 --version:
icinga2 - The Icinga 2 network monitoring daemon (version: v2.4.3)
Copyright © 2012-2016 Icinga Development Team (https://www.icinga.org/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Application information:
Installation root: /usr
Sysconf directory: /etc
Run directory: /var/run
Local state directory: /var
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /var/run/icinga2/icinga2.pid
System information:
Platform: Red Hat Enterprise Linux Server
Platform version: 6.7 (Santiago)
Kernel: Linux
Kernel version: 2.6.32-573.18.1.el6.x86_64
Architecture: x86_64
icinga2 feature list:
Disabled features: compatlog debuglog gelf icingastatus livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker command graphite ido-pgsql mainlog notification
You will find the output of the following command in the attached file:
SELECT
status.servicestatus_id,
status.status_update_time,
status.last_check,
status.next_check,
age(status.next_check, status.last_check) as delta,
to_char(services.check_interval * 60 , '99') as check_interval,
status.service_object_id,
status.current_state,
status.has_been_checked,
status.current_check_attempt,
status.should_be_scheduled,
status.is_flapping
FROM
public.icinga_servicestatus status inner join icinga_services services on (status.service_object_id = services.service_object_id)
where status.status_update_time > '2016-03-02 10:27:00+01'::timestamp
and age(status.next_check, status.last_check) < '00:00:58'::time
order by status_update_time;
Please pay attention to column "delta".
I cannot discern any rule or pattern by which these checks are fired.
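For anyone who wants to apply the same filter to exported rows outside the database, here is a minimal Python sketch of the "delta" comparison used in the query. It is only an illustration: the function name `find_early_reschedules`, the dictionary keys, and the sample rows are hypothetical, and it assumes (as the `check_interval * 60` expression in the query suggests) that `check_interval` is stored in minutes. The 2-second tolerance mirrors the 58-second threshold used against a 60-second interval.

```python
from datetime import datetime, timedelta


def find_early_reschedules(rows, tolerance=timedelta(seconds=2)):
    """Return (servicestatus_id, delta) for rows whose next_check is
    scheduled sooner after last_check than check_interval allows."""
    suspicious = []
    for row in rows:
        # Same quantity as the "delta" column in the query.
        delta = row["next_check"] - row["last_check"]
        # Assumption: check_interval is stored in minutes.
        interval = timedelta(minutes=row["check_interval_minutes"])
        if delta < interval - tolerance:
            suspicious.append((row["servicestatus_id"], delta))
    return suspicious


# Made-up sample rows, not taken from the attached output.
rows = [
    {
        "servicestatus_id": 1,
        "last_check": datetime(2016, 3, 2, 10, 27, 0),
        "next_check": datetime(2016, 3, 2, 10, 28, 0),
        "check_interval_minutes": 1,
    },
    {
        "servicestatus_id": 2,
        "last_check": datetime(2016, 3, 2, 10, 27, 0),
        "next_check": datetime(2016, 3, 2, 10, 27, 12),
        "check_interval_minutes": 1,
    },
]

for status_id, delta in find_early_reschedules(rows):
    print(status_id, delta)
```

Only the second sample row is reported, because its next check is scheduled 12 seconds after the last one although the interval is one minute.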
Attachments
Changesets
2016-03-05 17:15:03 +00:00 by mfriedrich b8e3d61
2016-03-05 17:16:49 +00:00 by mfriedrich ef532f2
2016-03-11 14:55:03 +00:00 by mfriedrich 8344f74
2016-03-11 14:55:14 +00:00 by mfriedrich f99feab