[dev.icinga.com #10085] cluster check requesting host/port attributes #3375
Comments
Updated by mfriedrich on 2015-09-03 14:38:57 +00:00
The topic is misleading imho. I don't really understand how that's supposed to influence your satellites not executing checks.
Updated by henti on 2015-09-04 05:36:09 +00:00 dnsmichi wrote:
Morning dnsmichi. Disabled features: command compatlog debuglog gelf graphite icingastatus livestatus notification opentsdb perfdata statusdata syslog. There is no other debug information I can see. The checks are running and displaying on the dashboard of the master the client is connected to; the host information is just not being passed to the clustered MoM connected to the master. H
Updated by mfriedrich on 2015-09-04 09:05:07 +00:00 I really have a hard time following your descriptions. Please always add the corresponding configurations and everything else that makes it easy to understand and reproduce the issue. For now I'd just guess it's a configuration problem.
Updated by henti on 2015-09-04 13:03:31 +00:00 dnsmichi wrote:
Hi dnsmichi, I'm sorry my description is not very clear. I'm not really sure how else to explain it. All configs are here: http://pastie.org/private/ngltkahb6oeczykjb5bgq Henti
Updated by mfriedrich on 2015-09-04 13:10:45 +00:00 Don't use such external pastie urls. You may just put the text here, including proper formatting.
Updated by henti on 2015-09-17 13:36:05 +00:00 dnsmichi wrote:
I think I have found the problem. My icinga2 node list output shows two hosts with the same name: one under the Master node, and one under its own node. Using the "last seen" data, I've confirmed that both hosts connect to the MoM and the Master. This can only mean somebody has cloned these hosts and likely renamed them while keeping the icinga configs the same. I'm not sure why it's showing the host as down with a log lag of 60000+ days. This does pose a problem. The host names in icinga2 are the same, and DNS is pointing to the correct server, so I cannot use the host name to find the incorrect one. I cannot tcpdump the traffic, as it's encrypted. What other way can I use to find the duplicated host that is connecting directly to the MoM, so that I can disable its config? Should the config check not also fail when this happens? Regards
Updated by henti on 2015-09-30 06:09:59 +00:00 henti wrote:
Good morning. Some more information. I've stopped the icinga2 service on the final endpoint to try and identify where the additional host comes from. This is what I've seen. Master of Master : prd-qua-za-mon.dc.domain.com
Master : stg-qua-za-aux01.int.domain.com
It appears that stg-qua-za-mis01.int.domain.com connects to both icinga2 servers, but its zones.conf is configured to connect to stg-qua-za-aux01.int.domain.com, and I cannot see any traffic between stg-qua-za-mis01.int.domain.com and prd-qua-za-mon.dc.domain.com, so the information must come from the stg-qua-za-aux01.int.domain.com server. I'm really at a loss here. More object information:
= Master of Masters =
= Master Service in Region =
root@stg-qua-za-aux01:/etc/icinga2# icinga2 object list --name stg-qua-za-mis01.int.domain.com
Updated by henti on 2015-10-05 09:04:42 +00:00 henti wrote:
Further to this I found the following. The repository.d host files generated by update-config on the master of master contains the following :
Whereas other machines that are working correctly have the following:
This seems to indicate that stg-qua-za-db01.int.domain.com is connected to the MoM via the AUX server as configured, while stg-qua-za-mis01.int.domain.com is connected directly. However, the AUX server also reports that stg-qua-za-mis01.int.domain.com is connected to it. H
Updated by henti on 2015-10-05 09:06:13 +00:00 henti wrote:
This seems to happen only with newly generated host conf files. Older files are correct. When I removed correct files that had existed and worked in the past and generated a new file, the new file contained the direct association. H
Updated by mjbrooks on 2015-10-05 09:25:32 +00:00
Hello @dnsmichi, @henti pinged me on IRC. He was wondering if you'd seen his feedback and was concerned. I can't seem to change the status back to "open", so I'm dropping it back in your lap and leaving it as "feedback" (sorry).
Updated by mfriedrich on 2015-10-05 10:17:41 +00:00
We'll take care of that after our trip to Portland.
Updated by henti on 2015-11-16 09:14:59 +00:00 dnsmichi wrote:
Any update on this bug? Regards
Updated by mfriedrich on 2015-12-03 11:55:17 +00:00 No, not yet. It requires time to read, analyse and test in order to reproduce your issue. We are currently involved in other projects and/or issues. Kind regards,
Updated by vishnu on 2015-12-07 10:08:23 +00:00 dnsmichi wrote:
I am facing the exact same issue. My environment has the below components,
satelliteA's side is working fine: I am able to see clientA and client1 in the master's icingaweb2 as well as in satelliteA's icingaweb2. The error message in icingaweb2: Zone clientB.zoneB.company.com is not connected. Log lag: 16776 days, 8 hours, 46 minutes and 28 seconds. I could not find any problem in icinga2.log or debug.log. It should not be a problem with the configuration, because the exact same configuration works fine on the other side (satelliteA and its clients). Please look into this. Thanks,
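A side note on the "log lag" figure quoted above: adding the reported lag to the Unix epoch lands on the morning this comment was posted. The short Python sketch below does that arithmetic. It assumes the lag is computed as "now minus the last log position" and that the position is unset (0); that reading is an inference from the numbers, not something confirmed by the maintainers in this thread.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch: the moment a timestamp of 0 refers to.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Log lag exactly as quoted in the icingaweb2 error message.
lag = timedelta(days=16776, hours=8, minutes=46, seconds=28)

# If the last log position were 0, "now" at display time would be EPOCH + lag.
moment = EPOCH + lag
print(moment.isoformat())  # 2015-12-07T08:46:28+00:00
```

The result is 2015-12-07, the date of this comment, which would be consistent with the MoM never having recorded a log position for that zone. The "16000 days" reported in the original description is of the same order.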
Updated by mfriedrich on 2016-02-24 23:23:51 +00:00
Not sure how this relates to the rest of the history of this issue. I suspect that henti's setup is overly complicated and some endpoints are missing the required connection information, either for connecting from the master to the client, or vice versa. There has been a problem with older versions opening multiple connections for both directions. A different but related issue was with endpoints with wrong "host" connection information, not checking against the presented CN of the connected node. Please test that again with 2.4.3. @vishnu
Updated by mfriedrich on 2016-03-18 17:29:00 +00:00
We are not able to reproduce the issue here and therefore believe this problem has been fixed in recent releases.
This issue has been migrated from Redmine: https://dev.icinga.com/issues/10085
Created by henti on 2015-09-03 11:18:57 +00:00
Assignee: (none)
Status: Closed (closed on 2016-03-18 17:29:00 +00:00)
Target Version: (none)
Last Update: 2016-03-18 17:29:00 +00:00 (in Redmine)
We run a MoM <- Master <- Client setup: a Master of Masters (MoM) server in our office, which handles all our dependencies and notifications, and Masters in regions (DMZs) which the clients connect to.
Clients are configured using puppet.
The Master is configured in a cluster with the MoM. Both Master and MoM have Icingaweb2 for the dashboard. Bug 9262 impacted us, as statuses were not being updated on the MoM when they changed on the Master, so the dashboards were out of sync. With the release of 2.3.9, we set up the configuration again. All instances are 2.3.9.
The dashboard on the Master is working as expected. I have 6 clients connected, all working.
The dashboard on the MoM is showing 4 clients as expected, with matching services.
Two clients show as not connected, with a log lag in excess of 16000 days. All services pending.
I've done a full reset of state on both clients and reconnected them, same situation.
I connected a second master to the MoM with 5 clients. Same situation. All clients show on the Master.
Two clients show as not connected, with a log lag in excess of 16000 days. All services pending.
The log files show the following:
[2015-09-03 11:44:09 +0200] debug/ApiListener: Not connecting to Endpoint 'stg-qua-za-app01.int.domain.com' because the host/port attributes are missing.
This is consistent for all clients not working as described above.
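To list every endpoint affected by this message, the debug log can be scanned for it. The Python sketch below is only an illustration: the function name `endpoints_missing_host_port` is hypothetical, and the pattern is built solely from the single log line quoted above, so other Icinga 2 versions may word the message differently. Whether the missing attributes are the actual cause is not established in this report.

```python
import re

# Pattern taken from the one ApiListener debug line quoted in this report.
MISSING_ATTRS = re.compile(
    r"debug/ApiListener: Not connecting to Endpoint '([^']+)' "
    r"because the host/port attributes are missing\."
)


def endpoints_missing_host_port(log_lines):
    """Return sorted, de-duplicated endpoint names found in matching lines."""
    names = set()
    for line in log_lines:
        match = MISSING_ATTRS.search(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


# The sample is the log line from this report, split only for readability.
sample = [
    "[2015-09-03 11:44:09 +0200] debug/ApiListener: Not connecting to Endpoint "
    "'stg-qua-za-app01.int.domain.com' because the host/port attributes are missing.",
]
print(endpoints_missing_host_port(sample))
```

In practice the list passed in would be the lines of the debug log file; comparing the resulting names against the clients shown as "not connected" on the MoM would confirm or rule out the correlation described here.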
The endpoint configs are generated using update-config, so all the configs are the same.
Initially I thought adding the host and port attributes to the Endpoint would resolve the issue, but that only changes the host status from Down to Up, while all services stay pending and never resolve.