[dev.icinga.com #11684] Cluster resync problem with API created objects #4169
Comments
Updated by mfriedrich on 2016-05-09 15:53:47 +00:00
Workaround discussed somewhere else: Set the "zone" attribute explicitly for PUT requests.
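For illustration, the workaround could look roughly like the sketch below, which only builds (and does not send) a PUT request against the objects endpoint with "zone" set inside "attrs". The host name, zone name, address, check command, and base URL are placeholder assumptions, and authentication is left out; adjust everything to your own setup.

```python
import json
import urllib.request


def build_create_host_request(base_url, host_name, zone, address):
    """Build (but do not send) a PUT request that creates a host
    with an explicitly set "zone" attribute."""
    body = {
        "attrs": {
            "check_command": "hostalive",  # assumed check command
            "address": address,
            "zone": zone,  # the explicit zone from the workaround
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/objects/hosts/{host_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


# Placeholder values, not taken from the report.
req = build_create_host_request(
    "https://localhost:5665", "api-host1", "master", "192.0.2.10"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would additionally need API credentials and TLS settings, which are omitted here on purpose.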
Updated by gbeutner on 2016-05-11 13:43:02 +00:00
Updated by gbeutner on 2016-05-11 13:50:25 +00:00
Updated by mfriedrich on 2016-05-11 13:55:38 +00:00
Updated by mfriedrich on 2016-05-21 13:52:25 +00:00
Unfortunately, setting the zone attribute from GetLocalZone() won't fix the issue by itself. The files generated with CreateObject() are still located in "conf.d", which is not taken into account for runtime syncs. I'm testing a separate patch which takes the zoneName into account and puts those files underneath zones.d///.conf.
Updated by mfriedrich on 2016-05-21 14:23:54 +00:00
Updated by mfriedrich on 2016-06-23 13:38:41 +00:00
Updated by mfriedrich on 2016-09-28 13:41:44 +00:00
Updated by mfriedrich on 2016-11-10 16:14:32 +00:00
The fix works for the scenario where master A starts and sends UpdateObject messages to master B (which involves setting the target_zone for RelayMessage). The other way around, with master B being shut down and then reconnecting, the config sync does not work. This is probably due to the missing implicit zone attribute inside the same zone. I'll investigate further.
Updated by mfriedrich on 2016-11-10 16:43:30 +00:00
Forget the statement above. Re-syncing objects created at runtime through the API doesn't work, neither when the zone attribute is specified explicitly nor when it is empty. One observation: if I specify a child zone ("satellite") instead of the current "master" (and also have code which explicitly sets that if omitted), the configuration gets synced.
The other way around, it does not work with the same zone.
I'm assuming that something within CanAccessObject() is somehow broken here.
Updated by mfriedrich on 2016-11-10 18:27:16 +00:00
Funny. It is the check between the object version and the endpoint log position. This means that a synced endpoint (no more replay logs) is never able to fetch objects. This would probably also explain why Comments/Downtimes are not synced. Debug patched output:
I'm not entirely sure why we added that check in the past; most likely it was meant to prevent unwanted object syncs. However, there is no direct relation between the replay log and the runtime object configs. I've played around with an offset, but there is no exact value for that. Removed it. Different test with downtimes.
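To make the effect of that check easier to follow, here is a toy model of it. This is not the Icinga 2 source: the class, the function, the numeric timestamps, and in particular the direction of the comparison (skip objects whose version is older than the endpoint's log position) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class RuntimeObject:
    name: str
    version: float  # hypothetical: timestamp of the last runtime change


def objects_to_sync(objects, endpoint_log_position, use_version_check):
    """Return the names of objects that would be sent to an endpoint.

    With the check enabled, objects older than the endpoint's log
    position are skipped (assumed comparison direction).
    """
    selected = []
    for obj in objects:
        if use_version_check and obj.version < endpoint_log_position:
            continue  # skipped: considered already covered by the replay log
        selected.append(obj.name)
    return selected


# Objects created while the peer was down (made-up timestamps).
objects = [RuntimeObject("host1", 100.0), RuntimeObject("service1", 105.0)]

# A fully synced endpoint has a log position newer than both objects.
log_position = 200.0

print(objects_to_sync(objects, log_position, True))   # check kept
print(objects_to_sync(objects, log_position, False))  # check removed
```

In this model a fully synced endpoint receives nothing while the check is in place and receives both objects once the check is dropped, which matches the behaviour described in the comment.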
Updated by mfriedrich on 2016-11-10 19:05:19 +00:00
Updated by mfriedrich on 2016-11-10 19:14:05 +00:00
Pushed a fix to git master, please test the snapshot packages.
Updated by mfriedrich on 2016-11-14 13:56:43 +00:00
Updated by geds on 2016-11-24 15:11:10 +00:00
I have tested this fix with version v2.5.4-206-gb028ff2 (icinga2.x86_64 2.5.4-1.snapshot201611221649.el7.centos @ICINGA-snapshot). The cluster is resyncing API objects now, but there is still one problem: the cluster resyncs everything, even if some objects were deleted on one node while the other was down. How to reproduce:
This time 'service1' gets synced to Icinga2a from Icinga2b because it was still present there while it was down. Not sure if I should create a separate ticket for this. BTW thanks a lot for your hard work!
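The remaining problem can be pictured with a toy model in which a resync simply merges the object sets of both nodes and keeps no record of deletions. The real sync is not implemented as a plain set union; the function and the object names other than 'service1' are assumptions for illustration only.

```python
def resync_by_union(local_objects, remote_objects):
    """Toy resync: merge both object sets without tracking deletions."""
    return set(local_objects) | set(remote_objects)


# Both nodes start with the same API-created objects.
icinga2a = {"host1", "service1"}
icinga2b = {"host1", "service1"}

# icinga2b is down while 'service1' is deleted on icinga2a.
icinga2a.discard("service1")

# icinga2b comes back and the nodes resync.
after_resync = resync_by_union(icinga2a, icinga2b)
print(sorted(after_resync))
```

Because nothing records that 'service1' was deleted, the merge brings it back on icinga2a, which is the behaviour reported above.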
Updated by mfriedrich on 2016-11-24 15:23:13 +00:00
Hm, imho that's a problem which wasn't introduced by this fix but existed already. Can you please open a new issue?
Updated by geds on 2016-11-24 15:32:02 +00:00
Will do. Thanks for your prompt answer.
This issue has been migrated from Redmine: https://dev.icinga.com/issues/11684
Created by geds on 2016-04-26 12:43:08 +00:00
Assignee: mfriedrich
Status: Resolved (closed on 2016-11-14 13:56:43 +00:00)
Target Version: 2.6.0
Last Update: 2016-11-24 15:32:02 +00:00 (in Redmine)
Hello,
there is a problem with Icinga2 in cluster mode not resyncing API created objects.
To reproduce this issue:
Stop Icinga2 instances on both cluster servers
Edit /etc/icinga2/zones.conf on both 'icinga2a' and 'icinga2b' to enable config replication
Start Icinga2 on 'icinga2a'
Create a host/service on 'icinga2a'
Repeat step 6. However this time configs are not there.
Changesets
2016-10-11 08:55:13 +00:00 by gbeutner 0145a32
2016-10-14 13:54:34 +00:00 by gbeutner 759aba8
2016-10-24 06:40:12 +00:00 by gbeutner d70d779
2016-11-10 16:15:06 +00:00 by mfriedrich 5dd4898
2016-11-10 16:16:08 +00:00 by mfriedrich 72bf538
2016-11-10 16:44:05 +00:00 by mfriedrich 2e2de7c
2016-11-11 15:29:37 +00:00 by mfriedrich 4b86f69
2016-11-17 12:51:04 +00:00 by mfriedrich e5a6bdc
2016-11-17 12:51:04 +00:00 by mfriedrich 099fc76
2016-11-17 12:51:04 +00:00 by mfriedrich bef53ac
2016-11-17 12:51:04 +00:00 by mfriedrich 46d7145
2016-12-05 15:37:31 +00:00 by mfriedrich 338f5c0