[dev.icinga.com #9406] Selective cluster reconnecting breaks client communication #3061
Comments
Updated by mfrosch on 2015-06-11 20:54:29 +00:00
|
Updated by mfrosch on 2015-06-11 21:08:35 +00:00
Please check branch support/selective-reconnect-9406: https://git.icinga.org/?p=icinga2.git;a=commit;hb=support/selective-reconnect-9406;js=1
For documentation:
|
Updated by mfriedrich on 2015-06-12 16:35:33 +00:00
That diff isn't really readable without additional comments; better to explain it with a picture on Monday. Wrong git branch, by the way. |
Updated by mfrosch on 2015-06-15 12:50:03 +00:00
Applied in changeset cfbe82d. |
Updated by mfriedrich on 2015-06-15 13:11:22 +00:00
|
Updated by mfriedrich on 2015-06-17 08:06:04 +00:00
|
Updated by mfriedrich on 2015-06-18 08:54:45 +00:00
|
Updated by mfriedrich on 2015-06-18 08:57:48 +00:00
|
Updated by mfriedrich on 2015-07-02 18:25:20 +00:00
|
Updated by mfriedrich on 2015-07-10 08:35:21 +00:00
I forgot to cherry-pick that into support/2.3, mea culpa. It also seems that 2.3.5 was not tested after release to confirm that it resolves the issue. |
This issue has been migrated from Redmine: https://dev.icinga.com/issues/9406
Created by mfrosch on 2015-06-11 20:53:56 +00:00
Assignee: mfrosch
Status: Resolved (closed on 2015-06-15 12:50:03 +00:00)
Target Version: 2.3.7
Last Update: 2015-07-10 08:35:20 +00:00 (in Redmine)
I took some time to analyze a reconnect problem I'm experiencing.
The setup is as follows:
Every "remote" check is distributed via command_endpoint to locally scheduled checks on the agents.
Now, when I boot everything up:
Everything runs fine, but when I reload one of the agents, ALL command_endpoint checks that icingaB tries to run for icingaX fail.
The reason reported is "is not connected to".
By investigating the code I found out that the selective reconnecting of Icinga 2 connections is causing us trouble!
[master zone problem] icingaB thinks icingaA is the master (because of the internal determination), so it does not have to reconnect to any other endpoint.
[agent problem] On the other hand, the agent thinks: hey, I'm already connected to icingaA, and that's fine, so I don't have to connect to icingaB (Not connecting to Zone 'master' because we're already connected to it.)
That's where this reconnect problem comes from.
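To make the agent side of this easier to discuss, here is a rough Python model of the "one connection per zone is enough" rule. It is only a sketch and not the Icinga 2 code: the function `endpoints_to_dial` and its parameters are made up for illustration, and the `selective=False` branch is shown purely as a contrast, not as the fix that was applied.

```python
def endpoints_to_dial(zone_members, connected, selective=True):
    """Return the peers a node would try to connect to.

    zone_members: dict mapping a zone name to its endpoint names (hypothetical model)
    connected:    set of endpoint names this node currently has a live link to
    selective:    if True, skip a whole zone once any one member is connected
    """
    targets = []
    for zone, members in zone_members.items():
        if selective and any(m in connected for m in members):
            # Mirrors the log line: "Not connecting to Zone 'master'
            # because we're already connected to it."
            continue
        targets.extend(m for m in members if m not in connected)
    return targets


# Assumed state after the agent reload: the agent reached icingaA first.
zones = {"master": ["icingaA", "icingaB"]}
agent_links = {"icingaA"}

print(endpoints_to_dial(zones, agent_links))                   # []
print(endpoints_to_dial(zones, agent_links, selective=False))  # ['icingaB']
```

With the selective rule the agent never dials icingaB, and since icingaB does not reconnect either (the master zone problem above), the link between icingaB and the agent stays down.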
I'll fiddle around with the code and push a branch for clarification.
We should discuss this in detail on Monday!
Changesets
2015-06-11 21:02:13 +00:00 by mfrosch 7ce9de0
2015-06-11 21:06:16 +00:00 by mfrosch dd1b5c0
2015-06-15 08:20:21 +00:00 by mfrosch ac0db02
2015-06-15 12:47:04 +00:00 by mfrosch cfbe82d
2015-07-10 08:32:28 +00:00 by mfrosch 97f4875
Relations: