[dev.icinga.com #9976] API Client on checkers not reconnecting after reload/restart #3307
Comments
Updated by mfrosch on 2015-08-24 08:03:57 +00:00
|
Updated by mfrosch on 2015-08-24 08:49:13 +00:00
|
Updated by mfrosch on 2015-08-24 08:50:21 +00:00 We try to fix this with #9986 |
Updated by rhillmann on 2015-08-24 09:12:15 +00:00 This is probably related to #9798. I fixed the connection problems by setting net.ipv4.tcp_orphan_retries to 5. |
Updated by mfrosch on 2015-08-25 12:17:30 +00:00
Please try setting log_duration to "0" on all Endpoints that are only agents. This should prevent any massive log read on the master, and only Agent -> Master messages will be spooled to a log (on the agent side). A better solution would be something like #9730. (So we can test whether this is not a TCP or other connection problem.) |
Updated by mfrosch on 2015-08-25 12:17:49 +00:00
|
Updated by mfrosch on 2015-08-31 11:24:07 +00:00
|
Updated by mfriedrich on 2015-08-31 13:44:47 +00:00
|
Updated by mfrosch on 2015-08-31 14:28:08 +00:00
|
Updated by mwaldmueller on 2015-09-07 05:57:06 +00:00
I've tried the current snapshot and set net.ipv4.tcp_orphan_retries to 5, but the problem still occurs. I've attached a tcpdump. Now I've set log_duration to "0" and will update the ticket soon... |
Updated by mwaldmueller on 2015-09-09 12:10:24 +00:00 Unfortunately setting "log_duration" to "0" doesn't solve the problem. |
Updated by mfriedrich on 2015-09-12 09:10:09 +00:00 Can you test the snapshot packages including a fix for #10002? |
Updated by mfriedrich on 2015-09-12 09:10:23 +00:00
|
Updated by mwaldmueller on 2015-09-21 13:41:30 +00:00 I've installed the snapshot packages on the master and on the checkers, but without success. The "heartbeat"-problem still occurs. |
Updated by mfriedrich on 2015-09-29 15:21:44 +00:00
No symbol table info available. |
Updated by mfriedrich on 2015-09-29 15:21:58 +00:00
|
Updated by gbeutner on 2015-10-16 12:35:00 +00:00
I'm fairly certain this is fixed in the master branch. |
This issue has been migrated from Redmine: https://dev.icinga.com/issues/9976
Created by mwaldmueller on 2015-08-21 09:39:10 +00:00
Assignee: (none)
Status: Closed (closed on 2015-10-16 12:35:00 +00:00)
Target Version: (none)
Last Update: 2015-10-16 12:35:00 +00:00 (in Redmine)
This is related to #8712; the problem still exists.
My setup:
Icinga 2 log of checker:
[2015-08-12 17:14:30 +0200] information/ApiClient: Not sending heartbeat for endpoint 'checker.localdomain' because we're replaying the log for it.
[2015-08-12 17:14:40 +0200] information/ApiClient: Not sending heartbeat for endpoint 'checker.localdomain' because we're replaying the log for it.
[2015-08-12 17:14:50 +0200] information/ApiClient: Not sending heartbeat for endpoint 'checker.localdomain' because we're replaying the log for it.
Only a restart of the Icinga 2 daemon helps to solve the problem. The GDB-traces are attached to the related issue.
Furthermore, I think that the integrated cluster check should be able to detect such "hanging" cluster nodes.
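Until the cluster check can do this, the checker's own log could be scanned for the message quoted above. What follows is only a minimal sketch in Python, not anything Icinga 2 ships: the function name `stuck_endpoints` and the five-minute default threshold are assumptions made for illustration. It reports endpoints whose "replaying the log" messages span at least the given duration. As a simplification, it only compares the first and last matching line per endpoint, so it would not notice a replay that finished and started again in between.

```python
import re
from datetime import datetime, timedelta

# Matches the ApiClient message quoted in this ticket, e.g.
# [2015-08-12 17:14:30 +0200] information/ApiClient: Not sending heartbeat ...
REPLAY_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})\] "
    r"information/ApiClient: Not sending heartbeat for endpoint "
    r"'(?P<endpoint>[^']+)' because we're replaying the log for it\."
)


def stuck_endpoints(lines, min_duration=timedelta(minutes=5)):
    """Return endpoints whose replay messages span at least min_duration.

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions, not part of Icinga 2.
    """
    first_seen = {}
    last_seen = {}
    for line in lines:
        match = REPLAY_RE.match(line)
        if match is None:
            continue  # unrelated log line
        timestamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S %z")
        endpoint = match["endpoint"]
        first_seen.setdefault(endpoint, timestamp)
        last_seen[endpoint] = timestamp
    return sorted(
        endpoint
        for endpoint in first_seen
        if last_seen[endpoint] - first_seen[endpoint] >= min_duration
    )
```

The function accepts any iterable of lines, so an open log file can be passed directly. The result could be fed into a passive check or a cron job that alerts on a non-empty list.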
Attachments
Changesets
2015-09-29 14:03:38 +00:00 by mfriedrich 905de04
2015-09-30 14:39:36 +00:00 by (unknown) c1892a2
Relations: