Created by spjmurray on 2016-06-21 14:42:06 +00:00
Assignee: spjmurray
Status: Resolved (closed on 2016-06-22 07:25:52 +00:00)
Target Version: 2.5.0
Last Update: 2016-06-22 07:25:52 +00:00 (in Redmine)
Icinga Version: 2.4.10
Backport?: Not yet backported
Include in Changelog: 1
This is the annoying issue that has plagued me when restarting an icinga2 satellite that is under heavy load during log replay. Here is what I've managed to derive from a GDB session:
Agent establishes TCP connection to parent
Parent goes down uncleanly/FIN packet never arrives
Agent ApiListener thread is sat waiting for the TLS handshake to complete/fail
All SocketEvents threads are sat happily in epoll_wait() - why the hell the established socket isn't ever ready for POLLOUT I have no idea, feel free to discuss
What I propose: enable keep-alive packets on all TcpSockets. Hopefully this will generate EPOLLERR, SSL_do_handshake will fail in TlsStream::OnEvent, and the next iteration of ApiListener::ApiTimerHandler should work... probably maybe :)
I've got plenty of debug output should you need it, plus for a short while I still have the GDB session open.
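For readers unfamiliar with the mechanism, the keep-alive proposal can be sketched roughly as follows. This is not the code from the attached patch: `EnableKeepAlive` and its parameters are hypothetical names chosen for illustration, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are Linux-specific. Setting `SO_KEEPALIVE` alone would be enough to turn probing on, but the probes would then follow the system-wide defaults, which on Linux wait two hours before the first probe unless the sysctls have been changed.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper, not taken from the icinga2 patch.
// Enables TCP keep-alive probes on an already created socket.
//   idle_secs     - idle time before the first probe is sent
//   interval_secs - delay between unanswered probes
//   probe_count   - unanswered probes before the kernel drops the connection
// Returns true if every option was applied.
bool EnableKeepAlive(int fd, int idle_secs, int interval_secs, int probe_count)
{
    assert(fd >= 0);

    // Portable part: switch keep-alive probing on for this socket.
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return false;

    // Linux-specific tuning; without it the system-wide defaults apply.
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) != 0)
        return false;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_secs, sizeof(interval_secs)) != 0)
        return false;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probe_count, sizeof(probe_count)) != 0)
        return false;

    return true;
}
```

With values such as `EnableKeepAlive(fd, 60, 10, 5)`, a peer that vanished without sending a FIN should be declared dead after roughly 60 + 5 × 10 seconds of silence, at which point the socket reports an error to the event loop.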
Fix hanging API connections
There was a problem identified where an upstream API connection was found hanging, waiting
for a TLS handshake to complete. Seemingly the TCP connection was ESTABLISHED locally but
not cleanly terminated remotely. Oddly, the socket events layer never triggered the TLS
handshake. This change nevertheless enables TCP keep-alive packets to detect broken
connections, raising EPOLLERR and breaking the deadlock condition so that the agent will
attempt to reconnect at a later time.
fixes #12003
Signed-off-by: Gunnar Beutner <gunnar.beutner@netways.de>
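As an illustration of the "raising EPOLLERR" part, here is a minimal sketch of how an epoll-based loop might classify the event mask once the kernel has flagged a connection as dead. It is not the icinga2 SocketEvents code: `PollOnce` and `IsBroken` are hypothetical helpers, and a real loop would keep one long-lived epoll instance rather than creating one per call. The relevant detail is that epoll always reports `EPOLLERR` and `EPOLLHUP`, even when only `EPOLLOUT` was requested.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: waits up to timeout_ms for events on fd and returns
// the raw epoll event mask (0 on timeout or if epoll itself fails).
uint32_t PollOnce(int fd, int timeout_ms)
{
    assert(fd >= 0);

    int ep = epoll_create1(0);
    if (ep < 0)
        return 0;

    epoll_event ev{};
    ev.events = EPOLLOUT;   // EPOLLERR and EPOLLHUP are reported regardless
    ev.data.fd = fd;

    uint32_t result = 0;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == 0) {
        epoll_event out{};
        if (epoll_wait(ep, &out, 1, timeout_ms) == 1)
            result = out.events;
    }

    close(ep);
    return result;
}

// True when the mask says the connection is gone and should be torn down,
// for example by failing the pending handshake and scheduling a reconnect.
bool IsBroken(uint32_t events)
{
    return (events & (EPOLLERR | EPOLLHUP)) != 0;
}
```

A quick way to try this without a network is a local `socketpair(AF_UNIX, SOCK_STREAM, 0, fds)`: while both ends are open, `PollOnce(fds[0], 100)` reports the socket as writable, and after `close(fds[1])` the mask includes `EPOLLHUP`. That is only a stand-in for the real scenario, where the error is raised after the keep-alive probes go unanswered.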
Updated by spjmurray on 2016-06-21 14:48:01 +00:00
File added 0001-Fix-Hanging-API-Connections.patch
From ae4f933cda89b3a4530c87f0fe673fccebce9aec Mon Sep 17 00:00:00 2001
From: Simon Murray <spjmurray@yahoo.co.uk>
Date: Tue, 21 Jun 2016 15:46:53 +0100
Subject: [PATCH] Fix Hanging API Connections
This issue has been migrated from Redmine: https://dev.icinga.com/issues/12003
Changesets
2016-06-22 07:25:00 +00:00 by spjmurray e3645aa
2016-07-05 11:16:14 +00:00 by mfriedrich 85afec8