Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev.icinga.com #13907] Crash in HttpServerConnection::ProcessMessageAsync #4910

Closed
icinga-migration opened this issue Jan 9, 2017 · 18 comments
Labels
area/api REST API bug Something isn't working core/crash Shouldn't happen, requires attention

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/13907

Created by Nagodar on 2017-01-09 09:36:48 +00:00

Assignee: (none)
Status: New
Target Version: 2.6.2
Last Update: 2017-01-16 15:01:46 +00:00 (in Redmine)

Icinga Version: 2.6.0
Backport?: Not yet backported
Include in Changelog: 1

icinga2 crash report:

Application information:
Application version: r2.6.0-1
Installation root: /usr
Sysconf directory: /etc
Run directory: /run
Local state directory: /var
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

System information:
Platform: Ubuntu
Platform version: 14.04.5 LTS, Trusty Tahr
Kernel: Linux
Kernel version: 3.13.0-98-generic
Architecture: x86_64

Build information:
Compiler: GNU 4.8.4
Build host: lgw01-29
Stacktrace:

(0) libpthread.so.0: (+0x10330) [0x2b0b5aca3330]
(1) libc.so.6: gsignal (+0x37) [0x2b0b5bcffc37]
(2) libc.so.6: abort (+0x148) [0x2b0b5bd03028]
(3) libc.so.6: (+0x2fbf6) [0x2b0b5bcf8bf6]
(4) libc.so.6: (+0x2fca2) [0x2b0b5bcf8ca2]
(5) libremote.so.2.6.0: (+0x44c83) [0x2b0b5d935c83]
(6) libremote.so.2.6.0: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0xa05) [0x2b0b5d95f235]
(7) libbase.so.2.6.0: icinga::WorkQueue::WorkerThreadProc() (+0x483) [0x2b0b5af29653]
(8) libboost_thread.so.1.54.0: (+0xba4a) [0x2b0b5a616a4a]
(9) libpthread.so.0: (+0x8184) [0x2b0b5ac9b184]
(10) libc.so.6: clone (+0x6d) [0x2b0b5bdc337d]


  • This would indicate a runtime problem or configuration error. If you believe this is a bug in Icinga 2
  • please submit a bug report at https://dev.icinga.org/ and include this stack trace as well as any other
  • information that might be useful in order to reproduce this problem.
    *****
@icinga-migration
Copy link
Author

Updated by jflach on 2017-01-09 16:03:51 +00:00

  • Status changed from New to Feedback
  • Assigned to set to Nagodar

That's not much to work with, can you provide a log from the time of the crash? Also please describe what you were trying to do, like deploying new config, restarting icinga 2, just letting it run, running api requests etc.

@icinga-migration
Copy link
Author

Updated by Nagodar on 2017-01-09 17:01:05 +00:00

Hello,

icinga2 crashes (more or less) once a day (we have multiple autoscaling groups that register/unregister servers constantly). The last log lines for different crashes:

[2017-01-07 22:05:53 +0100] information/ApiListener: Sending replay log for endpoint 'xxxxxx'.
[2017-01-07 22:05:54 +0100] warning/TlsStream: TLS stream was disconnected.
[2017-01-07 22:05:54 +0100] information/ApiListener: New client connection from [127.0.0.1]:51673 (no client certificate)
[2017-01-07 22:05:54 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:51673, user: xxxx)
[2017-01-07 22:05:54 +0100] warning/TlsStream: TLS stream was disconnected.
[2017-01-07 22:05:54 +0100] information/ApiListener: New client connection from [127.0.0.1]:51674 (no client certificate)
[2017-01-07 22:05:54 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:51674, user: xxxx)
[2017-01-07 22:05:54 +0100] warning/TlsStream: TLS stream was disconnected.
[2017-01-07 22:05:54 +0100] information/ApiListener: New client connection from [127.0.0.1]:51675 (no client certificate)
[2017-01-07 22:05:54 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:51675, user: xxxx)
[2017-01-07 22:05:54 +0100] warning/TlsStream: TLS stream was disconnected.
[2017-01-07 22:05:54 +0100] information/ApiListener: New client connection from [127.0.0.1]:51676 (no client certificate)
[2017-01-07 22:05:54 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:51676, user: xxxx)

[2017-01-02 19:02:29 +0100] information/ApiListener: Sending replay log for endpoint 'xxxxxxx'.
[2017-01-02 19:02:29 +0100] information/ApiListener: New client connection from [127.0.0.1]:49479 (no client certificate)
[2017-01-02 19:02:29 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:49479, user: xxxx)
[2017-01-02 19:02:30 +0100] warning/TlsStream: TLS stream was disconnected.
[2017-01-02 19:02:30 +0100] information/ApiListener: New client connection from [127.0.0.1]:49480 (no client certificate)
[2017-01-02 19:02:30 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:49480, user: xxxx)
[2017-01-02 19:02:30 +0100] warning/TlsStream: TLS stream was disconnected.
[2017-01-02 19:02:30 +0100] information/ApiListener: New client connection from [127.0.0.1]:49481 (no client certificate)
[2017-01-02 19:02:30 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:49481, user: xxxx)
[2017-01-02 19:02:30 +0100] warning/TlsStream: TLS stream was disconnected.
[2017-01-02 19:02:30 +0100] information/ApiListener: New client connection from [127.0.0.1]:49482 (no client certificate)
[2017-01-02 19:02:30 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:49482, user: xxxx)

[2017-01-02 19:27:02 +0100] information/ApiListener: Sending replay log for endpoint 'xxxxx'.
[2017-01-02 19:27:02 +0100] warning/TlsStream: TLS stream was disconnected.
[2017-01-02 19:27:02 +0100] information/ApiListener: New client connection from [127.0.0.1]:33654 (no client certificate)
[2017-01-02 19:27:02 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:33654, user: xxxx)
[2017-01-02 19:27:02 +0100] warning/TlsStream: TLS stream was disconnected.
[2017-01-02 19:27:02 +0100] information/ApiListener: New client connection from [127.0.0.1]:33655 (no client certificate)
[2017-01-02 19:27:02 +0100] information/HttpServerConnection: Request: GET /v1/config/packages (from [127.0.0.1]:33655, user: xxxx)

regards

@icinga-migration
Copy link
Author

Updated by denny on 2017-01-10 12:50:29 +00:00

hi,

had the same problem:

# cat report.1483880012.058164
Application information:
  Application version: r2.6.0-1
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Debian GNU/Linux
  Platform version: 8 (jessie)
  Kernel: Linux
  Kernel version: 3.16.0-4-amd64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.9.2
  Build host: smithers
Stacktrace:

    (0) libpthread.so.0:  (+0xf8d0) [0x7fb3966dc8d0]
    (1) libc.so.6: gsignal (+0x37) [0x7fb393bea067]
    (2) libc.so.6: abort (+0x148) [0x7fb393beb448]
    (3) libc.so.6:  (+0x2e266) [0x7fb393be3266]
    (4) libc.so.6:  (+0x2e312) [0x7fb393be3312]
    (5) libremote.so.2.6.0:  (+0x42553) [0x7fb395a6f553]
    (6) libremote.so.2.6.0: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0xbe5) [0x7fb395aa02c5]
    (7) libbase.so.2.6.0: icinga::WorkQueue::WorkerThreadProc() (+0x4eb) [0x7fb3963a279b]
    (8) libboost_thread.so.1.55.0:  (+0xdaea) [0x7fb397087aea]
    (9) libpthread.so.0:  (+0x80a4) [0x7fb3966d50a4]
    (10) libc.so.6: clone (+0x6d) [0x7fb393c9d62d]

In icinga2.err

icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:162: T* boost::intrusive_ptr::operator->() const [with T = icinga::Socket]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2017-01-08 13:53:32 +0100
  • Debian Jessie 8.6
  • Icinga2 1.13.3-1~debmon80+1
  • KVM VM

@icinga-migration
Copy link
Author

Updated by Nagodar on 2017-01-10 15:43:06 +00:00

My err log file:

icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:162: T* boost::intrusive_ptr::operator->() const [with T = icinga::Socket]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2017-01-02 19:02:30 +0100

icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:162: T* boost::intrusive_ptr::operator->() const [with T = icinga::Socket]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2017-01-02 19:27:02 +0100

icinga2: /usr/include/boost/smart_ptr/intrusive_ptr.hpp:162: T* boost::intrusive_ptr::operator->() const [with T = icinga::Socket]: Assertion `px != 0' failed.
Caught SIGABRT.
Current time: 2017-01-07 22:05:54 +0100

@icinga-migration
Copy link
Author

Updated by jflach on 2017-01-11 08:49:25 +00:00

  • Assigned to deleted Nagodar

So it crashes randomly during API connections... This will be hard to reproduce as I have not had that issue.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2017-01-11 13:19:12 +00:00

  • Target Version set to 2.6.2

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2017-01-11 13:19:39 +00:00

  • Status changed from Feedback to New

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2017-01-11 13:29:29 +00:00

  • Subject changed from icinga2 crash to Crash in HttpServerConnection::ProcessMessageAsync

@icinga-migration
Copy link
Author

Updated by Nagodar on 2017-01-16 15:01:46 +00:00

Can I attach any other worth information?

@icinga-migration icinga-migration added bug Something isn't working area/api REST API labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.6.2 milestone Jan 17, 2017
@dnsmichi
Copy link
Contributor

We'll deal with it right after releasing v2.6.1 which addresses some major issues introduced with 2.6.0.

@linuxmail
Copy link

hi,

maybe it addresses the same bug, but adding it anyway:

My icinga2 crashed again: 2.6.0

[2017-01-19 16:03:02 +0100] warning/Process: Killing process group 24042 ('/usr/lib/nagios/plugins/check_nwc_health' '--authpassword' '' '--authprotocol' 'sha' '--mode' 'interface-usage' '--name' '3' '--privpassword' 'secret' '--privprotocol' 'aes' '--protocol' '3' '--username' 'icinga' '-H' '1.2.3.4') after timeout of 60 seconds
[2017-01-19 16:03:02 +0100] warning/Process: Killing process group 24035 ('/usr/lib/nagios/plugins/check_nwc_health' '--authpassword' '' '--authprotocol' 'sha' '--mode' 'interface-discards' '--name' '3' '--privpassword' 'secret' '--privprotocol' 'aes' '--protocol' '3' '--username' 'icinga' '-H' '1.2.3.4') after timeout of 60 seconds
[2017-01-19 16:03:06 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2017-01-19 16:03:06 +0100] information/HttpServerConnection: Request: GET /v1/status/IcingaApplication (from [130.180.66.250]:35834, user: dashing)
[2017-01-19 16:03:06 +0100] warning/Process: PID 24043 was terminated by signal 9 (Killed)
[2017-01-19 16:03:06 +0100] warning/Process: Killing process group 24053 ('/usr/lib/nagios/plugins/check_nwc_health' '--authpassword' '' '--authprotocol' 'sha' '--mode' 'interface-errors' '--name' '2' '--privpassword' 'secret' '--privprotocol' 'aes' '--protocol' '3' '--username' 'icinga' '-H' '1.2.3.4') after timeout of 60 seconds
[2017-01-19 16:03:06 +0100] information/HttpServerConnection: Request: GET /v1/status/IcingaApplication (from [130.180.66.250]:35836, user: dashing)
[2017-01-19 16:03:07 +0100] warning/Process: PID 24053 was terminated by signal 9 (Killed)
[2017-01-19 16:03:07 +0100] information/ApiListener: New client connection from [130.180.66.250]:35866 (no client certificate)
[2017-01-19 16:03:07 +0100] information/IdoMysqlConnection: Query queue items: 5, query rate:  4/s (240/min 1693/5min 5669/15min);
[2017-01-19 16:03:07 +0100] information/HttpServerConnection: Request: GET /v1/status/CIB (from [130.180.66.250]:35866, user: dashing)
[2017-01-19 16:03:07 +0100] warning/ApiListener: Removing API client for endpoint 'henrietta.4lin.net'. 0 API clients left.
[2017-01-19 16:03:07 +0100] information/JsonRpcConnection: No messages for identity 'jhell.org' have been received in the last 60 seconds.
[2017-01-19 16:03:07 +0100] warning/JsonRpcConnection: API client disconnected for identity 'jhell.org'
[2017-01-19 16:03:07 +0100] warning/JsonRpcConnection: API client disconnected for identity 'jhell.org'
[2017-01-19 16:03:07 +0100] warning/ApiListener: Removing API client for endpoint 'jhell.org'. 0 API clients left.
[2017-01-19 16:03:07 +0100] information/ApiListener: New client connection from [130.180.66.250]:35870 (no client certificate)
[2017-01-19 16:03:07 +0100] information/ApiListener: New client connection from [130.180.66.250]:35888 (no client certificate)
[2017-01-19 16:03:07 +0100] information/HttpServerConnection: Request: GET /v1/status/CIB (from [130.180.66.250]:35870, user: dashing)
[2017-01-19 16:03:07 +0100] information/HttpServerConnection: Request: GET /v1/status/CIB (from [130.180.66.250]:35888, user: dashing)
[2017-01-19 16:03:07 +0100] information/ApiListener: New client connection from [130.180.66.250]:35877 (no client certificate)
[2017-01-19 16:03:07 +0100] warning/Process: PID 24036 was terminated by signal 9 (Killed)
[2017-01-19 16:03:07 +0100] warning/Process: Killing process group 24037 ('/usr/lib/nagios/plugins/check_nwc_health' '--authpassword' '' '--authprotocol' 'sha' '--mode' 'interface-status' '--name' '2' '--privpassword' 'secret' '--privprotocol' 'aes' '--protocol' '3' '--username' 'icinga' '-H' '1.2.3.5') after timeout of 60 seconds
[2017-01-19 16:03:07 +0100] critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket.
[2017-01-19 16:03:07 +0100] critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket.
[2017-01-19 16:03:07 +0100] critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket.
[2017-01-19 16:03:07 +0100] critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket.
root@mon3:/var/log/icinga2# grep '16:03' icinga2.log -A 50
[2017-01-19 16:03:02 +0100] warning/Process: Killing process group 24042 ('/usr/lib/nagios/plugins/check_nwc_health' '--authpassword' '' '--authprotocol' 'sha' '--mode' 'interface-usage' '--name' '3' '--privpassword' 'secret' '--privprotocol' 'aes' '--protocol' '3' '--username' 'icinga' '-H' '1.2.3.4') after timeout of 60 seconds
[2017-01-19 16:03:02 +0100] warning/Process: Killing process group 24035 ('/usr/lib/nagios/plugins/check_nwc_health' '--authpassword' '' '--authprotocol' 'sha' '--mode' 'interface-discards' '--name' '3' '--privpassword' 'secret' '--privprotocol' 'aes' '--protocol' '3' '--username' 'icinga' '-H' '1.2.3.4') after timeout of 60 seconds
[2017-01-19 16:03:06 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2017-01-19 16:03:06 +0100] information/HttpServerConnection: Request: GET /v1/status/IcingaApplication (from [130.180.66.250]:35834, user: dashing)
[2017-01-19 16:03:06 +0100] warning/Process: PID 24043 was terminated by signal 9 (Killed)
[2017-01-19 16:03:06 +0100] warning/Process: Killing process group 24053 ('/usr/lib/nagios/plugins/check_nwc_health' '--authpassword' '' '--authprotocol' 'sha' '--mode' 'interface-errors' '--name' '2' '--privpassword' 'secret' '--privprotocol' 'aes' '--protocol' '3' '--username' 'icinga' '-H' '1.2.3.4') after timeout of 60 seconds
[2017-01-19 16:03:06 +0100] information/HttpServerConnection: Request: GET /v1/status/IcingaApplication (from [130.180.66.250]:35836, user: dashing)
[2017-01-19 16:03:07 +0100] warning/Process: PID 24053 was terminated by signal 9 (Killed)
[2017-01-19 16:03:07 +0100] information/ApiListener: New client connection from [130.180.66.250]:35866 (no client certificate)
[2017-01-19 16:03:07 +0100] information/IdoMysqlConnection: Query queue items: 5, query rate:  4/s (240/min 1693/5min 5669/15min);
[2017-01-19 16:03:07 +0100] information/HttpServerConnection: Request: GET /v1/status/CIB (from [130.180.66.250]:35866, user: dashing)
[2017-01-19 16:03:07 +0100] warning/ApiListener: Removing API client for endpoint 'henrietta.4lin.net'. 0 API clients left.
[2017-01-19 16:03:07 +0100] information/JsonRpcConnection: No messages for identity 'jhell.org' have been received in the last 60 seconds.
[2017-01-19 16:03:07 +0100] warning/JsonRpcConnection: API client disconnected for identity 'jhell.org'
[2017-01-19 16:03:07 +0100] warning/JsonRpcConnection: API client disconnected for identity 'jhell.org'
[2017-01-19 16:03:07 +0100] warning/ApiListener: Removing API client for endpoint 'jhell.org'. 0 API clients left.
[2017-01-19 16:03:07 +0100] information/ApiListener: New client connection from [130.180.66.250]:35870 (no client certificate)
[2017-01-19 16:03:07 +0100] information/ApiListener: New client connection from [130.180.66.250]:35888 (no client certificate)
[2017-01-19 16:03:07 +0100] information/HttpServerConnection: Request: GET /v1/status/CIB (from [130.180.66.250]:35870, user: dashing)
[2017-01-19 16:03:07 +0100] information/HttpServerConnection: Request: GET /v1/status/CIB (from [130.180.66.250]:35888, user: dashing)
[2017-01-19 16:03:07 +0100] information/ApiListener: New client connection from [130.180.66.250]:35877 (no client certificate)
[2017-01-19 16:03:07 +0100] warning/Process: PID 24036 was terminated by signal 9 (Killed)
[2017-01-19 16:03:07 +0100] warning/Process: Killing process group 24037 ('/usr/lib/nagios/plugins/check_nwc_health' '--authpassword' '' '--authprotocol' 'sha' '--mode' 'interface-status' '--name' '2' '--privpassword' 'secret' '--privprotocol' 'aes' '--protocol' '3' '--username' 'icinga' '-H' '1.2.3.5') after timeout of 60 seconds
[2017-01-19 16:03:07 +0100] critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket.

	(0) libbase.so.2.6.0: <unknown function> (+0xdb0af) [0x2b24490ef0af]
	(1) libbase.so.2.6.0: <unknown function> (+0xdb147) [0x2b24490ef147]
	(2) libbase.so.2.6.0: icinga::SocketEventEngineEpoll::ChangeEvents(icinga::SocketEvents*, int) (+0x164) [0x2b2449073164]
	(3) libbase.so.2.6.0: icinga::TlsStream::Write(void const*, unsigned long) (+0x65) [0x2b2449072965]
	(4) libremote.so.2.6.0: icinga::HttpResponse::SetStatus(int, icinga::String const&) (+0x223) [0x2b24499ab153]
	(5) libremote.so.2.6.0: icinga::StatusHandler::HandleRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::Dictionary> const&) (+0x59a) [0x2b24499cacca]
	(6) libremote.so.2.6.0: icinga::HttpHandler::ProcessRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpResponse&) (+0x5c2) [0x2b24499abcf2]
	(7) libremote.so.2.6.0: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0x4f6) [0x2b24499aebd6]
	(8) libbase.so.2.6.0: icinga::WorkQueue::WorkerThreadProc() (+0x4eb) [0x2b244909179b]
	(9) libboost_thread.so.1.55.0: <unknown function> (+0xdaea) [0x2b2448459aea]
	(10) libpthread.so.0: <unknown function> (+0x80a4) [0x2b2448dff0a4]
	(11) libc.so.6: clone (+0x6d) [0x2b244b86962d]


[2017-01-19 16:03:07 +0100] critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket.

	(0) libbase.so.2.6.0: <unknown function> (+0xdb0af) [0x2b24490ef0af]
	(1) libbase.so.2.6.0: <unknown function> (+0xdb147) [0x2b24490ef147]
	(2) libbase.so.2.6.0: icinga::SocketEventEngineEpoll::ChangeEvents(icinga::SocketEvents*, int) (+0x164) [0x2b2449073164]
	(3) libbase.so.2.6.0: icinga::TlsStream::Write(void const*, unsigned long) (+0x65) [0x2b2449072965]
	(4) libremote.so.2.6.0: icinga::HttpResponse::SetStatus(int, icinga::String const&) (+0x223) [0x2b24499ab153]
	(5) libremote.so.2.6.0: icinga::StatusHandler::HandleRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::Dictionary> const&) (+0x59a) [0x2b24499cacca]
	(6) libremote.so.2.6.0: icinga::HttpHandler::ProcessRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpResponse&) (+0x5c2) [0x2b24499abcf2]
	(7) libremote.so.2.6.0: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0x4f6) [0x2b24499aebd6]
	(8) libbase.so.2.6.0: icinga::WorkQueue::WorkerThreadProc() (+0x4eb) [0x2b244909179b]
	(9) libboost_thread.so.1.55.0: <unknown function> (+0xdaea) [0x2b2448459aea]
	(10) libpthread.so.0: <unknown function> (+0x80a4) [0x2b2448dff0a4]
	(11) libc.so.6: clone (+0x6d) [0x2b244b86962d]


[2017-01-19 16:03:07 +0100] critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket.

	(0) libbase.so.2.6.0: <unknown function> (+0xdb0af) [0x2b24490ef0af]
	(1) libbase.so.2.6.0: <unknown function> (+0xdb147) [0x2b24490ef147]
	(2) libbase.so.2.6.0: icinga::SocketEventEngineEpoll::ChangeEvents(icinga::SocketEvents*, int) (+0x164) [0x2b2449073164]
	(3) libbase.so.2.6.0: icinga::TlsStream::Write(void const*, unsigned long) (+0x65) [0x2b2449072965]
	(4) libremote.so.2.6.0: icinga::HttpResponse::SetStatus(int, icinga::String const&) (+0x223) [0x2b24499ab153]
	(5) libremote.so.2.6.0: icinga::StatusHandler::HandleRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::Dictionary> const&) (+0x59a) [0x2b24499cacca]
	(6) libremote.so.2.6.0: icinga::HttpHandler::ProcessRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpResponse&) (+0x5c2) [0x2b24499abcf2]
	(7) libremote.so.2.6.0: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0x4f6) [0x2b24499aebd6]
	(8) libbase.so.2.6.0: icinga::WorkQueue::WorkerThreadProc() (+0x4eb) [0x2b244909179b]
	(9) libboost_thread.so.1.55.0: <unknown function> (+0xdaea) [0x2b2448459aea]
	(10) libpthread.so.0: <unknown function> (+0x80a4) [0x2b2448dff0a4]
	(11) libc.so.6: clone (+0x6d) [0x2b244b86962d]


[2017-01-19 16:03:07 +0100] critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket.

	(0) libbase.so.2.6.0: <unknown function> (+0xdb0af) [0x2b24490ef0af]
	(1) libbase.so.2.6.0: <unknown function> (+0xdb147) [0x2b24490ef147]
	(2) libbase.so.2.6.0: icinga::SocketEventEngineEpoll::ChangeEvents(icinga::SocketEvents*, int) (+0x164) [0x2b2449073164]
	(3) libbase.so.2.6.0: icinga::TlsStream::Write(void const*, unsigned long) (+0x65) [0x2b2449072965]
	(4) libremote.so.2.6.0: icinga::HttpResponse::SetStatus(int, icinga::String const&) (+0x223) [0x2b24499ab153]
	(5) libremote.so.2.6.0: icinga::StatusHandler::HandleRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::Dictionary> const&) (+0x59a) [0x2b24499cacca]
	(6) libremote.so.2.6.0: icinga::HttpHandler::ProcessRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpResponse&) (+0x5c2) [0x2b24499abcf2]
	(7) libremote.so.2.6.0: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0x4f6) [0x2b24499aebd6]
	(8) libbase.so.2.6.0: icinga::WorkQueue::WorkerThreadProc() (+0x4eb) [0x2b244909179b]
	(9) libboost_thread.so.1.55.0: <unknown function> (+0xdaea) [0x2b2448459aea]
	(10) libpthread.so.0: <unknown function> (+0x80a4) [0x2b2448dff0a4]
	(11) libc.so.6: clone (+0x6d) [0x2b244b86962d]

@gunnarbeutner gunnarbeutner added the core/crash Shouldn't happen, requires attention label Feb 7, 2017
@dnsmichi dnsmichi modified the milestones: 2.6.2, 2.7.0 Feb 8, 2017
@dnsmichi
Copy link
Contributor

Could be related: #5148

@dnsmichi dnsmichi modified the milestones: 2.8.0, 2.7.0 Jun 2, 2017
@Stefar77
Copy link
Contributor

Stefar77 commented Jul 14, 2017

Guessing the <unknown function>s are locks;
Could you try #5416 it fixed my problem with HTTP requests to the API
In #5419 are even more lock fixes for the API

dnsmichi pushed a commit that referenced this issue Sep 18, 2017
This was split from #5416 and #5419.

More patches from #5419 are pending.

refs #5419
refs #5418
refs #5416

refs #5408
refs #5148
refs #5007
refs #4968
refs #4910
@dnsmichi dnsmichi assigned dnsmichi and unassigned Al2Klimov Sep 25, 2017
@gunnarbeutner gunnarbeutner removed this from the 2.8.0 milestone Oct 16, 2017
@gunnarbeutner gunnarbeutner added this to the 2.9.0 milestone Oct 16, 2017
@n0npax
Copy link

n0npax commented Jan 18, 2018

Hi,

I've got same error in logs on ubuntu 14.04 and icinga 2.8.0-187

I'm able to reproduce it by setting short downtimes(about 2 sec) in loop.

icinga error log is empty.
Configuration is 2 masters(in zone) and 4 satellites (2 per zone)

icinga2 - The Icinga 2 network monitoring daemon (version: v2.8.0-187-g025abc3)

Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Ubuntu
  Platform version: 14.04.1 LTS, Trusty Tahr
  Kernel: Linux
  Kernel version: 3.13.0-45-generic
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.2
  Build host: 024380082aea


[2018-01-17 16:40:26 +0000] warning/TlsStream: TLS stream was disconnected.
[2018-01-17 16:40:26 +0000] information/ApiListener: New client connection from [MY_DESK_IP_ADDR]:59939 (no client certificate)
[2018-01-17 16:40:26 +0000] information/HttpServerConnection: Request: POST /v1/actions/schedule-downtime (from [MY_DESK_IP_ADDR]:59939, user: MY_USER)
[2018-01-17 16:40:26 +0000] warning/TlsStream: TLS stream was disconnected.
[2018-01-17 16:40:26 +0000] information/ConfigItem: Committing config item(s).
[2018-01-17 16:40:26 +0000] warning/ApplyRule: Apply rule 'icinga-cluster' (in /etc/icinga2/conf.d/services-basic.conf: 1:0-1:29) for type 'Service' does not match anywhere!
[2018-01-17 16:40:26 +0000] warning/ApplyRule: Apply rule 'icinga-zone-caas' (in /etc/icinga2/conf.d/services-basic.conf: 8:1-8:32) for type 'Service' does not match anywhere!
[2018-01-17 16:40:26 +0000] information/ConfigItem: Instantiated 1 Downtime.
[2018-01-17 16:40:26 +0000] information/ConfigItem: Triggering Start signal for config items
[2018-01-17 16:40:26 +0000] information/ConfigItem: Activated all objects.
[2018-01-17 16:40:26 +0000] critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket.
--
 
(0) libbase.so.2.8.0: <unknown function> (+0xeae9f) [0x2b1611f5de9f]
(1) libbase.so.2.8.0: <unknown function> (+0xeaf29) [0x2b1611f5df29]
(2) libbase.so.2.8.0: icinga::SocketEventEngineEpoll::ChangeEvents(icinga::SocketEvents*, int) (+0x147) [0x2b1611ee3267]
(3) libbase.so.2.8.0: icinga::TlsStream::Write(void const*, unsigned long) (+0x55) [0x2b1611eddf35]
(4) libremote.so.2.8.0: icinga::HttpResponse::SetStatus(int, icinga::String const&) (+0x1e5) [0x2b1614942aa5]
(5) libremote.so.2.8.0: icinga::ActionsHandler::HandleRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpRespo
(6) libremote.so.2.8.0: icinga::HttpHandler::ProcessRequest(boost::intrusive_ptr<icinga::ApiUser> const&, icinga::HttpRequest&, icinga::HttpRespons
(7) libremote.so.2.8.0: icinga::HttpServerConnection::ProcessMessageAsync(icinga::HttpRequest&) (+0x747) [0x2b1614945967]
(8) libbase.so.2.8.0: icinga::WorkQueue::WorkerThreadProc() (+0x4ed) [0x2b1611ef767d]
(9) libboost_thread.so.1.54.0: <unknown function> (+0xba4a) [0x2b1613265a4a]
(10) libpthread.so.0: <unknown function> (+0x8182) [0x2b1611c5d182]
(11) libc.so.6: clone (+0x6d) [0x2b1612d8b00d]


@Crunsher
Copy link
Contributor

@n0npax Could you provide a minimal configuration example/script for those downtimes?

@n0npax
Copy link

n0npax commented Jan 22, 2018

@Crunsher , of course! Here additionals info and script:

Anyway, I just realized that haproxy may mess with traffic. Even if that's true, there's still:

critical/HttpServerConnection: Unhandled exception while processing Http request: Error: Tried to read/write from a closed socket

I'm not sure if You're still interested in following configuration.
Also sorry for no information about haproxy in previous comment.

Script

please run 'pip install python-icinga2api' before running script and fill fields with TODO
Generating issue may take up to few hours, so just let this work during night or whatever.
Script:

#!/usr/bin/env python

import time
from icinga2api.client import Client

server_name = 'localhost'  # haproxy to icinga-masters (roundrobin)
api_user = 'user' # TODO FILL
api_password = 'pass' # TODO FILL
api_port = 5665
mon_host = "https://{}:{}".format(server_name, api_port)

host = 'my_hostname.domain.net' # TODO FILL -> hostname which will be used for service downtime
service = 'service-name' # TODO FILL -> service name

client = Client(mon_host, api_user, api_password,
                ca_certificate='/etc/icinga2/ca.crt')


def schedule_service_downtime(hostname, service_name, comment, duration,
                              start_time=None, filter=None, filter_vars=None,
                              fixed=None, trigger_name=None):
    if not filter:
        filter = 'match("{}", host.name) && match("{}", service.name)'.format(
            hostname, service_name)
    return schedule_downtime('Service', filter, comment, duration, start_time,
                             filter_vars, fixed, trigger_name)


def schedule_downtime(object_type, filter, comment, duration, start_time=None,
                      filter_vars=None, fixed=None, trigger_name=None):
    if not start_time:
        start_time = time.time()

    end_time = start_time + duration
    user = api_user
    comment = "{}:{}".format(user, comment)
    results = client.actions.schedule_downtime(
        object_type, filter, user,
        comment, start_time, end_time, duration, filter_vars=filter_vars,
        fixed=fixed, trigger_name=trigger_name)
    for result in results['results']:
        if "Successfully scheduled downtime" not in result['status']:
            raise Exception
    return results

while True:
    try:
        schedule_service_downtime(host, service, "lets trigger the issue", 1)
    except Exception as e:
        print 'one of the issues generated'
        print e
    finally:
        time.sleep(2)

Configuration

So, I've got 4 satellites and 2 masters with classic configuration:

zone.conf

# master
object Endpoint "icinga-master-dev72.my_domain" {
    host = "icinga-master-dev72.my_domain"
    port = "5665"
}
object Endpoint "icinga-master-dev73.my_domain" {
    host = "icinga-master-dev73.my_domain"
    port = "5665"
}
object Zone "master" {
    endpoints = ["icinga-master-dev72.my_domain", "icinga-master-dev73.my_domain"]
}
# glob
object Zone "global-templates" {
  global = true
}
# sat 1
object Zone "sat1" {
    endpoints = ["icinga-satellite-dev86.my_domain", "icinga-satellite-dev87.my_domain"]
    parent = "master" 
}
object Endpoint  "icinga-satellite-dev86.my_domain" {
   host = "icinga-satellite-dev86.my_domain"
   port = "5665" 
}
object Endpoint  "icinga-satellite-dev87.my_domain" {
   host = "icinga-satellite-dev87.my_domain"
   port = "5665" 
}
# sat 2
object Zone "sat2" {
    endpoints = ["icinga-satellite-dev88.my_domain", "icinga-satellite-dev89.my_domain"]
    parent = "master" 
}
object Endpoint  "icinga-satellite-dev88.my_domain" {
   host = "icinga-satellite-dev88.my_domain"
   port = "5665" 
}
object Endpoint  "icinga-satellite-dev89.my_domain" {
   host = "icinga-satellite-dev89.my_domain"
   port = "5665" 
}

icinga-masters are available via haproxy in round-robin mode.
haproxy is used just on other host (without icinga)
Part of haproxy.conf:

global
  chroot  /var/lib/haproxy
  daemon  
  group  haproxy
  log  /dev/log local0
  maxconn  8000
  pidfile  /var/run/haproxy.pid
  stats  socket /var/lib/haproxy/stats
  user  haproxy

defaults
  log  global
  maxconn  8000
  option  redispatch
  retries  3
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 1m
  timeout  server 1m
  timeout  check 10s

#### other services...

listen monitoring
  bind 127.0.0.1:5665 
  balance  roundrobin
  log-format  [%t]\ %f\ SRC:%ci:%cp\ ->\ (FE:%fi:%fp)\ ->\ (BE:%bi:%bp)\ ->\ DST:%si:%sp
  option  tcp-check
  option  logasap
  timeout client  60
  timeout server  60
  server 149.202.176.75 149.202.176.75:5665
  server 149.202.176.93 149.202.176.93:5665

@dnsmichi dnsmichi removed this from the 2.9.0 milestone Apr 18, 2018
@dnsmichi dnsmichi removed their assignment Apr 18, 2018
@dnsmichi
Copy link
Contributor

Likely related to #6361.

@dnsmichi
Copy link
Contributor

A similar behaviour has been analysed and fixed with 2.9. Please test the new release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api REST API bug Something isn't working core/crash Shouldn't happen, requires attention
Projects
None yet
Development

No branches or pull requests

8 participants