[dev.icinga.com #10700] Crash in ExternalCommandListener #3697

Closed
icinga-migration opened this issue Nov 23, 2015 · 28 comments
Labels: area/compat, blocker, bug

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10700

Created by kitharo on 2015-11-23 06:37:08 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2016-01-25 08:58:18 +00:00)
Target Version: 2.4.2
Last Update: 2016-02-23 09:58:19 +00:00 (in Redmine)

Icinga Version: 2.4.0-1
Backport?: Already backported
Include in Changelog: 1

Icinga 2 keeps crashing after the update to r2.4.0-1. The crash log always says:

Failed to launch GDB: No such file or directory

So I installed GDB and restarted, and it crashed again with the same message. I launched GDB and it worked.
I'm using Ubuntu x64 (15.04) with icingaweb2 (latest release).

Sometimes it crashes every 15 minutes, and sometimes it crashes after 2 hours.

Changesets

2016-01-20 15:38:31 +00:00 by mfriedrich 4ce43b8

ExternalCommandListener: Fix crash when reading from socket

refs #10700

2016-02-23 08:23:39 +00:00 by mfriedrich 0516cb5

ExternalCommandListener: Fix crash when reading from socket

refs #10700

Relations:

@icinga-migration

Updated by mfriedrich on 2015-11-23 08:38:03 +00:00

  • Status changed from New to Feedback
  • Assigned to set to kitharo

Please install gdb and generate a backtrace.

@icinga-migration

Updated by kitharo on 2015-11-23 14:34:20 +00:00

GDB was already installed. It's starting normally.
If I run "gdb icinga2" it always says:

"usr/sbin/icinga2": not in executable format: File format not recognized

How do I run GDB correctly?

@icinga-migration

Updated by mfriedrich on 2015-11-23 14:37:35 +00:00

gdb --args /usr/lib64/icinga2/sbin/icinga2 ...

The doc bug is fixed in #10710.

@icinga-migration

Updated by kitharo on 2015-11-25 06:33:29 +00:00

If I run:

gdb --args /usr/lib64/icinga2/sbin/icinga2

I get:

/usr/lib64/icinga2/sbin/icinga2: No such file or directory.

There is indeed no such directory.

@icinga-migration

Updated by mfriedrich on 2015-11-25 07:52:11 +00:00

Then please adapt the path to wherever your icinga2 binary is located, which depends on your distribution.

icinga2 --version

should provide you with details. In 2.4 the binary was moved to the lib directory; /usr/sbin/icinga2 is just a shell wrapper, which gdb cannot load as an executable.

@icinga-migration

Updated by kitharo on 2015-11-25 10:19:06 +00:00

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

I tried /usr/lib/icinga2/icinga2 but it said "not in executable format: File format not recognized". I'm using Ubuntu 15.04.

@icinga-migration

Updated by mfriedrich on 2015-11-25 14:12:57 +00:00

Then use your preferred method of finding a file on Ubuntu. I'd guess it is somewhere in the multiarch lib directory.

@icinga-migration

Updated by gbeutner on 2015-11-26 07:14:03 +00:00

FYI, it's /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2 on 64-bit systems.

@icinga-migration

Updated by kitharo on 2015-11-26 12:06:10 +00:00

Thanks for the hints!

What I did on the command line:

  1. gdb --args /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2
  2. (gdb) run
    .. waiting for crash..
  3. (gdb) bt
    Output: No stack.

After step 2, I noticed that gdb printed:
[LWP 8188 exited]
[Inferior 1 (process 8188) exited normally]

What am I doing wrong?

@icinga-migration

Updated by gbeutner on 2015-12-07 07:23:09 +00:00

You're missing the 'daemon' argument for icinga2.

For example:

gdb --args /usr/lib/.../icinga2 daemon

@icinga-migration

Updated by kitharo on 2015-12-07 13:40:33 +00:00

Thanks, it's running and I'm waiting for the crash. I will post the output.

@icinga-migration

Updated by kitharo on 2015-12-07 14:05:15 +00:00

First try (crashed after ~15min):

[2015-12-07 14:55:45 +0100] information/ExternalCommandListener: Executing external command: [1449496545] PROCESS_SERVICE_CHECK_RESULT;vs-SERVER2;mem;0;OK: physical: Total: 4GB - Used: 1.156GB (28%) - Free: 2.843GB (71%)|'physical'=1.15607GB;3.19965;3.5996;0;3.99956 'physical %'=28%;79;89;0;100
[2015-12-07 14:55:45 +0100] information/ExternalCommandListener: Executing external command: [1449496545] PROCESS_SERVICE_CHECK_RESULT;vs-SERVER3;mem;0;OK: physical: Total: 4GB - Used: 1.125GB (28%) - Free: 2.875GB (71%)|'physical'=1.12493GB;3.19965;3.5996;0;3.99956 'physical %'=28%;79;89;0;100
[2015-12-07 14:55:45 +0100] critical/Socket: recv() failed with error code 11, "Resource temporarily unavailable"
[New Thread 0x7fffdba48700 (LWP 12415)]
[New Thread 0x7fffdbb4c700 (LWP 12407)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffdbaca700 (LWP 12409)]
0x00007ffff629b267 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:55
55      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

(gdb) bt

#0  0x00007ffff629b267 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff629ceca in __GI_abort () at abort.c:89
#2  0x00007ffff7236310 in icinga::Application::ExceptionHandler ()
    at /build/icinga2-F1OJeq/icinga2-2.4.1/lib/base/application.cpp:774
#3  0x00007ffff68a4ee6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff68a4f31 in std::terminate() ()
    from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff68a5149 in __cxa_throw ()
    from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff71fc49a in __cxa_throw (obj=obj@entry=0x7fffbc0dd920,
    pvtinfo=pvtinfo@entry=0x7ffff7518520 <typeinfo for boost::exception_detail::clone_impl<icinga::socket_error>>,
    dest=dest@entry=0x7ffff72620e0 <boost::exception_detail::clone_impl<icinga::socket_error>::~clone_impl()>)
    at /build/icinga2-F1OJeq/icinga2-2.4.1/lib/base/exception.cpp:110
#7  0x00007ffff72739b7 in boost::throw_exception (e=...)
    at /usr/include/boost/throw_exception.hpp:70
#8  0x00007ffff7273a4e in boost::exception_detail::throw_exception_ (x=...,
    current_function=current_function@entry=0x7ffff72c2a20 <icinga::Socket::Read(void*, unsigned long)::__PRETTY_FUNCTION__> "size_t icinga::Socket::Read(void*, size_t)",

Second try (crashed after ~1min):

[2015-12-07 15:01:09 +0100] information/ExternalCommandListener: Executing external command: [1449496869] PROCESS_SERVICE_CHECK_RESULT;vs-SERVER1;cpu;0;OK: CPU load is ok.|'total 5m'=0%;80;90 'total 1m'=0%;80;90 'total 30s'=0%;80;90
[2015-12-07 15:01:09 +0100] critical/Socket: recv() failed with error code 11, "Resource temporarily unavailable"
[New Thread 0x7fffdbaca700 (LWP 8227)]
[2015-12-07 15:01:09 +0100] warning/MacroProcessor: Macro 'remote_nrpe_arguments' is not defined.
Context:
        (0) Resolving macros for string '$remote_nrpe_arguments$'
        (1) Executing check for object 'rs-cits04!cpu'

[New Thread 0x7fffdba07700 (LWP 8235)]
[New Thread 0x7ffff7e51700 (LWP 8233)]
[New Thread 0x7fffdbb8d700 (LWP 8232)]
[New Thread 0x7fffdba89700 (LWP 8231)]
[New Thread 0x7fffdbb0b700 (LWP 8226)]
[New Thread 0x7fffdbb4c700 (LWP 8225)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffdbaca700 (LWP 8227)]
0x00007ffff629b267 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:55
55      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

(gdb) bt

#0  0x00007ffff629b267 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff629ceca in __GI_abort () at abort.c:89
#2  0x00007ffff7236310 in icinga::Application::ExceptionHandler ()
    at /build/icinga2-F1OJeq/icinga2-2.4.1/lib/base/application.cpp:774
#3  0x00007ffff68a4ee6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff68a4f31 in std::terminate() ()
    from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff68a5149 in __cxa_throw ()
    from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff71fc49a in __cxa_throw (obj=obj@entry=0x7fffc408cc50,
    pvtinfo=pvtinfo@entry=0x7ffff7518520 <typeinfo for boost::exception_detail::clone_impl<icinga::socket_error>>,
    dest=dest@entry=0x7ffff72620e0 <boost::exception_detail::clone_impl<icinga::socket_error>::~clone_impl()>)
    at /build/icinga2-F1OJeq/icinga2-2.4.1/lib/base/exception.cpp:110
#7  0x00007ffff72739b7 in boost::throw_exception (e=...)
    at /usr/include/boost/throw_exception.hpp:70
#8  0x00007ffff7273a4e in boost::exception_detail::throw_exception_ (x=...,
    current_function=current_function@entry=0x7ffff72c2a20 <icinga::Socket::Read(void*, unsigned long)::__PRETTY_FUNCTION__> "size_t icinga::Socket::Read(void*, size_t)",
    file=file@entry=0x7ffff72b0bd0 "/build/icinga2-F1OJeq/icinga2-2.4.1/lib/base/socket.cpp", line=line@entry=314) at /usr/include/boost/throw_exception.hpp:87
#9  0x00007ffff720a1ec in icinga::Socket::Read (this=this@entry=0x7fffc40bcdd0,
    buffer=buffer@entry=0x7fffdbac7e80, count=count@entry=8192)
    at /build/icinga2-F1OJeq/icinga2-2.4.1/lib/base/socket.cpp:314
#10 0x00007ffff0c3ff5e in icinga::ExternalCommandListener::CommandPipeThread (
    this=<optimized out>, commandPath=...)
    at /build/icinga2-F1OJeq/icinga2-2.4.1/lib/compat/externalcommandlistener.cpp:113
#11 0x00007ffff7bcc09a in ?? ()
    from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0
#12 0x00007ffff752e6aa in start_thread (arg=0x7fffdbaca700)
    at pthread_create.c:333
#13 0x00007ffff636ceed in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

@icinga-migration

Updated by mfriedrich on 2015-12-07 14:18:53 +00:00

  • Subject changed from Icinga2 always crashing after update to r2.4.0-1 to Crash in ExternalCommandListener
  • Category set to Compat
  • Status changed from Feedback to New
  • Assigned to deleted kitharo
  • Target Version set to 2.4.2

@icinga-migration

Updated by chrisportman on 2015-12-08 22:17:45 +00:00

Hi all, I'm also having what looks like the same issue. I use the external commands file for passive results...

Here are some details:

## Version

$ icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: v2.4.0)

Copyright (c) 2012-2015 Icinga Development Team (https://www.icinga.org)
License GPLv2+: GNU GPL version 2 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /var/run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /var/run/icinga2/icinga2.pid

System information:
  Operating system: Linux
  Operating system version: 2.6.32-573.8.1.el6.x86_64
  Architecture: x86_64
  Distribution: "CentOS release 6.7 (Final)"

# GDB crash error:

[2015-12-09 09:01:54 +1100] critical/Socket: recv() failed with error code 11, "Resource temporarily unavailable"
Detaching after fork from child process 49843.
[2015-12-09 09:01:54 +1100] debug/IdoPgsqlConnection: Query: UPDATE icinga_servicestatus SET next_check = TO_TIMESTAMP(1449612172) WHERE instance_id = 1 AND service_object_id = 196
[2015-12-09 09:01:54 +1100] debug/IdoPgsqlConnection: Query: INSERT INTO icinga_externalcommands (command_args, command_name, command_type, endpoint_object_id, entry_time, instance_id) VALUES (E'hostname;Invalid Users;0;All users are OK', E'PROCESS_SERVICE_CHECK_RESULT', E'30', 65, TO_TIMESTAMP(1449596761), 1)

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff73fbe700 (LWP 44549)]
0x0000003e2ac32625 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 glibc-2.12-1.166.el6_7.3.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-42.el6.x86_64 libboost_program_options1_53_0-1.53.0-0.x86_64 libboost_regex1_53_0-1.53.0-0.x86_64 libboost_system1_53_0-1.53.0-0.x86_64 libboost_thread1_53_0-1.53.0-0.x86_64 libcom_err-1.41.12-22.el6.x86_64 libedit-2.11-4.20080712cvs.1.el6.x86_64 libgcc-4.4.7-16.el6.x86_64 libicu-4.2.1-12.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 libstdc++-4.4.7-16.el6.x86_64 ncurses-libs-5.7-4.20090207.el6.x86_64 nspr-4.10.8-2.el6_7.x86_64 nss-3.19.1-5.el6_7.x86_64 nss-softokn-freebl-3.14.3-23.el6_7.x86_64 nss-util-3.19.1-2.el6_7.x86_64 openldap-2.4.40-7.el6_7.x86_64 openssl-1.0.1e-42.el6.x86_64 postgresql94-libs-9.4.5-1PGDG.rhel6.x86_64 zlib-1.2.3-29.el6.x86_64

# GDB bt

#0  0x0000003e2ac32625 in raise () from /lib64/libc.so.6
#1  0x0000003e2ac33e05 in abort () from /lib64/libc.so.6
#2  0x0000003c7370fa1a in icinga::Application::ExceptionHandler () at ../base/application.cpp:889
#3  0x0000003c754bcbd6 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x0000003c754bcc03 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5  0x0000003c754bcd22 in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x0000003c736c9cfe in __cxa_throw (obj=0x7fff64b99870, pvtinfo=0x3c739ace80, dest=0x3c7371bdb0 <boost::exception_detail::clone_impl<icinga::socket_error>::~clone_impl()>) at ../base/exception.cpp:110
#7  0x0000003c7372e977 in boost::throw_exception (e=...) at /usr/include/boost153/boost/throw_exception.hpp:67
#8  0x0000003c7372ea20 in boost::exception_detail::throw_exception_ (x=Unhandled dwarf expression opcode 0xf3
) at /usr/include/boost153/boost/throw_exception.hpp:84
#9  0x0000003c736d5562 in icinga::Socket::Read (this=Unhandled dwarf expression opcode 0xf3
) at ../base/socket.cpp:314
#10 0x00007ffff6ea258b in icinga::ExternalCommandListener::CommandPipeThread (this=Unhandled dwarf expression opcode 0xf3
) at ../compat/externalcommandlistener.cpp:113
#11 0x0000003c7420c5c3 in ?? () from /usr/lib64/libboost_thread.so.1.53.0
#12 0x0000003e2b007a51 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003e2ace893d in clone () from /lib64/libc.so.6

@icinga-migration

Updated by mfriedrich on 2015-12-15 09:41:48 +00:00

  • Relates set to 10757

@icinga-migration

Updated by mfriedrich on 2015-12-15 09:42:12 +00:00

  • Relates set to 10841

@icinga-migration

Updated by mfriedrich on 2015-12-15 09:42:30 +00:00

  • Relates set to 10410

@icinga-migration

Updated by mfriedrich on 2015-12-15 09:43:24 +00:00

  • Priority changed from Normal to High

@icinga-migration

Updated by mfriedrich on 2015-12-16 18:07:28 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich

Notes:

  • sock->Read() try/catch
  • more tests
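
For readers following the backtraces above: the exception thrown from the read call escapes the command pipe thread, which ends in std::terminate() and abort(). The "sock->Read() try/catch" note could be sketched roughly as below. This is only an illustration under stated assumptions, not the change made in 4ce43b8: `read_or_throw`, `try_read` and `make_nonblocking_pair` are hypothetical names, and a non-blocking UNIX socket pair stands in for the real command pipe.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <system_error>

#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for a Socket::Read-style wrapper:
// any recv() failure is turned into an exception.
std::size_t read_or_throw(int fd, void *buffer, std::size_t count)
{
    ssize_t rc = recv(fd, buffer, count, 0);
    if (rc < 0)
        throw std::system_error(errno, std::generic_category(), "recv() failed");
    return static_cast<std::size_t>(rc);
}

// Hypothetical caller: catches the exception inside the reading thread.
// EAGAIN/EWOULDBLOCK (error code 11 on Linux) only means "no data yet",
// so it is reported as false; every other error is rethrown.
bool try_read(int fd, std::string& out)
{
    char buffer[8192];
    try {
        std::size_t n = read_or_throw(fd, buffer, sizeof(buffer));
        out.assign(buffer, n);
        return true;
    } catch (const std::system_error& ex) {
        int code = ex.code().value();
        if (code == EAGAIN || code == EWOULDBLOCK)
            return false;
        throw;
    }
}

// Test helper (assumption): creates a connected socket pair and
// switches both ends to non-blocking mode.
bool make_nonblocking_pair(int fds[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return false;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFL, 0);
        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return false;
    }
    return true;
}
```

With this shape, a reader loop can call `try_read()` repeatedly and simply wait for more input when it returns false, instead of letting a "Resource temporarily unavailable" error terminate the whole process.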

@icinga-migration

Updated by Tux12Fun on 2016-01-14 13:54:16 +00:00

Hi,

I've also seen something similar today (Core Version: r2.4.1-1 from icinga repo for ubuntu).

kernel: [1209328.726959] icinga2[22111]: segfault at 2b690000ea8a ip 00002b6998f39964 sp 00002b69aa343bd0 error 4 in libc-2.19.so[2b6998e78000+1bb000]

[2016-01-14 13:48:52 +0100] information/ExternalCommandListener: Executing external command: [1452775732] PROCESS_SERVICE_CHECK_RESULT;xxxx.xxxxxx.local;ORA_TS__XXXDWH_IDX;0;TS_XXXDWH_IDX 39.0% free|Size_B=131072B Free_B=52340B Free_TS_PC=39.0%;0;0
[2016-01-14 13:48:52 +0100] critical/Socket: recv() failed with error code 11, "Resource temporarily unavailable"

I've collected the following files:

drwxr-s--x nagios/adm        0 2016-01-14 14:28 var/log/icinga2/
-rw-r--r-- nagios/nagios 2187820 2016-01-14 13:49 var/log/icinga2/icinga2.log
-rw-r--r-- root/adm        89527 2016-01-14 13:33 var/log/icinga2/startup.log
-rw------- nagios/adm          0 2015-10-01 11:25 var/log/icinga2/icinga2.err
drwxr-s--x nagios/adm          0 2016-01-14 14:24 var/log/icinga2/crash/
-rw------- nagios/adm          0 2016-01-14 13:48 var/log/icinga2/crash/report.1452775732.741087
drwxr-s--x nagios/adm          0 2015-10-08 10:15 var/log/icinga2/compat/
drwxr-sr-x nagios/adm          0 2015-09-06 22:14 var/log/icinga2/compat/archives/
-rw-r----- root/root        7455 2016-01-14 14:28 var/log/messages
-rw-r----- nagios/nagios 30363268 2016-01-14 13:49 var/crash/_usr_lib_x86_64-linux-gnu_icinga2_sbin_icinga2.108.crash

If you would like the files for analysis, send me a message.

I can upload the files to my public SFTP server (size: 22 MB). I'd rather not post the crash dump here because the logs or crash data could contain sensitive information.

@icinga-migration

Updated by mfriedrich on 2016-01-20 15:40:43 +00:00

  • Status changed from Assigned to Feedback
  • Assigned to changed from mfriedrich to kitharo

Can you please re-test this with the current snapshot packages? (commit 4ce43b8)

@icinga-migration

Updated by kitharo on 2016-01-22 07:53:22 +00:00

Could you please write short instructions on how to install a snapshot?
I'm not very familiar with installing snapshots.

@icinga-migration

Updated by mfriedrich on 2016-01-22 14:49:41 +00:00

Change the repository from release to snapshot and update/reinstall the packages. I'd highly recommend using a test VM, though, so that your production environment is not affected.

@icinga-migration

Updated by kitharo on 2016-01-25 06:51:57 +00:00

Seems to be running. I got no crash over the weekend.

@icinga-migration

Updated by mfriedrich on 2016-01-25 08:58:18 +00:00

  • Status changed from Feedback to Resolved
  • Assigned to changed from kitharo to mfriedrich
  • Done % changed from 0 to 100

Ok, thanks for the fast feedback :)

Kind regards,
Michael

@icinga-migration

Updated by kitharo on 2016-01-25 09:59:28 +00:00

Is there a schedule for the public release?

@icinga-migration

Updated by mfriedrich on 2016-01-25 10:25:37 +00:00

Hi there,

We are currently working on important fixes (check the roadmap for 2.4.2 and 2.5.0) over the next 2 weeks. Once these are completed, we plan to publish a new release in February.

Kind regards,
Michael

@icinga-migration

Updated by gbeutner on 2016-02-23 09:58:19 +00:00

  • Backport? changed from Not yet backported to Already backported

icinga-migration added the blocker label on Jan 17, 2017
icinga-migration added the bug and area/compat labels on Jan 17, 2017
icinga-migration added this to the 2.4.2 milestone on Jan 17, 2017