Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev.icinga.com #13655] Crash - Error: parse error: premature EOF #4882

Closed
icinga-migration opened this issue Dec 16, 2016 · 49 comments
Closed
Labels
blocker Blocks a release or needs immediate attention bug Something isn't working
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/13655

Created by erikandersonmx on 2016-12-16 20:40:28 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2017-01-16 13:10:03 +00:00)
Target Version: 2.6.1
Last Update: 2017-01-16 13:10:03 +00:00 (in Redmine)

Icinga Version: r2.6.0-1
Backport?: Not yet backported
Include in Changelog: 1

Not sure where to look from here:

@error: parse error: premature EOF
{arguments
(right here) ------^

(0) libbase.so.2.6.0: (+0xc9148) [0x7f60bab36148]
(1) libbase.so.2.6.0: (+0xc91f9) [0x7f60bab361f9]
(2) libbase.so.2.6.0: icinga::JsonDecode(icinga::String const&) (+0x3ce) [0x7f60baad26ae]
(3) libbase.so.2.6.0: (+0x85f82) [0x7f60baaf2f82]
(4) libbase.so.2.6.0: (+0x87640) [0x7f60baaf4640]
(5) /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2() [0x409088]
(6) /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2() [0x406fa9]
(7) libc.so.6: __libc_start_main (+0xf5) [0x7f60b9c43f45]
(8) /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2() [0x4070e0]@

Changesets

2017-01-16 13:04:45 +00:00 by (unknown) 06064e7

Fix crash when sending/receiving messages longer than 4 kB to/from the child process for the Process class

fixes #13655

Signed-off-by: Michael Friedrich <michael.friedrich@icinga.com>

2017-01-16 13:05:18 +00:00 by gbeutner 82bab6b

Clean up the patch for the crash issue in the Process class

fixes #13655

2017-01-16 13:05:36 +00:00 by gbeutner c31d024

Update AUTHORS

refs #13655

2017-01-16 13:15:39 +00:00 by (unknown) 9fa3f3b

Fix crash when sending/receiving messages longer than 4 kB to/from the child process for the Process class

refs #13655

Signed-off-by: Michael Friedrich <michael.friedrich@icinga.com>

2017-01-16 13:15:42 +00:00 by gbeutner 060e20f

Clean up the patch for the crash issue in the Process class

fixes #13655

2017-01-16 13:15:42 +00:00 by gbeutner ff07cee

Update AUTHORS

refs #13655

Relations:

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2016-12-16 21:02:14 +00:00

Full debug log https://gist.github.com/erikanderson/4cb950e2ee6888ea98a49e47ba02ca6a

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2016-12-16 21:59:38 +00:00

from kern.log

Dec 16 14:55:36 sd-prod-icinga1 kernel: [16255.139654] icinga2[31719]: segfault at 7f3800005587 ip 00007f38e50a8814 sp 00007f38c9ea8340 error 4 in libc-2.19.so[7f38e4fe7000+1ba000]

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2016-12-16 23:01:05 +00:00

This issue seems to be the same thing https://dev.icinga.com/issues/13173

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2016-12-19 23:35:32 +00:00

I think the root cause of this was the check_rabbitmq plugin from here https://github.com/CaptPhunkosis/check\_rabbitmq

We have a large number of queues and so my theory is that when the plugin reported a recovery it sent a large list of queues that were ok and it overloaded something within icinga. We haven't had this issue in the past, before 2.6.0 but I don't have any hard data on that

I think this can be closed

@icinga-migration
Copy link
Author

Updated by gvde on 2016-12-21 08:01:59 +00:00

We see the exact same problem on our server since the update last week. 5 crashes within 7 days. Two this morning within two hours.

erikandersonmx wrote:

I think the root cause of this was the check_rabbitmq plugin from here https://github.com/CaptPhunkosis/check\_rabbitmq

We don't use this plugin. I doubt it's related with that.

We have a large number of queues and so my theory is that when the plugin reported a recovery it sent a large list of queues that were ok and it overloaded something within icinga. We haven't had this issue in the past, before 2.6.0 but I don't have any hard data on that

I think this can be closed

I don't think so. IMHO it has nothing to do with that plugin. And even if it was related to the plugin: no plugin should ever be able to break the icinga2 server, regardless of what it returns.

This needs to be fixed.

@icinga-migration
Copy link
Author

Updated by gvde on 2016-12-21 08:14:44 +00:00

Here is my full crash report:

Caught unhandled exception.
Current time: 2016-12-21 08:16:21 +0100

Application information:
Application version: v2.6.0
Installation root: /usr
Sysconf directory: /etc
Run directory: /run
Local state directory: /var
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

System information:
Platform: CentOS Linux
Platform version: 7 (Core)
Kernel: Linux
Kernel version: 3.10.0-514.2.2.el7.x86_64
Architecture: x86_64

Build information:
Compiler: GNU 4.8.5
Build host: unknown

Error: parse error: premature EOF
{arguments
(right here) ------^

(0) libbase.so.2.6.0: (+0xa4a25) [0x7f22a980ca25]
(1) libbase.so.2.6.0: icinga::JsonDecode(icinga::String const&) (+0x3a6) [0x7f22a980d046]
(2) libbase.so.2.6.0: (+0xd06a3) [0x7f22a98386a3]
(3) libbase.so.2.6.0: (+0xe59d9) [0x7f22a984d9d9]
(4) /usr/lib64/icinga2/sbin/icinga2() [0x40b0ba]
(5) /usr/lib64/icinga2/sbin/icinga2() [0x4092c9]
(6) libc.so.6: __libc_start_main (+0xf5) [0x7f22a6c13b35]
(7) /usr/lib64/icinga2/sbin/icinga2() [0x409361]

@icinga-migration
Copy link
Author

Updated by gvde on 2016-12-21 08:27:28 +00:00

Maybe it has something to do with the call to the notification script. I found these log messages at the time from the crash before:

[2016-12-21 06:27:18 +0100] warning/Process: PID -1 was terminated by signal 58 (Real-time signal 24)
[2016-12-21 06:27:18 +0100] information/Notification: Completed sending 'Problem' notification 'bovecmail-service' for checkable 'bovec!yum-updates' and user 'vogt'.
[2016-12-21 06:27:26 +0100] warning/PluginNotificationTask: Notification command for object 'bovec!yum-updates' (PID: -1, arguments: '/etc/icinga2/scripts/mail-service-notification.sh') terminated with exit code 128, output: <Terminated by signal 58 (Real-time signal 24).>

Oddly, for the last crash there is not the same. However, at the time of the crash it was sending notifications. Same for the crashes before that one...

@icinga-migration
Copy link
Author

Updated by fallback on 2016-12-22 09:56:41 +00:00

I got a similar problem. Unfortunately there is no other error/warning/suspicious message before the crash. Everything went smooth till then.

Caught unhandled exception.
Current time: 2016-12-21 17:29:02 +0100
Application information:
  Application version: r2.6.0-1
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid
System information:
  Platform: Ubuntu
  Platform version: 16.04.1 LTS (Xenial Xerus)
  Kernel: Linux
  Kernel version: 4.4.0-53-generic
  Architecture: x86_64
Build information:
  Compiler: GNU 5.4.0
  Build host: lcy01-04
Error: parse error: premature EOF
                                       {"arguments":["/etc/icinga2/scr
                     (right here) ------^
        (0) libbase.so.2.6.0:  (+0xd68d3) [0x7fc3a0b1b8d3]
        (1) libbase.so.2.6.0:  (+0xd69b7) [0x7fc3a0b1b9b7]
        (2) libbase.so.2.6.0: icinga::JsonDecode(icinga::String const&) (+0x3cb) [0x7fc3a0aa739b]
        (3) libbase.so.2.6.0:  (+0xaa356) [0x7fc3a0aef356]
        (4) libbase.so.2.6.0:  (+0xaba39) [0x7fc3a0af0a39]
        (5) /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2() [0x409aa5]
        (6) /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2() [0x407c79]
        (7) libc.so.6: __libc_start_main (+0xf0) [0x7fc3a0104830]
        (8) /usr/lib/x86_64-linux-gnu/icinga2/sbin/icinga2() [0x407de9]
***
* This would indicate a runtime problem or configuration error. If you believe this is a bug in Icinga 2
* please submit a bug report at https://dev.icinga.org/ and include this stack trace as well as any other
* information that might be useful in order to reproduce this problem.
***
Failed to launch GDB: No such file or directory

After this, the Host (satellite) loses the conncetion the its master.

2016-12-21 17:33:43 +0100] notice/ThreadPool: Pool #1: Pending tasks: 846; Average latency: 0ms; Threads: 64; Pool utilization: 100%
[2016-12-21 17:33:44 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[...] # line repeats many times - shortened for better reading
[2016-12-21 17:33:50 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:50 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:50 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:50 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:50 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:50 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:50 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:50 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master.domain.tld'
[2016-12-21 17:33:50 +0100] warning/ApiListener: Removing API client for endpoint 'master.domain.tld'. 0 API clients left.
[2016-12-21 17:33:51 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:51 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:51 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:51 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:51 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:51 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:51 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:51 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2016-12-21 17:33:51 +0100] information/ApiListener: New client connection for identity 'master.domain.tld' from [192.168.62.96]:56218

@icinga-migration
Copy link
Author

Updated by n0braist on 2016-12-22 11:05:58 +00:00

Hi, same for me but in the moment just on sles with HA cluster environment

permanent crashes after some minutes:

example1:

Caught unhandled exception.
Current time: 2016-12-22 09:23:40 +0100

Application information:
  Application version: v2.6.0
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /var/run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /var/run/icinga2/icinga2.pid

System information:
  Platform: SLES
  Platform version: 11.4
  Kernel: Linux
  Kernel version: 3.0.101-63-default
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.3
  Build host: 92ab943600f4

Error: parse error: premature EOF
                                       {"arguments":["/usr/lib/nagios/
                     (right here) ------^


        (0) libbase.so.2.6.0:  (+0x9ba15) [0x7f8561ba7a15]
        (1) libbase.so.2.6.0: icinga::JsonDecode(icinga::String const&) (+0x37e) [0x7f8561ba7fee]
        (2) libbase.so.2.6.0:  (+0xc4b43) [0x7f8561bd0b43]
        (3) libbase.so.2.6.0:  (+0xc61d0) [0x7f8561bd21d0]
        (4) /usr/lib64/icinga2/sbin/icinga2() [0x40b6bb]
        (5) /usr/lib64/icinga2/sbin/icinga2() [0x409819]
        (6) libc.so.6: __libc_start_main (+0xe6) [0x7f855f0dcc36]
        (7) /usr/lib64/icinga2/sbin/icinga2() [0x40989d]

example2:

Caught unhandled exception.
Current time: 2016-12-21 11:49:02 +0100

Application information:
  Application version: v2.6.0
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /var/run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /var/run/icinga2/icinga2.pid

System information:
  Platform: SLES
  Platform version: 11.4
  Kernel: Linux
  Kernel version: 3.0.101-63-default
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.3
  Build host: 92ab943600f4

Error: lexical error: invalid char in json text.
                                         shows the event of the inform
                     (right here) ------^


        (0) libbase.so.2.6.0:  (+0x9ba15) [0x7f8bc92d5a15]
        (1) libbase.so.2.6.0: icinga::JsonDecode(icinga::String const&) (+0x37e) [0x7f8bc92d5fee]
        (2) libbase.so.2.6.0: icinga::Process::DoEvents() (+0x2e6) [0x7f8bc931e366]
        (3) libbase.so.2.6.0: icinga::Process::IOThreadProc(int) (+0x3a9) [0x7f8bc93446c9]
        (4) libboost_thread.so.1.53.0:  (+0xe524) [0x7f8bc9d3c524]
        (5) libpthread.so.0:  (+0x7806) [0x7f8bc6b6f806]
        (6) libc.so.6: clone (+0x6d) [0x7f8bc68ca9bd]

@icinga-migration
Copy link
Author

Updated by n0braist on 2016-12-22 11:22:56 +00:00

little correction. also on non clustered single node machines

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2016-12-22 18:50:25 +00:00

Thanks for the confirmation that it is most likely not that one specific plugin. I just had a similar crash on a node that had an updated version of the check so there is something else going on here. I'm also going to try to see if I can get the previous version of icinga2 somewhere (its been nuked from the ppa of course) so I can revert for the holidays

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2016-12-22 20:14:15 +00:00

Based on the error I am looking at this change to see how it might impact this: https://dev.icinga.com/projects/i2/repository/revisions/14ea2596c5cc9a2622f3ade2819cff86cc2eec71/diff/lib/base/json.cpp

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2016-12-22 20:18:42 +00:00

This is the issue for that change that is refd in the changelog https://dev.icinga.com/issues/12538

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2016-12-22 20:31:12 +00:00

Nm, that is most likely not the issue as I dug into the for loop change. From what I have seen so far it seems to be happening right after a notification is sent so I am going to zero in on changes made to notifications

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2016-12-22 20:35:52 +00:00

These are the changes made for notifications and seem like prime candidates for the source of this

https://dev.icinga.com/issues/12718
https://dev.icinga.com/issues/13151

@icinga-migration
Copy link
Author

Updated by n0braist on 2016-12-22 21:06:47 +00:00

In my case it seems to be a problem during notification or better if different notifications for one problem are defined at the same time (multiple notification definitions).

On the other hand hot candidate is hipsaint (hipchat). calling in notification but I cannot confirm. Still investigating, but later if anything is running again....

@icinga-migration
Copy link
Author

Updated by fallback on 2016-12-23 06:29:36 +00:00

I'm currently running with the default mail-notification-scripts and one custom sms-notification script written in bash. I think it has nothing to do with hipsaint/hipchat. Morelikely like @erikandersonmx said with the fact that multiple different notifications are defined.

@icinga-migration
Copy link
Author

Updated by gvde on 2016-12-23 07:35:32 +00:00

n0braist wrote:

In my case it seems to be a problem during notification or better if different notifications for one problem are defined at the same time (multiple notification definitions).

On the other hand hot candidate is hipsaint (hipchat). calling in notification but I cannot confirm. Still investigating, but later if anything is running again....

I don't know how it's organized in SLES but are your notification scripts (i.e. the script mail-host-notification.sh, etc.) in /usr/lib/nagios/ or are they in /etc/icinga2/scripts like on CentOS?

@icinga-migration
Copy link
Author

Updated by dominicpratt on 2016-12-23 07:48:57 +00:00

We're experiencing the same error on a Debian Jessie system without hipchat or something, we're only using the standard notification scripts in /etc/icinga2/scripts.

@icinga-migration
Copy link
Author

Updated by n0braist on 2016-12-23 08:08:44 +00:00

Hi, my scripts are in /usr/lib/nagios/
In the moment the problem happens for a special service, but hadn't time what information is inside the check result.
It is a try and error game in the moment :-(

@icinga-migration
Copy link
Author

Updated by greatexpectations on 2016-12-24 09:47:56 +00:00

Hi, as a temporary workaround on RHEL/CentOS 7 (or any other distribution that uses systemd), you may want to modify the systemd unit file for icinga2 so that the service is automatically restarted in case of a crash.

  1. systemctl edit icinga2.service

then add the Restart=always line in the Service section, e.g.

[Service]
Restart=always

Kind regards

@icinga-migration
Copy link
Author

Updated by gvde on 2016-12-24 10:04:00 +00:00

greatexpectations wrote:

Hi, as a temporary workaround on RHEL/CentOS 7 (or any other distribution that uses systemd), you may want to modify the systemd unit file for icinga2 so that the service is automatically restarted in case of a crash.

For all the last crashes that won't work on our CentOS 7: even through there is a crash report for icinga2, the icinga2 process itself is still running and the status of the icinga2.service is still active. icinga2 doesn't check anything anymore but still the process is running. Thus, systemd won't notice that icinga2 isn't operating correctly anymore and thus there won't be any restart...

@icinga-migration
Copy link
Author

Updated by gvde on 2016-12-24 10:12:12 +00:00

greatexpectations wrote:

Hi, as a temporary workaround on RHEL/CentOS 7 (or any other distribution that uses systemd), you may want to modify the systemd unit file for icinga2 so that the service is automatically restarted in case of a crash.

And on a sidenote: you don't want to modify the original icinga2.service file on CentOS 7, i.e. that in /usr/lib/systemd/system/, on ever. All files in that directory come from the rpm packages and are not to be modified. Any change will be overwritten during next update.

Modifications of systemd services go into /etc/systemd/system, e.g. you create a directory icinga2.service.d and put the configuration into a conf file inside that directory...

@icinga-migration
Copy link
Author

Updated by fallback on 2016-12-29 09:04:20 +00:00

gvde wrote:

greatexpectations wrote:
> Hi, as a temporary workaround on RHEL/CentOS 7 (or any other distribution that uses systemd), you may want to modify the systemd unit file for icinga2 so that the service is automatically restarted in case of a crash.

For all the last crashes that won't work on our CentOS 7: even through there is a crash report for icinga2, the icinga2 process itself is still running and the status of the icinga2.service is still active. icinga2 doesn't check anything anymore but still the process is running. Thus, systemd won't notice that icinga2 isn't operating correctly anymore and thus there won't be any restart...

This is exactly the same behaviour on my "Ubuntu 16.04.1 LTS".

@icinga-migration
Copy link
Author

Updated by Marax on 2016-12-30 08:12:31 +00:00

I noticed here that the crashes occur when a plugin with extensive output changes its state. Icinga2 tries to send a notification and crash. The mail contains the extended output of the plugin under additional info.

for example...

***** Icinga *****

Notification Type: PROBLEM

Service: Interface Usage
Host: xxxx
Address: xxx
State: CRITICAL

Date/Time: 2016-12-28 09:37:33 +0100

Additional Info: CRITICAL - interface GigabitEthernet0/24 (alias Gi0/24) usage is in:95.51% (955074379.75bit/s) out:0.61% (6077712.50bit/s), interface Vlan1 (alias Vl1) usage is in:0.00% (5821.50bit/s) out:0.00% (14859.00bit/s), interface GigabitEthernet0/1 (alias Gi0/1) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/2 (alias Gi0/2) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/3 (alias Gi0/3) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/4 (alias Gi0/4) usage is in:0.00% (374.25bit/s) out:0.00% (2084.00bit/s), interface GigabitEthernet0/5 (alias Gi0/5) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/6 (alias Gi0/6) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/7 (alias Gi0/7) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/8 (alias Gi0/8) usage is in:0.00% (0.00bit/s) out:0.00% (1517.50bit/s), interface GigabitEthernet0/9 (alias Gi0/9) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/10 (alias Gi0/10) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/11 (alias Gi0/11) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/12 (alias Gi0/12) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/13 (alias Gi0/13) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/14 (alias Gi0/14) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/15 (alias Gi0/15) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/16 (alias Gi0/16) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/17 (alias Gi0/17) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/18 (alias Gi0/18) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/19 (alias Gi0/19) usage is in:0.39% (3866826.75bit/s) out:62.95% (629520783.75bit/s), interface GigabitEthernet0/20 (alias Gi0/20) usage is in:0.22% (2195255.50bit/s) out:32.56% (325550413.50bit/s), interface GigabitEthernet0/21 (alias Gi0/21) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/22 (alias Gi0/22) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/23 (alias Gi0/23) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface Null0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s)

Comment: []

Can someone confirm this from his experience?

@icinga-migration
Copy link
Author

Updated by fallback on 2016-12-30 08:55:43 +00:00

In my experience it's not related to the plugin's output.

It even keeps crashing although no service or host notification was sent by the time.

@icinga-migration
Copy link
Author

Updated by Marax on 2016-12-30 10:15:46 +00:00

Maybe it's not necessarily a plugin output. We don't run Icinga2 in cluster mode here, but perhaps the bug also affects the communication between cluster nodes. However, for testing purposes i have switched off the notification for the affected services. Let's see if it runs until next year ;-)

@icinga-migration
Copy link
Author

Updated by flex on 2017-01-04 13:06:07 +00:00

Marax wrote:

I noticed here that the crashes occur when a plugin with extensive output changes its state. Icinga2 tries to send a notification and crash. The mail contains the extended output of the plugin under additional info.

for example...

***** Icinga *****

Notification Type: PROBLEM

Service: Interface Usage
Host: xxxx
Address: xxx
State: CRITICAL

Date/Time: 2016-12-28 09:37:33 +0100

Additional Info: CRITICAL - interface GigabitEthernet0/24 (alias Gi0/24) usage is in:95.51% (955074379.75bit/s) out:0.61% (6077712.50bit/s), interface Vlan1 (alias Vl1) usage is in:0.00% (5821.50bit/s) out:0.00% (14859.00bit/s), interface GigabitEthernet0/1 (alias Gi0/1) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/2 (alias Gi0/2) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/3 (alias Gi0/3) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/4 (alias Gi0/4) usage is in:0.00% (374.25bit/s) out:0.00% (2084.00bit/s), interface GigabitEthernet0/5 (alias Gi0/5) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/6 (alias Gi0/6) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/7 (alias Gi0/7) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/8 (alias Gi0/8) usage is in:0.00% (0.00bit/s) out:0.00% (1517.50bit/s), interface GigabitEthernet0/9 (alias Gi0/9) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/10 (alias Gi0/10) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/11 (alias Gi0/11) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/12 (alias Gi0/12) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/13 (alias Gi0/13) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/14 (alias Gi0/14) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/15 (alias Gi0/15) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/16 (alias Gi0/16) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/17 (alias Gi0/17) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/18 (alias Gi0/18) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/19 (alias Gi0/19) usage is in:0.39% (3866826.75bit/s) out:62.95% (629520783.75bit/s), interface GigabitEthernet0/20 (alias Gi0/20) usage is in:0.22% (2195255.50bit/s) out:32.56% (325550413.50bit/s), interface GigabitEthernet0/21 (alias Gi0/21) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/22 (alias Gi0/22) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface GigabitEthernet0/23 (alias Gi0/23) usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s) (down), interface Null0 usage is in:0.00% (0.00bit/s) out:0.00% (0.00bit/s)

Comment: []

Can someone confirm this from his experience?

I can confirm this, if we remove a check which output too long, everything is OK

@icinga-migration
Copy link
Author

Updated by fugstrolch on 2017-01-05 13:04:05 +00:00

(sorry for my english :) )

We have a HA-cluster with two Icinga2-Masters. Since the Installation of V2.6.0 they crashed every 2-3 days.
We have no more problems since we changed our "NotificationCommand" - definitions by deleting all lines with "import plugin-notification-command".

MAYBE: The "plugin-notification-command" (which isn't necessary since V2.6.0 ?) has problems with extended plugin-outputs?

Can someone confirm this?

Solution would be: Don't use "import plugin-notification-command" any more (?)

@icinga-migration
Copy link
Author

Updated by Marax on 2017-01-05 18:36:16 +00:00

flex wrote:

> Can someone confirm this from his experience?

I can confirm this, if we remove a check which output too long, everything is OK

Since switching off the notification of services with extensive output there are no crashs anymore since last year ;-)

It has something in common with passing of a lot of data to external scripts. Perhaps the resulting runtime. It does not have to be a notification. I think that all external scripts will be affected.

@icinga-migration
Copy link
Author

Updated by lucasmopdx on 2017-01-05 22:02:34 +00:00

Is there any resolution available here?

I've tried removing the import plugin-notification-command. I can't make my notification messages any shorter.

This is a pretty serious bug and the packages for 2.5.4 appear to have been removed from the repository (at least for Ubuntu Trusty), so downgrading is also a difficult option.

@icinga-migration
Copy link
Author

Updated by lucasmopdx on 2017-01-05 22:30:33 +00:00

Here's a backtrace, if useful:

(gdb) bt

#0  __execvpe (file=0x7ffc00000ae0 , argv=0x7ffcc74f9e30, envp=0x7ffcc74ff780) at execvpe.c:53
#1  0x00007f1bee421e74 in icinga::Application::AttachDebugger (filename=..., interactive=interactive@entry=false) at /build/icinga2-xYTSb8/icinga2-2.6.0/lib/base/application.cpp:613
#2  0x00007f1bee4523ad in icinga::Application::ExceptionHandler () at /build/icinga2-xYTSb8/icinga2-2.6.0/lib/base/application.cpp:805
#3  0x00007f1bedb946d6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f1bedb94703 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f1bedb94922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f1bee3fd9af in __cxa_throw (obj=obj@entry=0xd95750,

pvtinfo=pvtinfo@entry=0x7f1bee7297c0 <typeinfo for boost::exception_detail::clone_impl<boost::exception_detail::error_info_injectorstd::invalid_argument >>,
dest=dest@entry=0x7f1bee45fe60 <boost::exception_detail::clone_impl<boost::exception_detail::error_info_injectorstd::invalid_argument >::~clone_impl()>)
at /build/icinga2-xYTSb8/icinga2-2.6.0/lib/base/exception.cpp:144

#7  0x00007f1bee46f148 in boost::throw_exception > (e=...) at /usr/include/boost/throw_exception.hpp:67
#8  0x00007f1bee46f1f9 in boost::exception_detail::throw_exception_ (x=...,

current_function=current_function@entry=0x7f1bee4ccc40 <icinga::JsonDecode(icinga::String const&)::PRETTY_FUNCTION> "icinga::Value icinga::JsonDecode(const icinga::String&)",
file=file@entry=0x7f1bee4b5040 "/build/icinga2-xYTSb8/icinga2-2.6.0/lib/base/json.cpp", line=line@entry=351) at /usr/include/boost/throw_exception.hpp:84

#9  0x00007f1bee40b6ae in icinga::JsonDecode (data=...) at /build/icinga2-xYTSb8/icinga2-2.6.0/lib/base/json.cpp:351
#10 0x00007f1bee42bf82 in ProcessHandler () at /build/icinga2-xYTSb8/icinga2-2.6.0/lib/base/process.cpp:278
#11 0x00007f1bee42d640 in StartSpawnProcessHelper () at /build/icinga2-xYTSb8/icinga2-2.6.0/lib/base/process.cpp:334
#12 0x00007f1bee42d6d5 in icinga::Process::InitializeSpawnHelper () at /build/icinga2-xYTSb8/icinga2-2.6.0/lib/base/process.cpp:452
#13 0x0000000000409088 in Main () at /build/icinga2-xYTSb8/icinga2-2.6.0/icinga-app/icinga.cpp:452
#14 0x0000000000406fa9 in main (argc=6, argv=0x7ffcc74ff748) at /build/icinga2-xYTSb8/icinga2-2.6.0/icinga-app/icinga.cpp:779

@icinga-migration
Copy link
Author

Updated by lucasmopdx on 2017-01-05 22:42:16 +00:00

The issue appears to be here:

https://github.com/Icinga/icinga2/blob/v2.6.0/lib/base/process.cpp#L278

Basically, it allocates a buffer that's a maximum of 4096 bytes for a message. If the message exceeds 4096 bytes it's just not read past that length? (I didn't look at the code that sends this)
Since it's expecting a well formed JSON document, a truncated JSON document causes an unhandled exception.

@icinga-migration
Copy link
Author

Updated by Marax on 2017-01-06 07:37:28 +00:00

lucasmopdx wrote:

The issue appears to be here:

https://github.com/Icinga/icinga2/blob/v2.6.0/lib/base/process.cpp#L278

Basically, it allocates a buffer that's a maximum of 4096 bytes for a message. If the message exceeds 4096 bytes it's just not read past that length? (I didn't look at the code that sends this)
Since it's expecting a well formed JSON document, a truncated JSON document causes an unhandled exception.

I have no idea of programming but is there possibly a relationship with issue #13567 ? Or are two bugs in the same file?

https://dev.icinga.com/projects/i2/repository/revisions/fb8f4105ad9b72b58305b79f830d5e2b9a8128f4/diff/lib/base/process.cpp

@icinga-migration
Copy link
Author

Updated by lucasmopdx on 2017-01-06 21:30:49 +00:00

Marax wrote:

lucasmopdx wrote:

I have no idea of programming but is there possibly a relationship with issue #13567 ? Or are two bugs in the same file?

https://dev.icinga.com/projects/i2/repository/revisions/fb8f4105ad9b72b58305b79f830d5e2b9a8128f4/diff/lib/base/process.cpp

These are two separate bugs in the same file.

@icinga-migration
Copy link
Author

Updated by lucasmopdx on 2017-01-06 21:35:15 +00:00

Added a PR on github to fix this issue:

#119

Note that I was not able to test it as we resolved by downgrading to icinga2 2.5.4 (painfully).

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2017-01-09 15:16:48 +00:00

  • Duplicated set to 13619

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2017-01-09 15:18:26 +00:00

  • Category set to libbase
  • Priority changed from Normal to High
  • Target Version set to 2.6.1

@icinga-migration
Copy link
Author

Updated by n0braist on 2017-01-09 15:33:21 +00:00

On information for the length: I saw the behavior if the output length is longer than 5120 bytes.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2017-01-09 15:40:37 +00:00

  • Relates set to 13895

@icinga-migration
Copy link
Author

Updated by erikandersonmx on 2017-01-09 16:52:52 +00:00

lucasmopdx wrote:

The issue appears to be here:

https://github.com/Icinga/icinga2/blob/v2.6.0/lib/base/process.cpp#L278

Basically, it allocates a buffer that's a maximum of 4096 bytes for a message. If the message exceeds 4096 bytes it's just not read past that length? (I didn't look at the code that sends this)
Since it's expecting a well formed JSON document, a truncated JSON document causes an unhandled exception.

Awesome find!

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2017-01-09 16:59:11 +00:00

  • Relates set to 13739

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2017-01-11 13:08:26 +00:00

  • Relates set to 13657

@icinga-migration
Copy link
Author

Updated by snallygaster on 2017-01-12 16:09:51 +00:00

Just want to check - is everyone that is experiencing this issue running SELinux in enforcing mode? Although not a good idea for a production environment test, since running my test server with SELinux in permissive mode I haven't had a recurrence of the issue.

@icinga-migration
Copy link
Author

Updated by gvde on 2017-01-12 16:18:38 +00:00

snallygaster wrote:

Just want to check - is everyone that is experiencing this issue running SELinux in enforcing mode? Although not a good idea for a production environment test, since running my test server with SELinux in permissive mode I haven't had a recurrence of the issue.

We are running the production server in permissive mode only as there are a lot of avc hits. And we see the crash from time to time... So at least for us it's not related with selinux avc denies...

@icinga-migration
Copy link
Author

Updated by gvde on 2017-01-13 09:18:50 +00:00

n0braist wrote:

On information for the length: I saw the behavior if the output length is longer than 5120 bytes.

I can confirm that it seems to be related to the length of the plugin output, crashing when it's longer than 4223 bytes, not crashing when it's less then 2933 bytes.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2017-01-16 07:18:33 +00:00

  • Status changed from New to Assigned
  • Assigned to set to gbeutner

The only thing I'm not quite happy with with regard to the PR is the fact that the ProcessHandler function is throwing exceptions. There's nobody to catch those exceptions which would definitely cause the child process to crash. I'll see if I can get that updated myself and then merge the PR into the master branch.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2017-01-16 10:29:08 +00:00

Test case for many arguments creating a crash:

object CheckCommand "13655-data" {
  command = {{
    var cmd = []

    for (i in range(0, 100000)) {
      cmd.add(i)
    }

    return cmd
  }}
}

object Host "13655-host" {
  check_command = "13655-data"
  check_interval = 1s
}

@icinga-migration
Copy link
Author

Updated by Anonymous on 2017-01-16 13:10:03 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 06064e7.

@icinga-migration icinga-migration added blocker Blocks a release or needs immediate attention bug Something isn't working libbase labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.6.1 milestone Jan 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker Blocks a release or needs immediate attention bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant