
[dev.icinga.com #10426] Icinga crashes with a segfault on receiving a lot of check results for nonexisting hosts/services #3526

Closed
icinga-migration opened this issue Oct 21, 2015 · 17 comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10426

Created by vytenis on 2015-10-21 18:31:32 +00:00

Assignee: vytenis
Status: Resolved (closed on 2016-02-24 22:27:10 +00:00)
Target Version: 2.4.8
Last Update: 2016-04-21 07:46:57 +00:00 (in Redmine)

Icinga Version: 2.4.0
Backport?: Not yet backported
Include in Changelog: 1

Large setup with ~500k services; check results are fed in via nsca-ng and the external command pipe. Under real load, with many check results arriving through the external command pipe and some of them (<10%) referring to hosts/services that are not actually registered in Icinga 2, it crashes a few seconds after startup:
http://hastebin.com/yivajejape.sm

Running Icinga 2 git master from Oct 16 (commit 21a2986).

Attachments

Changesets

2016-02-24 22:25:22 +00:00 by vytenis 6729679

Try to queue all PROCESS_FILE commands instead of exploding the stack

fixes #10426

Signed-off-by: Michael Friedrich <michael.friedrich@netways.de>

2016-02-24 22:25:59 +00:00 by mfriedrich 8e0cc70

Update AUTHORS

refs #10426

2016-05-12 09:08:19 +00:00 by vytenis 9f3a6b9

Try to queue all PROCESS_FILE commands instead of exploding the stack

fixes #10426

Signed-off-by: Michael Friedrich <michael.friedrich@netways.de>

2016-05-12 09:08:19 +00:00 by mfriedrich 7175174

Update AUTHORS

refs #10426
@icinga-migration

Updated by vytenis on 2015-10-21 20:57:44 +00:00

BTW, removing the calls to `BOOST_THROW_EXCEPTION(std::invalid_argument("Cannot process passive host check result for non-existent host '" + arguments[0] + "'"));` fixes the issue
... or not; it only delays it for a while.

@icinga-migration

Updated by gbeutner on 2015-10-22 05:54:24 +00:00

  • Category set to libicinga
  • Status changed from New to Feedback
  • Assigned to set to vytenis

What does the file you're passing to PROCESS_FILE look like?

@icinga-migration

Updated by vytenis on 2015-10-22 10:49:34 +00:00

Most of the files look like this: all `PROCESS_SERVICE_CHECK_RESULT` commands, with one chained `PROCESS_FILE` somewhere:

```
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_FILE;/dev/shm/nsca.hMLg5v;1
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz-;puppet;0;PUPPET OK: last successful puppet run at Thu Oct 22 05:28:35 UTC 2015 Duration: 212 seconds
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;check-results;0;Processed 53 checks in 3.02s
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;num-procs;0;PROCS OK: 477 total processes
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;config-validation-check;0;config validated
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxzapp02;error_watcher;0;OK: no unwanted lines found (15 examined)
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
```
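For reference, each line in the sample file above follows the external command format `[<timestamp>] <COMMAND>;<arg1>;<arg2>;...`. The following is a minimal Python sketch of splitting one such line into its parts. The `ARG_COUNTS` table is hypothetical and covers only the commands seen in this thread. The sketch also assumes that any surplus semicolons belong to the final plugin-output argument. It is not Icinga 2's own parser.

```python
import re
from dataclasses import dataclass, field

# "[<unix timestamp>] <COMMAND>;<args...>", as in the sample file above.
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z0-9_]+)(?:;(.*))?$")

# Hypothetical argument counts for the commands seen in this thread.
# Assumption: extra ';' characters stay inside the last argument (plugin output).
ARG_COUNTS = {
    "PROCESS_SERVICE_CHECK_RESULT": 4,  # host;service;return_code;output
    "PROCESS_HOST_CHECK_RESULT": 3,     # host;return_code;output
    "PROCESS_FILE": 2,                  # file_path;delete_flag
}


@dataclass
class ExternalCommand:
    timestamp: int
    name: str
    args: list = field(default_factory=list)


def parse_line(line: str) -> ExternalCommand:
    """Split one external command line into timestamp, name and arguments."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"malformed external command line: {line!r}")
    ts, name, rest = m.groups()
    if rest is None:
        args = []
    elif name in ARG_COUNTS:
        # maxsplit = count - 1 keeps the final argument intact.
        args = rest.split(";", ARG_COUNTS[name] - 1)
    else:
        args = rest.split(";")
    return ExternalCommand(int(ts), name, args)
```

For example, `parse_line("[1445494578] PROCESS_FILE;/dev/shm/nsca.hMLg5v;1")` yields the name `PROCESS_FILE` and the arguments `["/dev/shm/nsca.hMLg5v", "1"]`.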

@icinga-migration

Updated by mfriedrich on 2015-10-22 11:29:19 +00:00

So you are processing that file with PROCESS_FILE and then chain another PROCESS_FILE request inside? That's strange.

@icinga-migration

Updated by vytenis on 2015-10-22 12:00:55 +00:00

That's how nsca-ng (https://www.nsca-ng.org) works: it sends a single PROCESS_FILE command instead of thousands of individual ones.
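The batching approach described here can be sketched from the submitting side. Many commands are written into one temporary file, and a single `PROCESS_FILE` line is then written to the command pipe. This is a minimal Python sketch, not nsca-ng's actual code. The helper name `submit_batch`, the `nsca.` file prefix and the default `/dev/shm` spool directory are assumptions modelled on the paths quoted in this thread. The trailing `1` is the delete flag, which asks the core to remove the file once it has been processed.

```python
import os
import tempfile
import time


def submit_batch(commands, command_pipe, spool_dir="/dev/shm"):
    """Hypothetical helper: batch commands into one file, submit one PROCESS_FILE.

    `commands` are command bodies without the "[timestamp] " prefix, e.g.
    "PROCESS_SERVICE_CHECK_RESULT;host;service;0;output".
    Returns the path of the batch file.
    """
    now = int(time.time())
    # mkstemp atomically creates a uniquely named file such as nsca.XXXXXXXX.
    fd, path = tempfile.mkstemp(prefix="nsca.", dir=spool_dir)
    with os.fdopen(fd, "w") as batch:
        for cmd in commands:
            batch.write(f"[{now}] {cmd}\n")
    # One line on the command pipe instead of one per check result.
    # Opening a FIFO for writing blocks until the core has it open for reading.
    with open(command_pipe, "a") as pipe:
        pipe.write(f"[{now}] PROCESS_FILE;{path};1\n")
    return path
```

The sketch writes and closes the batch file before announcing it on the pipe, so the reader never sees a half-written file.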

@icinga-migration

Updated by mfriedrich on 2015-10-22 12:13:06 +00:00

Sure. But I've never seen nested PROCESS_FILE commands inside such files actually work, and that is most likely the problem here.

@icinga-migration

Updated by gbeutner on 2015-10-22 12:46:00 +00:00

Well, there's no inherent problem with nesting PROCESS_FILE calls, but I suspect you might be calling PROCESS_FILE for the same file recursively (i.e. file 'a' calls PROCESS_FILE for file 'a').
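The following minimal Python sketch of naive recursive `PROCESS_FILE` handling makes this distinction concrete. The function name and the `handle` callback are hypothetical, and this is not Icinga 2's C++ code. A guard set catches a file that references itself, which is the case suspected here. A long chain of *distinct* files still costs one stack frame per file, though. In native code, exhausting the stack that way typically ends in a segmentation fault.

```python
import os


def process_file_recursive(path, handle, _active=None):
    """Naive recursive PROCESS_FILE handling (illustration only).

    `handle(body)` is a hypothetical callback for every non-PROCESS_FILE command.
    `_active` holds the files currently open on the call stack, so a file that
    references itself (directly or via a cycle) raises instead of recursing
    forever. The delete flag is ignored for brevity.
    """
    active = set() if _active is None else _active
    real = os.path.realpath(path)
    if real in active:
        raise RuntimeError(f"recursive PROCESS_FILE for {path}")
    active.add(real)
    try:
        with open(path) as f:
            for line in f:
                # Strip the "[timestamp] " prefix and the trailing newline.
                body = line.split("] ", 1)[-1].rstrip("\n")
                if body.startswith("PROCESS_FILE;"):
                    # One extra stack frame (and open file) per nesting level.
                    process_file_recursive(body.split(";")[1], handle, active)
                elif body:
                    handle(body)
    finally:
        active.discard(real)
```

Each nesting level in this sketch also keeps its file open until the nested call returns.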

@icinga-migration

Updated by vytenis on 2015-10-22 13:14:40 +00:00

Same exact setup works with Nagios4 :-/

@icinga-migration

Updated by mfriedrich on 2015-10-22 13:21:37 +00:00

vytenis wrote:

Most of the files look like this - all `PROCESS_SERVICE_CHECK_RESULT` with one chained `PROCESS_FILE` somewhere
[...]

What's the exact path for that file being processed?

@icinga-migration

Updated by vytenis on 2015-10-22 13:30:09 +00:00

dnsmichi wrote:

What's the exact path for that file being processed?
/dev/shm/nsca.****, e.g. /dev/shm/nsca.hMLg5v
It is a file owned by nagios user, no permission errors there.

@icinga-migration

Updated by vytenis on 2015-10-27 17:08:43 +00:00

  • File added 0001-Try-to-queue-all-PROCESS_FILE-commands-instead-of-ex.patch

We fixed it in our setup with the attached patch.
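The changeset title describes queueing `PROCESS_FILE` commands instead of recursing into them. The patch itself is not reproduced in this thread, so the following is only a minimal Python sketch of that idea. It uses a hypothetical function name and `handle` callback and assumes the `[timestamp] ` line prefix. Nested files go onto a FIFO queue and are processed one after another in a loop.

```python
import os
from collections import deque


def process_file_queued(first_path, handle):
    """Iterative PROCESS_FILE handling with a FIFO queue (illustration only).

    Nested PROCESS_FILE commands are appended to a queue and processed after
    the current file has been read and closed, so stack depth and the number
    of open files stay constant however long the chain of files gets.
    `handle(body)` is a hypothetical callback for every other command.
    """
    queue = deque([(first_path, False)])  # (path, delete_after_processing)
    while queue:
        path, delete = queue.popleft()
        with open(path) as f:
            for line in f:
                # Strip the "[timestamp] " prefix and the trailing newline.
                body = line.split("] ", 1)[-1].rstrip("\n")
                if body.startswith("PROCESS_FILE;"):
                    _, nested, flag = body.split(";", 2)
                    queue.append((nested, flag.strip() != "0"))
                elif body:
                    handle(body)
        if delete:
            os.remove(path)
```

This sketch has two trade-offs. First, commands from a nested file run after the remainder of the enclosing file rather than at the point of the `PROCESS_FILE` line. Second, a file that queues itself with a delete flag of `0` would loop forever, so a real implementation may still want a guard against self-references.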

@icinga-migration

Updated by mfriedrich on 2015-11-25 09:42:27 +00:00

  • Status changed from Feedback to New
  • Assigned to deleted vytenis

@icinga-migration

Updated by mfriedrich on 2016-02-24 22:26:57 +00:00

  • Status changed from New to Assigned
  • Assigned to set to vytenis
  • Target Version set to 2.5.0

Sorry for the delay, and thanks for the patch :)

@icinga-migration

Updated by vytenis on 2016-02-24 22:27:10 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 6729679.

@icinga-migration

Updated by gbeutner on 2016-04-20 08:14:38 +00:00

  • Target Version changed from 2.5.0 to 2.4.6

@icinga-migration

Updated by gbeutner on 2016-04-20 16:35:39 +00:00

  • Target Version changed from 2.4.6 to 2.4.7

@icinga-migration

Updated by gbeutner on 2016-04-21 07:46:57 +00:00

  • Target Version changed from 2.4.7 to 2.4.8

@icinga-migration icinga-migration added bug Something isn't working libicinga labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.4.8 milestone Jan 17, 2017