[dev.icinga.com #10731] Icinga2 locks up after initialization and state dump with IDO enabled #3717
Comments
Updated by vytenis on 2015-11-26 15:17:07 +00:00
Same with a recent snapshot:
Updated by vytenis on 2015-11-26 16:34:45 +00:00
So the issue is that the IDO queue is full. However, for some reason it never drains: SQL execution just stops.
Updated by gbeutner on 2015-11-26 18:11:38 +00:00
Can you attach gdb to the icinga2 process once it has locked up and post the output of "thread apply all bt full" here?
Updated by vytenis on 2015-11-26 18:16:49 +00:00
Bumping the queue from 500k to 5m allows Icinga2 to start successfully. However, with 400k services (mostly passive) and a lot of host/service groups, it apparently takes tens of millions of queries to do its job. If the queue still filled up during init, everything would block (no checks would be performed).
Updated by vytenis on 2015-11-26 19:54:18 +00:00
Attaching `bt full` output.
Updated by vytenis on 2015-11-26 19:57:59 +00:00
Apparently even if you turn off pagination, gdb still thinks you're insane to output it all in one go. The real, full backtrace is over 20 MB uncompressed.
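For output this large, one option is to run gdb non-interactively and compress the stream as it is written, so nothing has to be paged or held in memory. This is a sketch, not something proposed in this thread: it relies only on gdb's standard `-p`, `-batch` and `-ex` options plus the `thread apply all bt full` command requested above, and the helper names `build_gdb_command` and `dump_backtraces` are hypothetical.

```python
import gzip
import shutil
import subprocess


def build_gdb_command(pid: int) -> list:
    """Return a gdb argv that attaches to `pid`, dumps every thread's backtrace and exits."""
    return [
        "gdb",
        "-p", str(pid),                     # attach to the running icinga2 process
        "-batch",                           # no prompts; exit after the -ex commands
        "-ex", "set pagination off",        # -batch already implies this; kept explicit
        "-ex", "thread apply all bt full",  # the command requested in this thread
    ]


def dump_backtraces(pid: int, out_path: str) -> int:
    """Stream gdb's output straight into a gzip file and return gdb's exit status."""
    with gzip.open(out_path, "wb") as out, subprocess.Popen(
        build_gdb_command(pid), stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    ) as proc:
        # Copy in chunks so the multi-megabyte trace is never held in memory at once.
        shutil.copyfileobj(proc.stdout, out)
    return proc.returncode
```

Attaching normally requires root (or ptrace permission). A call would look like `dump_backtraces(pid, "icinga2-bt.txt.gz")`, with `pid` taken from something like `pidof icinga2`.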
Updated by mfriedrich on 2015-12-17 09:41:18 +00:00
Can you re-test this with the latest git master, please? :)
Updated by vytenis on 2015-12-20 20:18:17 +00:00
Much better performance (fewer queries) on restart, but once the config is big enough to hit the queue limit, icinga2 is still unresponsive.
Updated by vytenis on 2015-12-20 21:56:13 +00:00
Never mind, my bad: a bad config with far too many objects made it freeze.
Stats are updating correctly now; even icingaweb2 thinks Icinga is alive, although most of the data is out of date.
Updated by vytenis on 2015-12-20 22:02:39 +00:00
This is weird, though:
However, everything works, and this is probably related to the fact that I previously created 10+M objects that have not been cleaned up.
Updated by mfriedrich on 2016-02-24 21:40:27 +00:00
At some point the work queue is full (when the database cannot process queries as fast as the core pushes them in). Then, for example, check result processing is blocked. Suggestions on how to handle a slow database on the core side are welcome.
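Whether the queue ever drains comes down to simple rate arithmetic: the backlog only shrinks while the database consumes queries faster than the core produces them, and it grows toward the limit when the opposite holds. A small sketch with hypothetical rates (none of these numbers are measurements from this issue; the function names are illustrative):

```python
import math


def fill_time_seconds(limit: int, backlog: int, produce_rate: float, consume_rate: float) -> float:
    """Seconds until the queue reaches `limit`; math.inf if the queue is not growing."""
    growth = produce_rate - consume_rate
    if growth <= 0:
        return math.inf
    return max(limit - backlog, 0) / growth


def drain_time_seconds(backlog: int, produce_rate: float, consume_rate: float) -> float:
    """Seconds until `backlog` reaches zero; math.inf if the consumer is not faster."""
    net = consume_rate - produce_rate
    if net <= 0:
        return math.inf
    return backlog / net


# Hypothetical rates, in queries per second.
print(fill_time_seconds(500_000, 0, 3000, 2000))   # 500.0
print(drain_time_seconds(5_000_000, 2000, 2500))   # 10000.0
print(drain_time_seconds(5_000_000, 2500, 2500))   # inf
```

A larger limit (such as the 500k to 5m bump mentioned earlier) can absorb a finite burst like a startup state dump, but if the steady-state consume rate is not above the produce rate, the queue still fills eventually.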
Updated by mfriedrich on 2016-03-18 15:30:17 +00:00
This is merely a problem with your database not being able to keep up. Once the query queue is full (you've already modified the upstream source from 1 million to 5 million), threads lock up and wait for the queue to drain again. I'm closing this issue for now; please look into your database performance.
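The lock-up described here is ordinary back-pressure on a bounded producer/consumer queue. The sketch below illustrates that behaviour generically with Python's standard `queue.Queue`; it is not Icinga 2's own work queue implementation, and the function names are hypothetical.

```python
import queue
import threading
import time


def count_blocked_puts(maxsize: int, attempts: int, timeout: float = 0.01) -> int:
    """Push `attempts` items into a bounded queue that nothing drains.

    Each put() waits at most `timeout` seconds; a put() without a timeout would
    block forever once the queue is full, which is the lock-up pattern described.
    """
    q = queue.Queue(maxsize=maxsize)
    blocked = 0
    for item in range(attempts):
        try:
            q.put(item, timeout=timeout)
        except queue.Full:
            blocked += 1
    return blocked


def run_with_slow_consumer(maxsize: int, items: int, delay: float) -> list:
    """Producer fills a bounded queue while a slow consumer (the 'database') drains it."""
    q = queue.Queue(maxsize=maxsize)
    written = []

    def consumer() -> None:
        while True:
            item = q.get()
            if item is None:      # sentinel: the producer is finished
                return
            time.sleep(delay)     # simulate a slow query
            written.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in range(items):
        q.put(item)               # blocks whenever the queue is full
    q.put(None)
    worker.join()
    return written


print(count_blocked_puts(maxsize=3, attempts=5))                 # 2
print(run_with_slow_consumer(maxsize=2, items=5, delay=0.01))    # [0, 1, 2, 3, 4]
```

With a consumer present, the producer only stalls while the queue is full and resumes as space frees up; with no progress on the consumer side, every further `put()` waits indefinitely.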
Updated by lewiseason on 2016-03-30 15:57:26 +00:00
Is there any tooling available to get information about the query queue, or to flush it completely? Occasionally this is the easiest way to get back into a good state, and I don't mind losing a few minutes of checks.
Updated by mfriedrich on 2016-03-30 16:02:32 +00:00
Either define the "ido" check (see the docs for details), which also sends performance data metrics for the queue size, or fetch it from the REST API at /v1/status.
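A minimal sketch of the REST API route, assuming the default API port 5665, an API user with HTTP basic authentication, and that the IDO status object contains a `query_queue_items` key. The key name, the `/v1/status/IdoMysqlConnection` sub-path and the sample response layout are assumptions; inspect your own version's `/v1/status` output first. The original report notes that the API stayed responsive during the lock-up, so this path may still work while the database is stuck.

```python
import base64
import json
import ssl
import urllib.request


def fetch_status(host, user, password, component="IdoMysqlConnection", port=5665, cafile=None):
    """GET /v1/status/<component> from the Icinga 2 REST API and return the parsed JSON."""
    url = f"https://{host}:{port}/v1/status/{component}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url, headers={"Accept": "application/json", "Authorization": f"Basic {token}"}
    )
    # Icinga 2 usually serves a certificate from its own CA; pass that CA file as
    # `cafile` instead of disabling certificate verification.
    ctx = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)


def find_queue_items(node):
    """Yield every value stored under a 'query_queue_items' key (assumed name), at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "query_queue_items":
                yield value
            else:
                yield from find_queue_items(value)
    elif isinstance(node, list):
        for child in node:
            yield from find_queue_items(child)


# Hypothetical response shape, for illustration only.
sample = {"results": [{"name": "IdoMysqlConnection",
                       "status": {"idomysqlconnection": {"ido-mysql": {"connected": True,
                                                                       "query_queue_items": 1234}}}}]}
print(list(find_queue_items(sample)))  # [1234]
```

Searching recursively for the key, rather than hard-coding a path, keeps the sketch usable if the nesting differs between versions; it still depends on the key name being right.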
Updated by lewiseason on 2016-03-30 16:45:03 +00:00
Do either of those methods require the query queue or my database to be responsive? At the moment, I have a cron job that looks at the log file and gets the last query queue length, which is at least quite robust. What I'd really like to be able to do is say "OK Icinga, just drop everything in the queue and carry on starting from now" (at least until I can get the performance issues under control). Is this something I can do?
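For the log-file approach, a cron job only needs the value from the last matching line. The sketch below assumes the log contains lines such as `Query queue items: <n>`; that wording is an assumption and may differ between versions, so adjust `QUEUE_PATTERN` to what your log actually prints. The sample lines are made up, and the sketch only reads the queue length; it does not flush anything.

```python
import collections
import re

# Assumed log wording; adjust to match what your Icinga 2 version actually writes.
QUEUE_PATTERN = re.compile(r"[Qq]uery queue (?:items|length):\s*(\d+)")


def last_queue_length(lines, pattern=QUEUE_PATTERN):
    """Return the queue length from the last matching line, or None if nothing matches."""
    last = None
    for line in lines:
        match = pattern.search(line)
        if match:
            last = int(match.group(1))
    return last


def tail(path, n=10_000):
    """Return the last `n` lines of a log file, keeping at most `n` lines in memory."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return collections.deque(fh, maxlen=n)


# Made-up sample lines.
lines = [
    "[2016-03-30 16:40:00 +0000] information/IdoMysqlConnection: Query queue items: 10",
    "[2016-03-30 16:41:00 +0000] information/ConfigItem: unrelated line",
    "[2016-03-30 16:42:00 +0000] information/IdoMysqlConnection: Query queue items: 42",
]
print(last_queue_length(lines))  # 42
```

In a cron job this would be `last_queue_length(tail(log_path))`, where `log_path` is whatever log file your installation writes.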
This issue has been migrated from Redmine: https://dev.icinga.com/issues/10731
Created by vytenis on 2015-11-26 14:35:40 +00:00
Assignee: (none)
Status: Closed (closed on 2016-03-18 15:30:17 +00:00)
Target Version: (none)
Last Update: 2016-03-30 16:45:03 +00:00 (in Redmine)
Icinga2 locks up after initialization and state dump with IDO enabled. The API and Livestatus are still responsive, but no checks are made, passive check results are not processed, and no database queries are executed.
Attaching gdb `bt`/`thread apply all bt` output. Debug log looks like this:
Attachments