[dev.icinga.com #12597] With too many comments, Icinga reload process won't finish reconnecting to Database #4603
Comments
Updated by gbeutner on 2016-08-30 21:33:29 +00:00
Can you show me the output of "SHOW FULL PROCESSLIST" from right before the MySQL client connection was closed (error "MySQL server has gone away")?
Updated by itbess on 2016-08-31 05:57:32 +00:00 gunnarbeutner wrote:
I can't seem to find anything in there. It just shows that long statement and then nothing.
Updated by itbess on 2016-09-01 01:35:28 +00:00 I deleted half of all the comments; now the reload works.
Updated by gbeutner on 2016-09-01 05:43:37 +00:00 I'd still be interested in the output, in particular how long the command has been running.
Updated by itbess on 2016-09-08 20:38:26 +00:00 gunnarbeutner wrote:
The problem is that as soon as that command runs, my screen fills with text from that long statement, so it is not easy to capture anything. It also seems that not just comments but also downtimes affect this.
Updated by PowellEB on 2016-09-09 04:50:25 +00:00
... ERROR 2020 (HY000) at line 1: Got packet bigger than max_allowed_packet bytes
Was able to capture this by:
We have "max_allowed_packet" set to 1024M, which is the maximum, but this still fails.
Updated by gbeutner on 2016-11-17 10:50:49 +00:00
Updated by mfriedrich on 2016-11-22 14:57:27 +00:00
Updated by mfriedrich on 2016-11-22 15:00:02 +00:00
Applied in changeset b028ff2.
Updated by mfriedrich on 2016-11-22 15:09:16 +00:00 Steps to reproduce:
The code inside IdoMysqlConnection::FinishAsyncQueries() builds the query string in small portions whose size is based on the max_allowed_packet setting. To trigger this error quickly, it is reasonable to lower that setting in the mysql/mariadb server (mysqld).
64K is the lowest, 128K works fine. That small query buffer, filled with many delete/insert queries for comments, is then fired against the mysql server. After firing the query, all result sets must be fetched before another query can be fired. The loop that fetches the result sets also evaluates a callback. The fix for this issue is to fetch all result sets first and store them for later processing. Once that is done, the stored result sets are processed and their callbacks are run. Please test the git master in your environments. We have tested that this works.
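The two ideas described above (size-limited query batches, and draining every result set before any callback runs) can be sketched roughly as follows. This is only an illustration in Python, not Icinga's C++ code: `FakeConnection`, `OutOfSyncError`, `split_batches` and `run_batch` are hypothetical names, and the fake connection merely imitates the rule that a new query is rejected while result sets are still pending.

```python
class OutOfSyncError(Exception):
    """Stands in for the client's 'Commands out of sync' error."""


class FakeConnection:
    """Hypothetical stand-in for a multi-statement MySQL connection."""

    def __init__(self):
        self.pending = []   # result sets not yet fetched
        self.executed = []  # every statement sent so far

    def execute(self, statements):
        # A new query is only legal once all earlier results are fetched.
        if self.pending:
            raise OutOfSyncError("result sets still pending")
        self.executed.extend(statements)
        self.pending = [f"result of {s}" for s in statements]

    def fetch_next(self):
        return self.pending.pop(0)


def split_batches(statements, max_packet):
    """Group statements so each joined batch stays within max_packet bytes."""
    batches, current, size = [], [], 0
    for stmt in statements:
        cost = len(stmt.encode()) + 1  # +1 for the ';' separator
        if current and size + cost > max_packet:
            batches.append(current)
            current, size = [], 0
        current.append(stmt)
        size += cost
    if current:
        batches.append(current)
    return batches


def run_batch(conn, batch):
    """Run a batch given as a list of (statement, callback-or-None) pairs."""
    conn.execute([stmt for stmt, _ in batch])
    # Step 1: drain every result set before doing anything else.
    results = [conn.fetch_next() for _ in batch]
    # Step 2: only now run callbacks, which may safely issue new queries.
    for (_, callback), result in zip(batch, results):
        if callback is not None:
            callback(conn, result)
```

If the callback were invoked inside the fetch loop instead, a callback that issues its own query would hit `OutOfSyncError` as long as later result sets of the same batch were still unread.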
Updated by mfriedrich on 2016-11-25 14:56:36 +00:00
Unfortunately this does not entirely fix the problem, causing #13321. Reverted the patch in git master.
Updated by mfriedrich on 2016-12-05 11:40:42 +00:00
This bug requires more investigation. I'm removing the release target in favour of the QA and testing for 2.6 that is already going on. Once we find the root cause we'll release a version including this bugfix and tests.
Updated by mfriedrich on 2017-01-11 13:20:57 +00:00
While replying here https://monitoring-portal.org/index.php?thread/39916-commands-out-of-sync-you-can-t-run-this-command-now/&postID=244475#post244475 I was thinking about replacing the entire "replace into" thing with delete/update-insert logic. That attempt tried to solve empty web interfaces on a config reload, which were caused by simple delete/insert queries. It might be worthwhile to reconsider and remove the "improvement", sticking with the old behaviour of using the session_token.
Ignore my comment above. To summarize the underlying issue: CLIENT_MULTI_STATEMENTS requires us to fetch the entire result set. During this period (a loop iterating over all queries) we cannot execute another query in the same scope. I've added further query debug logging (only for debug builds) and was looking for those query callbacks.
The attempt in #4807 somehow causes the result sets to be processed after retrieving them, but unveils another bug with contactnotifications queries: they're endlessly rescheduled. I have no idea why exactly, but this bug hides that behaviour. Another idea: which callbacks are actually registered for these queries?
The main cause for comments and downtimes running into "command out of sync" errors is the "upsert" query logic. Once an UPDATE is executed and does not affect any rows, FinishExecuteQuery() will fire another query which adds DELETE/INSERT and fires the query again. It actually attempts to execute that query immediately and does not enqueue it into the QueryQueue. This is mainly where the "command out of sync" error comes from. There are no other callbacks registered for such upsert queries, but that could happen in the future, so keep this in mind. Changing the code to enqueue the query fixes the "out of sync" issue.
I'll keep testing the patch in a branch for a little longer before actually merging it to master. https://github.com/Icinga/icinga2/tree/fix/ido-resultset-4603
Tests
Requires lowering the packet size again.
Bug
Patch
Applied the fix and truncated the icinga_comments table. 18000 comments loaded.
Everything's working fine, will do a PR.
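The "enqueue instead of execute immediately" change for upserts, as described in this comment, can be illustrated with a small model. This is a hedged Python sketch and not the actual patch: `IdoQueueSketch`, its methods and the dict-backed "table" are assumptions made for the example, and only the idea (a zero-row UPDATE schedules DELETE/INSERT for a later batch) comes from the thread.

```python
from collections import deque


class IdoQueueSketch:
    """Hypothetical model of an upsert-capable query queue (not Icinga's code)."""

    def __init__(self, table):
        self.table = table      # dict standing in for a database table
        self.queue = deque()    # queries waiting for the next batch
        self.batches = 0        # number of batches sent so far

    def enqueue(self, kind, key, value=None):
        self.queue.append((kind, key, value))

    def _apply(self, kind, key, value):
        """Run one statement and return the number of affected rows."""
        if kind == "UPDATE":
            if key not in self.table:
                return 0
            self.table[key] = value
            return 1
        if kind == "DELETE":
            if key in self.table:
                del self.table[key]
                return 1
            return 0
        if kind == "INSERT":
            self.table[key] = value
            return 1
        raise ValueError(f"unknown query kind: {kind}")

    def finish_batch(self):
        """Send everything queued so far as one batch and read its results."""
        batch = list(self.queue)
        self.queue.clear()
        self.batches += 1
        for kind, key, value in batch:
            affected = self._apply(kind, key, value)
            if kind == "UPDATE" and affected == 0:
                # Upsert fallback: do NOT execute right here, while the
                # results of this batch are still being read. Schedule the
                # DELETE/INSERT pair for the next batch instead.
                self.enqueue("DELETE", key)
                self.enqueue("INSERT", key, value)

    def drain(self):
        while self.queue:
            self.finish_batch()
```

In this model an UPDATE for a missing row costs one extra batch, but no query is ever issued while another batch's results are still being read.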
This issue has been migrated from Redmine: https://dev.icinga.com/issues/12597
Created by itbess on 2016-08-30 20:11:59 +00:00
Assignee: mfriedrich
Status: Assigned (closed on 2016-11-22 15:00:02 +00:00)
Target Version: 2.6.2
Last Update: 2017-01-11 13:20:57 +00:00 (in Redmine)
We recently updated from 2.4.10 to 2.5.3.
After the update the Icinga process ran fine for a short time, but then we experienced that the IDO was no longer being updated.
The Logfile has the following entry:
This long statement deletes and re-adds every comment in the comments table.
When I copy the statement into a file and run it manually, it works and the reload finishes.
We have around 18000 comments. When I delete about half of them, it works.
Changesets
2016-11-22 14:56:05 +00:00 by mfriedrich b028ff2
2016-11-23 06:50:43 +00:00 by mfriedrich ad0604d
2016-11-25 14:53:07 +00:00 by mfriedrich d076617
2016-11-28 11:13:34 +00:00 by gbeutner 79ecda5