[dev.icinga.com #8756] Improve DB IDO config dump performance #2767

Closed
icinga-migration opened this issue Mar 16, 2015 · 14 comments
Labels: area/db-ido (Database output), bug (Something isn't working)

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/8756

Created by TheSerapher on 2015-03-16 08:34:17 +00:00

Assignee: TheSerapher
Status: Closed (closed on 2016-11-11 08:46:16 +00:00)
Target Version: (none)
Last Update: 2016-11-11 08:47:01 +00:00 (in Redmine)

Icinga Version: 2.3.2
Backport?: Not yet backported
Include in Changelog: 1

We have been seeing issues with the initial configuration dump during server startup taking a very long time to complete a full wipe and re-fill of all tables. This interferes with the masters when doing a reload, causing them to initiate a failover. It would be great if these dumps could somehow be changed to only update the tables when needed. A simple first step could be a config md5sum stored in a table that can be compared against, so that not every restart or reload triggers a full dump. Going further, a checksum could be kept for every file that has been loaded into the DB, and only the files that have changed would be reloaded.

I don't know the inner workings of Icinga2, so I am not sure this is even doable, but it would greatly help startup performance.


Relations:
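
To illustrate the md5sum idea above, here is a rough sketch in Python. It assumes a hypothetical icinga_config_checksum table and a generic DB-API connection; neither the table nor these functions are part of the real IDO schema or Icinga 2 code.

# Sketch only: hypothetical table name, not the actual IDO schema.
import glob
import hashlib

def config_checksum(conf_dir="/etc/icinga2"):
    # Hash every .conf file under conf_dir into one stable digest.
    digest = hashlib.md5()
    for path in sorted(glob.glob(conf_dir + "/**/*.conf", recursive=True)):
        with open(path, "rb") as f:
            digest.update(path.encode())
            digest.update(f.read())
    return digest.hexdigest()

def dump_needed(db, current_sum):
    # Compare against the checksum stored by the previous config dump.
    cur = db.cursor()
    cur.execute("SELECT config_hash FROM icinga_config_checksum LIMIT 1")
    row = cur.fetchone()
    return row is None or row[0] != current_sum

On startup the core would call dump_needed() with the freshly computed checksum and skip the full wipe and re-fill whenever it returns False.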

@icinga-migration

Updated by TheSerapher on 2015-03-16 12:30:41 +00:00

Here is the WorkQueue output during master startup. Once startup completes, this queue stays empty until checkers are added.

# egrep "(ThreadPool|WorkQueue)" debug.log
[2015-03-16 13:16:35 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:16:36 +0100] notice/WorkQueue: #3 tasks: 8750
[2015-03-16 13:16:44 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0137885%
[2015-03-16 13:16:45 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:16:46 +0100] notice/WorkQueue: #3 tasks: 41772
[2015-03-16 13:16:55 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:16:56 +0100] notice/WorkQueue: #3 tasks: 75191
[2015-03-16 13:16:59 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0573364%
[2015-03-16 13:17:05 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:17:06 +0100] notice/WorkQueue: #3 tasks: 108820
[2015-03-16 13:17:14 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0159137%
[2015-03-16 13:17:15 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:17:16 +0100] notice/WorkQueue: #3 tasks: 140385
[2015-03-16 13:17:25 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:17:26 +0100] notice/WorkQueue: #3 tasks: 172324
[2015-03-16 13:17:29 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0697104%
[2015-03-16 13:17:35 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:17:36 +0100] notice/WorkQueue: #3 tasks: 204787
[2015-03-16 13:17:44 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0146939%
[2015-03-16 13:17:45 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:17:46 +0100] notice/WorkQueue: #3 tasks: 260815
[2015-03-16 13:17:55 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:17:56 +0100] notice/WorkQueue: #3 tasks: 313382
[2015-03-16 13:17:59 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0591065%
[2015-03-16 13:18:05 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:18:06 +0100] notice/WorkQueue: #3 tasks: 0
[2015-03-16 13:18:14 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.015875%
[2015-03-16 13:18:15 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:18:16 +0100] notice/WorkQueue: #3 tasks: 0
[2015-03-16 13:18:25 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:18:26 +0100] notice/WorkQueue: #3 tasks: 0
[2015-03-16 13:18:29 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.074989%
[2015-03-16 13:18:35 +0100] notice/WorkQueue: #2 tasks: 0
[2015-03-16 13:18:36 +0100] notice/WorkQueue: #3 tasks: 0

@icinga-migration

Updated by TheSerapher on 2015-03-16 12:49:02 +00:00

It seems that during reload the IDO connections are paused too:

[2015-03-16 13:46:37 +0100] information/Application: Got reload command: Starting new instance.
[2015-03-16 13:46:48 +0100] information/Checkable: Notifications are disabled for service 'datadomain-1.backup.fra1!Disk-States'.
[2015-03-16 13:46:49 +0100] information/Checkable: Notifications are disabled for service 'messages-1.api.fra1!cron'.
[2015-03-16 13:46:50 +0100] information/DynamicObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2015-03-16 13:46:55 +0100] information/Application: Received request to shut down.
[2015-03-16 13:46:55 +0100] information/Application: Shutting down...
[2015-03-16 13:46:55 +0100] information/DbConnection: Pausing IDO connection: ido-mysql
[2015-03-16 13:47:07 +0100] information/ConfigItem: Activated all objects.

If that is the case, the master would probably stop updating the status table until the reload completes?

@icinga-migration

Updated by mfriedrich on 2015-03-19 09:41:11 +00:00

  • Subject changed from Icinga2 Startup Configuration Dump to Improve DB IDO config dump performance
  • Category set to DB IDO

@icinga-migration

Updated by TheSerapher on 2015-04-17 10:39:54 +00:00

Here is an idea that could help speed up cluster restarts when no config changes are required: store an overall MD5 sum of the check configuration in the database. When the master starts, it can compute the MD5 sum of its current configuration and compare it with what is on file in the DB. If they match, just start up and don't dump the configuration.

This would greatly improve restarting the core for Icinga2 changes that do not involve any check changes.

As for check updates, a proper way to improve performance is still required.
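
For the check-update case, a hedged illustration of a finer-grained variant: keep a hash per object and only write rows whose hash changed. The icinga_object_checksums table and the write_row callback are hypothetical placeholders, not part of the actual IDO schema.

# Sketch only: per-object hashes so unchanged objects are skipped on reload.
import hashlib

def object_hash(name, serialized_attrs):
    # Stable per-object digest over the object's name and serialized attributes.
    return hashlib.md5((name + "\n" + serialized_attrs).encode()).hexdigest()

def sync_objects(db, objects, write_row):
    # objects: dict mapping object name -> serialized attribute string.
    # write_row: callback that INSERTs/UPDATEs one object row (hypothetical).
    cur = db.cursor()
    cur.execute("SELECT name, config_hash FROM icinga_object_checksums")
    stored = dict(cur.fetchall())
    for name, attrs in objects.items():
        h = object_hash(name, attrs)
        if stored.get(name) != h:
            write_row(name, attrs, h)  # only changed or new objects touch the DB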

@icinga-migration

Updated by gbeutner on 2015-12-10 16:18:52 +00:00

  • Relates set to 10822

@icinga-migration

Updated by mfriedrich on 2016-04-07 09:20:00 +00:00

  • Target Version set to Backlog

@icinga-migration

Updated by mfriedrich on 2016-04-07 09:20:51 +00:00

  • Relates deleted 10822

@icinga-migration

Updated by mfriedrich on 2016-04-07 09:22:28 +00:00

  • Parent Id set to 10073

@icinga-migration

Updated by gbeutner on 2016-08-15 10:56:54 +00:00

  • Relates set to 12435

@icinga-migration

Updated by gbeutner on 2016-08-15 10:57:36 +00:00

This will likely be addressed by #12435.

@icinga-migration

Updated by gbeutner on 2016-08-15 14:51:27 +00:00

  • Status changed from New to Feedback
  • Assigned to set to TheSerapher

Please re-test this with the latest master branch. Note that the initial config dump might take longer than before, but subsequent restarts should be noticeably faster.

@icinga-migration

Updated by mfriedrich on 2016-09-30 13:54:03 +00:00

Please test that with 2.5.4 and provide your findings. Otherwise we'll close the issue soon-ish.

@icinga-migration

Updated by mfriedrich on 2016-11-11 08:46:16 +00:00

  • Status changed from Feedback to Closed
  • Target Version deleted Backlog

@icinga-migration

Updated by mfriedrich on 2016-11-11 08:47:01 +00:00

  • Parent Id deleted 10073

@icinga-migration added the labels bug (Something isn't working) and area/db-ido (Database output) on Jan 17, 2017