Skip to content
This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #2178] Very long database query time for Servicegroups cronk #616

Closed
icinga-migration opened this issue Dec 12, 2011 · 21 comments
Labels
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/2178

Created by ralfk on 2011-12-12 13:28:03 +00:00

Assignee: jmosshammer
Status: Resolved (closed on 2012-03-01 18:38:43 +00:00)
Target Version: 1.7
Last Update: 2012-10-22 13:38:27 +00:00 (in Redmine)

Icinga Version: 1.6.1
Icinga Web Version: 1.6.0
IDO Version: 1.6.0
OS Version: Linux
DB Type: MySQL
DB Version: 5.1.53
Browser Version: unknown

When I start the Servicegroup cronk, a database query is initiated that takes a very long time (or even lasts forever). Two queries are started initially and after 300 seconds of reload time another query is started, and so on. The query is:

SELECT DISTINCT i.servicegroup_id AS i__servicegroup_id, i.alias AS i__alias, i.servicegroup_id AS i__servicegroup_id, i.alias AS i__alias, i2.name1 AS i2__name1, i2.name1 AS i2__name1, i2.name1 AS i2__name1, i5.current_state AS i5__current_state, i5.current_state AS i5__current_state, i6.instance_name AS i6__instance_name, i2.name1 AS i2__0, i.servicegroup_id AS i__1, i2.name1 AS i2__2, i.alias AS i__2, i5.current_state AS i5__3, (i5.has_been_checked-i5.should_be_scheduled)*-1 AS i__4, (i5.has_been_checked-i5.should_be_scheduled)*-1 AS i__5, i6.instance_name AS i6__6, i.servicegroup_id AS i__7, i2.name1 AS i2__7, i.alias AS i__7, i5.current_state AS i5__7, COUNT(i5.current_state) AS i5__7, COUNT(i5.current_state) AS i5__8, (i5.has_been_checked-i5.should_be_scheduled)*-1 AS i__9, (i5.has_been_checked-i5.should_be_scheduled)*-1 AS i__9 FROM icinga_servicegroups i INNER JOIN icinga_objects i2 ON i.servicegroup_object_id = i2.object_id INNER JOIN icinga_servicegroup_members i4 ON (i.servicegroup_id = i4.servicegroup_id) INNER JOIN icinga_services i3 ON i3.service_object_id = i4.service_object_id INNER JOIN icinga_servicestatus i5 ON i3.service_object_id = i5.service_object_id INNER JOIN icinga_instances i6 ON i.instance_id = i6.instance_id INNER JOIN icinga_servicegroup_members i8 ON (i.servicegroup_id = i8.servicegroup_id) INNER JOIN icinga_services i7 ON i7.service_object_id = i8.service_object_id INNER JOIN icinga_service_contactgroups i10 ON (i7.service_id = i10.service_id) INNER JOIN icinga_contactgroups i9 ON i9.contactgroup_object_id = i10.contactgroup_object_id INNER JOIN icinga_contactgroup_members i12 ON (i9.contactgroup_id = i12.contactgroup_id) INNER JOIN icinga_contacts i11 ON i11.contact_object_id = i12.contact_object_id INNER JOIN icinga_objects i13 ON i11.contact_object_id = i13.object_id WHERE (i13.name1 = 'btt') GROUP BY i.servicegroup_id, i2.name1, i.alias, i5.current_state, (i5.has_been_checked-i5.should_be_scheduled)*-1, i6.instance_name ORDER BY i2.name1 ASC LIMIT 30

On a smaller system (i.e. with much less services etc.) the same query is fast. But on my main system it blocks my system due to heavy load and multiple queries at same time (MySQL "copy to tmp table"). The InnoDB buffer size is 256 MB (which could of course be higher, but other applications and the rest of Icinga work smoothly).

Important: I am pretty sure that this only happens under following conditions:

  • The web GUI user is not member of all host and service definitions (1) AND
  • "Only show items that contain a contact with this name in their contactgroup definitions" is enabled. (2)

If the user is member of all items the servicegroups are displayed quickly (1). If the definitions option (2) is disabled then it's also quick.


My system:
Icinga 1.6.1, Icinga Web 1.6.0, Linux 64 bit, MySQL 5.1.53

No of hosts: 350
No of services: 2500
No of hostgroups: 46
No of servicegroups: 47

Attachments

Changesets

2012-03-01 16:51:21 +00:00 by jmosshammer f80da0a

* Added dql views for templates and build new servicegroup version (refs #2368, fixes #2178)

2012-04-17 08:25:38 +00:00 by jmosshammer 4d28ddb11637dbff8c6a32cdcacb81dd984d7a82

* Added dql views for templates and build new servicegroup version (refs #2368, fixes #2178)

2012-04-17 08:40:21 +00:00 by jmosshammer e7e5859

* Added dql views for templates and build new servicegroup version (refs #2368, fixes #2178)

Relations:

@icinga-migration
Copy link
Author

Updated by ralfk on 2011-12-12 13:54:09 +00:00

Even though I close the "Loading" Servicegroups cronk the associated MySQL queries are still active and need to be manually killed using e.g. mysqladmin. That's bad. Isn't there a method to kill the database session/process when the hanging cronk is closed?

@icinga-migration
Copy link
Author

Updated by mhein on 2012-01-20 11:37:37 +00:00

  • Assigned to set to jmosshammer
  • Target Version set to 1.7

@icinga-migration
Copy link
Author

Updated by mhein on 2012-02-10 09:23:14 +00:00

  • Target Version changed from 1.7 to 1.6.2

@icinga-migration
Copy link
Author

Updated by jmosshammer on 2012-02-21 12:31:52 +00:00

  • Status changed from New to Feedback

Hi,

can you try to add those indexes to your database and tell me if the problem is solved afterwards?:

CREATE INDEX idx_all on icinga_servicegroups (servicegroup_id,instance_id,servicegroup_object_id,config_type,alias);
CREATE INDEX idx_all on icinga_servicegroup_members (servicegroup_member_id,instance_id,servicegroup_id,service_object_id);
CREATE INDEX idx_all on icinga_contactgroup_members (contactgroup_member_id, instance_id, contactgroup_id,contact_object_id);

@icinga-migration
Copy link
Author

Updated by ralfk on 2012-02-23 12:55:08 +00:00

I implemented the indexes, but unfortunately it did not improve it. Same behavior as before. Maybe the 2nd table icinga_servicegroup_members with 3006 rows would be a candidate to be significantly improved by indexing. The other two tables have only 48 and 346 rows.

@icinga-migration
Copy link
Author

Updated by jmosshammer on 2012-02-23 13:22:16 +00:00

  • Status changed from Feedback to Assigned

Thx for testing!
Could you perhaps tell me the results of the following explain query:

EXPLAIN SELECT DISTINCT i.servicegroup_id AS i_servicegroup_id, i.alias AS ialias, i.servicegroup_id AS iservicegroup_id, i.alias AS ialias, i2.name1 AS i2name1, i2.name1 AS i2name1, i2.name1 AS i2name1, i5.current_state AS i5current_state, i5.current_state AS i5current_state, i6.instance_name AS i6instance_name, i2.name1 AS i20, i.servicegroup_id AS i1, i2.name1 AS i22, i.alias AS i2, i5.current_state AS i53, (i5.has_been_checked-i5.should_be_scheduled)-1 AS i__4, (i5.has_been_checked-i5.should_be_scheduled)-1 AS i5, i6.instance_name AS i66, i.servicegroup_id AS i7, i2.name1 AS i27, i.alias AS i7, i5.current_state AS i57, COUNT (i5.current_state) AS i57, COUNT (i5.current_state) AS i58, (i5.has_been_checked-i5.should_be_scheduled)-1 AS i__9, (i5.has_been_checked-i5.should_be_scheduled)-1 AS i_9 FROM icinga_servicegroups i INNER JOIN icinga_objects i2 ON i.servicegroup_object_id = i2.object_id INNER JOIN icinga_servicegroup_members i4 ON (i.servicegroup_id = i4.servicegroup_id) INNER JOIN icinga_services i3 ON i3.service_object_id = i4.service_object_id INNER JOIN icinga_servicestatus i5 ON i3.service_object_id = i5.service_object_id INNER JOIN icinga_instances i6 ON i.instance_id = i6.instance_id INNER JOIN icinga_servicegroup_members i8 ON (i.servicegroup_id = i8.servicegroup_id) INNER JOIN icinga_services i7 ON i7.service_object_id = i8.service_object_id INNER JOIN icinga_service_contactgroups i10 ON (i7.service_id = i10.service_id) INNER JOIN icinga_contactgroups i9 ON i9.contactgroup_object_id = i10.contactgroup_object_id INNER JOIN icinga_contactgroup_members i12 ON (i9.contactgroup_id = i12.contactgroup_id) INNER JOIN icinga_contacts i11 ON i11.contact_object_id = i12.contact_object_id INNER JOIN icinga_objects i13 ON i11.contact_object_id = i13.object_id WHERE (i13.name1 = 'btt') GROUP BY i.servicegroup_id, i2.name1, i.alias, i5.current_state, (i5.has_been_checked-i5.should_be_scheduled)*-1, i6.instance_name ORDER BY i2.name1 ASC LIMIT 30

@icinga-migration
Copy link
Author

Updated by ralfk on 2012-02-24 07:15:04 +00:00

  • File added query.txt

See attached query result (tab separated).

@icinga-migration
Copy link
Author

Updated by jmosshammer on 2012-02-24 09:01:56 +00:00

Sorry, i think i forgot one index in my last comment:
Could you try this:

DROP INDEX idx_all on icinga_servicegroups;
DROP INDEX idx_all on icinga_servicegroup_members;
DROP INDEX idx_all on icinga_contactgroup_members;
CREATE INDEX idx_all on icinga_service_contactgroups (service_contactgroup_id,instance_id,service_id,contactgroup_object_id);

On my machine row 12 (which should be the bottleneck) changed from 'all' to 'index' by adding this index.

@icinga-migration
Copy link
Author

Updated by ralfk on 2012-02-24 10:52:36 +00:00

  • File added query2.txt

I deleted the indexes and created the one on icinga_service_contactgroups. No improvement. See attached explain query output.

Just an idea: Since the query contains the instance, maybe the slow performance is related to bug #2179 ?

@icinga-migration
Copy link
Author

Updated by jmosshammer on 2012-02-24 11:03:33 +00:00

Hi,

#2179 was an issue that occured in icinga-web, this one is more a problem in the database .

Could you please readd those two indices and see if the performance is better:

CREATE INDEX idx_all on icinga_servicegroup_members (servicegroup_member_id,instance_id,servicegroup_id,service_object_id);
CREATE INDEX idx_all on icinga_contactgroup_members (contactgroup_member_id, instance_id, contactgroup_id,contact_object_id);

@icinga-migration
Copy link
Author

Updated by ralfk on 2012-02-24 11:16:31 +00:00

Did that. No improvement.

Would it maybe help if you would get a copy of the icinga DB? If so, only on a separate and secure channel, since the DB contains some internal info that I don't want to become public. The DB is (unzipped) 1.3 GB big ;-)

@icinga-migration
Copy link
Author

Updated by jmosshammer on 2012-02-24 11:21:39 +00:00

Ok, I hoped we could fix it the easy way ;)

I'm now going to setup a test configuration for that - i.e. with a lot of servicegroups and contactgroups. If think this should be sufficient to reproduce the error.

@icinga-migration
Copy link
Author

Updated by ralfk on 2012-02-24 11:36:40 +00:00

OK. In general it would make sense that you have a big DB for testing :-)

I feel that the new GUI has still some room for improvement concerning performance. Some cronks which even work are not really fast (compared to old GUI). Since the new GUI can't still not replace the old one (at least for my requirements: A user shall only see his hosts and services configured as contact) I don't have much experience with the performance of the new GUI. If needed functionality is given I will more often use the new GUI and you will get more feedback.

@icinga-migration
Copy link
Author

Updated by jmosshammer on 2012-02-27 10:39:48 +00:00

  • Target Version changed from 1.6.2 to 1.7

Hi,

I've done some analyzing, - the problem lies in the grouping and counting fields (which can be a little bit tricky in mysql - altough pgsql shows the same issue).

As this will require some deeper changes, indices and additional query rewrites i'm going to postpone it to 1.7.
Neitherless, I'm continuing to work on fixing the issue.

@icinga-migration
Copy link
Author

Updated by ralfk on 2012-02-27 11:00:56 +00:00

OK. Thanks for your effort. As soon as I can help testing give me a note here.

@icinga-migration
Copy link
Author

Updated by jmosshammer on 2012-03-01 18:38:43 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset f80da0a.

@icinga-migration
Copy link
Author

Updated by jmosshammer on 2012-03-07 08:41:47 +00:00

Hi,

please try the jmosshammer/templateDQL branch. I've rewritten the database backend code for the templates, servicegroups (and especially notifications) are now in an acceptable speed when I add all possible credentials:

https://git.icinga.org/?p=icinga-web.git;a=commit;h=97f27b513774c20a0bbe32058cac94b169f4b7a0

Please let me know if you encountered any issues in #2368,

thanks!

@icinga-migration
Copy link
Author

Updated by ralfk on 2012-05-30 14:40:30 +00:00

I updated to 1.7.0 today (sorry for being so late).

In general the Servicegroup cronk does work now. Great.

But the refresh (query) still takes 20-30 seconds for 8 service groups with about 1600 service checks (which is maybe 50% of the overall amount of groups and checks). There are still many "copy to tmp table" MySQL processes.

So, there is still room for improvement for users which only should see their service groups and service checks according to the setting "Only show items that contain a contact with this name in their contactgroup definitions".

@icinga-migration
Copy link
Author

Updated by tgelf on 2012-05-31 11:26:45 +00:00

@jmosshammer: as soon as you are creating an index involving the autoincremental primary key (such as service_contactgroup_id in icinga_service_contactgroups) it is going to be pretty useless: it's cardinality will match the table row count. Please also note that MySQL is incapable of combining mulitiple indexes efficiently, it usually chooses one index for each involved table.

What we are talking about here are so-called m:n relations. The only useful columns such tables (e.g. icinga_xy_members) are those referencing related objects. This means that the best index will involve just two columns on each table looking like the ones you named before. As object_id is unique throughout all instances, the instance_id is not important for your joins.

Unfortunately we have already had such indexes before Icinga 1.5 (ido), they have been dropped. Please see issue #2618 for farther details.

@icinga-migration
Copy link
Author

Updated by ralfk on 2012-10-22 13:33:16 +00:00

Just for info: After upgrade to 1.8.0 the same servicegroup cronk query takes about 4 seconds. So, it has been significantly improved from 1.7.2 to 1.8.0.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-10-22 13:38:28 +00:00

  • Icinga Version set to 1
  • Icinga Web Version set to 1
  • IDO Version set to 1
  • OS Version set to Linux
  • DB Type set to MySQL
  • DB Version set to 5
  • Browser Version set to unknown

within #2618 there were serveral badly deleted indexes re-added, which should have helped in that case as well.

@icinga-migration icinga-migration added this to the 1.7 milestone Jan 17, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

1 participant