[dev.icinga.com #8612] IDO performance is far from where it used to be #1428

Closed
icinga-migration opened this issue Mar 5, 2015 · 19 comments
Labels
area/monitoring (Affects the monitoring module), bug (Something isn't working), queue/important (Blocks a release or needs immediate attention)

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/8612

Created by tgelf on 2015-03-05 14:55:30 +00:00

Assignee: (none)
Status: Closed (closed on 2016-10-05 11:24:52 +00:00)
Target Version: (none)
Last Update: 2016-10-05 11:24:52 +00:00 (in Redmine)


While we are A LOT faster than legacy frontends, I'm very unhappy with the current performance. We used to be WAY faster than we are now, especially in environments with thousands of objects. We should try to identify the worst bottlenecks, link them to this issue, and fix them one by one.

Cheers,
Thomas


Subtasks:

Relations:

@icinga-migration
Author

Updated by tgelf on 2015-03-05 15:02:16 +00:00

  • Relates set to 8613

@icinga-migration
Author

Updated by tgelf on 2015-03-05 15:16:06 +00:00

  • Relates set to 8614

@icinga-migration
Author

Updated by tgelf on 2015-03-05 15:18:44 +00:00

  • Relates set to 8615

@icinga-migration
Author

Updated by tgelf on 2015-03-05 15:28:29 +00:00

  • Relates set to 8616

@icinga-migration
Author

Updated by elippmann on 2015-07-14 10:33:10 +00:00

  • Relates deleted 8615

@icinga-migration
Author

Updated by elippmann on 2015-07-14 10:33:14 +00:00

  • Relates deleted 8616

@icinga-migration
Author

Updated by elippmann on 2015-07-14 10:34:12 +00:00

  • Category set to Monitoring
  • Target Version set to 2.0.0

@icinga-migration
Author

Updated by jmeyer on 2015-07-29 10:18:53 +00:00

  • Relates set to 9767

@icinga-migration
Author

Updated by elippmann on 2015-08-04 10:54:58 +00:00

  • Relates deleted 9767

@icinga-migration
Author

Updated by frjaraur on 2015-08-28 10:20:06 +00:00

We have poor performance on icingaweb2 and we caught some very slow queries that are used quite often. Why don't we use some views for message counts instead of refreshing data continuously? We have seen MySQL sessions stay open for a long time waiting for those queries to finish. We had to raise max_sessions from 50 (mysql default) to 200, but that wasn't the real problem: the number of sessions increases every time a long query appears, slowing down icinga2, until we restart the MySQL server and the sessions are freed.

    # Time: 150828 12:09:45
    # User@Host: icinga[icinga] @ XXXXXXXXX Id: 841
    # Query_time: 7396.678959 Lock_time: 0.000693 Rows_sent: 1 Rows_examined: 293250
    SET timestamp=1440756585;
    SELECT COUNT (*) AS cnt FROM (SELECT statussummary.servicegroup_alias COLLATE latin1_general_ci AS servicegroup_alias, statussummary.servicegroup_name, SUM (CASE WHEN object_type = 'service' AND state = 2 AND handled + host_state > 0 THEN 1 ELSE 0 END) AS services_critical_handled, MAX (CASE WHEN object_type = 'service' AND state = 2 AND handled + host_state > 0 THEN state_change ELSE 0 END) AS services_critical_last_state_change_handled, MAX (CASE WHEN object_type = 'service' AND state = 2 AND handled + host_state = 0 THEN state_change ELSE 0 END) AS services_critical_last_state_change_unhandled, SUM (CASE WHEN object_type = 'service' AND state = 2 AND handled + host_state = 0 THEN 1 ELSE 0 END) AS services_critical_unhandled, SUM (CASE WHEN object_type = 'service' AND state = 0 THEN 1 ELSE 0 END) AS services_ok, MAX (CASE WHEN object_type = 'service' AND state = 0 THEN state_change ELSE 0 END) AS services_ok_last_state_change, SUM (CASE WHEN object_type = 'service' AND state = 99 THEN 1 ELSE 0 END) AS services_pending, MAX (CASE WHEN object_type = 'service' AND state = 99 THEN state_change ELSE 0 END) AS services_pending_last_state_change, SUM (CASE WHEN object_type = 'service' THEN 1 ELSE 0 END) AS services_total, SUM (CASE WHEN object_type = 'service' AND state = 3 AND handled + host_state > 0 THEN 1 ELSE 0 END) AS services_unknown_handled, MAX (CASE WHEN object_type = 'service' AND state = 3 AND handled + host_state > 0 THEN state_change ELSE 0 END) AS services_unknown_last_state_change_handled, MAX (CASE WHEN object_type = 'service' AND state = 3 AND handled + host_state = 0 THEN state_change ELSE 0 END) AS services_unknown_last_state_change_unhandled, SUM (CASE WHEN object_type = 'service' AND state = 3 AND handled + host_state = 0 THEN 1 ELSE 0 END) AS services_unknown_unhandled, SUM (CASE WHEN object_type = 'service' AND state = 1 AND handled + host_state > 0 THEN 1 ELSE 0 END) AS services_warning_handled, MAX (CASE WHEN object_type = 'service' AND 
state = 1 AND handled + host_state > 0 THEN state_change ELSE 0 END) AS services_warning_last_state_change_handled, MAX (CASE WHEN object_type = 'service' AND state = 1 AND handled + host_state = 0 THEN state_change ELSE 0 END) AS services_warning_last_state_change_unhandled, SUM (CASE WHEN object_type = 'service' AND state = 1 AND handled + host_state = 0 THEN 1 ELSE 0 END) AS services_warning_unhandled FROM (SELECT CASE WHEN (hs.problem_has_been_acknowledged + hs.scheduled_downtime_depth) > 0 THEN 1 ELSE 0 END AS handled, NULL AS host_state, sg.alias COLLATE latin1_general_ci AS servicegroup_alias, sgo.name1 AS servicegroup_name, ('host') AS object_type, NULL AS severity, CASE WHEN hs.has_been_checked = 0 OR hs.has_been_checked IS NULL THEN 99 ELSE hs.current_state END AS state, UNIX_TIMESTAMP(hs.last_state_change) AS state_change FROM icinga_hosts AS h
    INNER JOIN icinga_objects AS ho ON ho.object_id = h.host_object_id AND ho.is_active = 1 AND ho.objecttype_id = 1
    INNER JOIN icinga_hoststatus AS hs ON hs.host_object_id = ho.object_id
    LEFT JOIN icinga_services AS s ON s.host_object_id = h.host_object_id
    LEFT JOIN icinga_objects AS so ON so.object_id = s.service_object_id AND so.is_active = 1 AND so.objecttype_id = 2
    LEFT JOIN icinga_servicegroup_members AS sgm ON sgm.service_object_id = s.service_object_id
    LEFT JOIN icinga_servicegroups AS sg ON sgm.servicegroup_id = sg.servicegroup_id
    LEFT JOIN icinga_objects AS sgo ON sgo.object_id = sg.servicegroup_object_id AND sgo.is_active = 1 AND sgo.objecttype_id = 4 WHERE ( (sgo.name1 COLLATE latin1_general_ci LIKE 'thottst01' OR sg.alias COLLATE latin1_general_ci LIKE 'thottst01') ) GROUP BY h.host_id,
    ho.object_id,
    hs.hoststatus_id,
    sg.servicegroup_id,
    sgo.object_id UNION ALL SELECT CASE WHEN (ss.problem_has_been_acknowledged + ss.scheduled_downtime_depth + COALESCE (hs.current_state, 0)) > 0 THEN 1 ELSE 0 END AS handled, CASE WHEN hs.has_been_checked = 0 OR hs.has_been_checked IS NULL THEN 99 ELSE hs.current_state END AS host_state, sg.alias COLLATE latin1_general_ci AS servicegroup_alias, sgo.name1 AS servicegroup_name, ('service') AS object_type, CASE WHEN ss.current_state = 0 THEN CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS NULL THEN 16 ELSE 0 END + CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 2 ELSE CASE WHEN ss.scheduled_downtime_depth > 0 THEN 1 ELSE 4 END END ELSE CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS NULL THEN 16 WHEN ss.current_state = 1 THEN 32 WHEN ss.current_state = 2 THEN 128 WHEN ss.current_state = 3 THEN 64 ELSE 256 END + CASE WHEN hs.current_state > 0 THEN 1024 ELSE CASE WHEN ss.problem_has_been_acknowledged = 1 THEN 512 ELSE CASE WHEN ss.scheduled_downtime_depth > 0 THEN 256 ELSE 2048 END END END END + CASE WHEN ss.state_type = 1 THEN 8 ELSE 0 END AS severity, CASE WHEN ss.has_been_checked = 0 OR ss.has_been_checked IS NULL THEN 99 ELSE ss.current_state END AS state, UNIX_TIMESTAMP(ss.last_state_change) AS state_change FROM icinga_objects AS so
    INNER JOIN icinga_services AS s ON s.service_object_id = so.object_id AND so.is_active = 1 AND so.objecttype_id = 2
    INNER JOIN icinga_hoststatus AS hs ON hs.host_object_id = s.host_object_id
    INNER JOIN icinga_servicestatus AS ss ON ss.service_object_id = so.object_id
    LEFT JOIN icinga_servicegroup_members AS sgm ON sgm.service_object_id = so.object_id
    LEFT JOIN icinga_servicegroups AS sg ON sgm.servicegroup_id = sg.servicegroup_id
    LEFT JOIN icinga_objects AS sgo ON sgo.object_id = sg.servicegroup_object_id AND sgo.is_active = 1 AND sgo.objecttype_id = 4 WHERE ( (sgo.name1 COLLATE latin1_general_ci LIKE 'thottst01' OR sg.alias COLLATE latin1_general_ci LIKE 'thottst01') ) GROUP BY s.service_id,
    so.object_id,
    hs.hoststatus_id,
    ss.servicestatus_id,
    sg.servicegroup_id,
    sgo.object_id) AS statussummary GROUP BY servicegroup_name,
    servicegroup_alias) AS t;

Is it possible to use table views to speed up the environment?
We also noticed that increasing MySQL load causes poor Icinga2 performance when reloading ... and Icinga2 starts to get slower.

Javier R.
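
To compare entries like the one above when hunting for the worst bottlenecks, the header line of each slow query log entry can be parsed and ranked. The following is only a minimal Python sketch: the function names `parse_header` and `slowest` are hypothetical, and it assumes the header fields appear in the same order as in the entry quoted in this comment.

```python
import re

# Matches the "Query_time: ..." header line of a slow query log entry.
# Field order is assumed to match the entry quoted above.
HEADER = re.compile(
    r"Query_time:\s*(?P<query_time>[\d.]+)\s+"
    r"Lock_time:\s*(?P<lock_time>[\d.]+)\s+"
    r"Rows_sent:\s*(?P<rows_sent>\d+)\s+"
    r"Rows_examined:\s*(?P<rows_examined>\d+)"
)


def parse_header(line):
    """Return the timing fields of a header line, or None for other lines."""
    m = HEADER.search(line)
    if m is None:
        return None
    return {
        "query_time": float(m.group("query_time")),
        "lock_time": float(m.group("lock_time")),
        "rows_sent": int(m.group("rows_sent")),
        "rows_examined": int(m.group("rows_examined")),
    }


def slowest(lines, top=5):
    """Return up to `top` parsed headers, longest query time first."""
    entries = [e for e in map(parse_header, lines) if e is not None]
    return sorted(entries, key=lambda e: e["query_time"], reverse=True)[:top]


# The header line from the log entry quoted in this comment.
sample = ("# Query_time: 7396.678959 Lock_time: 0.000693 "
          "Rows_sent: 1 Rows_examined: 293250")
entry = parse_header(sample)
print(entry["query_time"], entry["rows_examined"])
```

A large gap between `rows_examined` and `rows_sent`, as in this entry, usually suggests that the query scans far more rows than it returns.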

@icinga-migration
Author

Updated by elippmann on 2015-10-01 21:12:25 +00:00

  • Target Version changed from 2.0.0 to 273

@icinga-migration
Author

Updated by elippmann on 2015-11-20 13:10:52 +00:00

  • Target Version changed from 273 to Backlog

@icinga-migration
Author

Updated by deneu on 2015-12-11 14:37:38 +00:00

Is there anything new on this, or has it been solved in any release?

@icinga-migration
Author

Updated by elippmann on 2016-02-16 10:54:53 +00:00

  • Duplicated set to 9982

@icinga-migration
Author

Updated by Tux12Fun on 2016-03-09 15:30:48 +00:00

I have also noticed a performance decrease with icinga2/icingaweb2.

My MySQL is flooding the slow-query log because of lots of temp tables and full table scans.
See Bug #10738
Today I upgraded to icingaweb2 2.2.0 and noticed that the performance is getting worse.
If you wish, I can add cases with parts of the slow query log.

@icinga-migration
Author

Updated by elippmann on 2016-03-09 15:40:44 +00:00

Hi,

From which version did you upgrade? I'm not quite sure whether we changed any of the queries recently.

Best regards,
Eric

@icinga-migration
Author

Updated by mnardin on 2016-03-11 11:38:57 +00:00

Hi,
we are in the process of migrating to a multinode icinga2 setup. Now that we have started using this environment, we are experiencing some major performance issues on the db side. I noticed them while navigating the icingaweb2 frontend. I've started monitoring some mysql metrics:

  • index usage is at 80%
  • temporary tables on disk exceed 50% on large queries (history - event overview)
  • table cache hitrate is nearly 0

I don't know if the configuration on our end is the cause of this poor performance.

This is our my.cnf:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# InnoDB settings
innodb-buffer-pool-size = 1G
innodb-file-per-table

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

Best regards
Mirko
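
For anyone wanting to reproduce the temporary-table and table-cache figures listed above, they can be derived from counters reported by `SHOW GLOBAL STATUS`. The Python sketch below is an assumption about how such ratios are commonly computed, not necessarily how the monitoring in this comment calculates them. The helper names are hypothetical, the counter values are made up for illustration, and the `Table_open_cache_hits` / `Table_open_cache_misses` counters are only exposed by newer MySQL versions.

```python
def ratio(part, whole):
    """Return part / whole, or 0.0 when whole is zero."""
    return part / whole if whole else 0.0


def tmp_disk_ratio(status):
    """Share of internal temporary tables that were created on disk."""
    return ratio(status["Created_tmp_disk_tables"],
                 status["Created_tmp_tables"])


def table_cache_hit_rate(status):
    """Share of table opens that were served from the table cache."""
    hits = status["Table_open_cache_hits"]
    misses = status["Table_open_cache_misses"]
    return ratio(hits, hits + misses)


# Made-up counter values for illustration only, not taken from this issue.
status = {
    "Created_tmp_tables": 2000,
    "Created_tmp_disk_tables": 1100,
    "Table_open_cache_hits": 5,
    "Table_open_cache_misses": 995,
}

print(f"tmp tables on disk: {tmp_disk_ratio(status):.1%}")
print(f"table cache hit rate: {table_cache_hit_rate(status):.1%}")
```

Sampling the counters twice and working with the difference gives a picture of current behaviour rather than an average since server start.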

@icinga-migration
Author

Updated by elippmann on 2016-10-05 11:24:40 +00:00

  • Duplicates set to 12732

@icinga-migration
Author

Updated by elippmann on 2016-10-05 11:24:52 +00:00

  • Status changed from New to Closed
  • Target Version deleted Backlog

Closed in favor of #12732.

@icinga-migration added the queue/important, bug, and area/monitoring labels on Jan 17, 2017