[dev.icinga.com #10963] high load and memory consumption on icinga2 agent v2.4.1 #3835
Comments
Updated by mfriedrich on 2016-01-13 15:56:59 +00:00
|
Updated by elabedzki on 2016-01-14 09:07:43 +00:00 Our customer told me that the memory leak now also happens on the masters. |
Updated by tgelf on 2016-01-14 18:25:57 +00:00 Blind guess: 957cf31 Line 54:
Cheers, |
Updated by elabedzki on 2016-01-14 19:07:54 +00:00 Hi Tom, tgelf wrote:
Yes, I can't find a delete on that char buffer. You found it. Best |
Updated by mfriedrich on 2016-01-14 20:00:09 +00:00 Keep in mind that 1) the leak exists in 2.4.1 stable (that git commit is from master) and 2) Base64 is only used for REST API auth, which isn't enabled on clients. I guess there are more possible leaks; Valgrind will hopefully unveil them. |
Updated by elabedzki on 2016-01-14 20:02:41 +00:00 I am confident |
Updated by tgelf on 2016-01-14 23:13:17 +00:00 Didn't know that we cultivate more of them ;) What about lib/base/tlsutility.cpp, RandomString seems to be missing delete [] bytes if RAND_bytes succeeds. Not sure whether this happens often enough to result in a serious leak... |
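The kind of leak described above (a `new[]` buffer that is only freed on the error path) can be avoided by letting a container own the buffer. The following is a minimal sketch of that idea, not the actual Icinga 2 code: `FillRandom` is a hypothetical stand-in for OpenSSL's `RAND_bytes`, and `RandomHexString` is an illustrative name, assumed here only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for OpenSSL's RAND_bytes(); returns true on success.
bool FillRandom(unsigned char *buf, std::size_t len)
{
	std::random_device rd;

	for (std::size_t i = 0; i < len; i++)
		buf[i] = static_cast<unsigned char>(rd() & 0xFF);

	return true;
}

// Returns 2 * length lowercase hex characters.
// The vector owns the byte buffer, so it is released on the success path,
// on the error path, and when an exception propagates.
std::string RandomHexString(std::size_t length)
{
	std::vector<unsigned char> bytes(length);

	if (!FillRandom(bytes.data(), bytes.size()))
		throw std::runtime_error("random source failed");

	static const char digits[] = "0123456789abcdef";
	std::string output;
	output.reserve(length * 2);

	for (unsigned char b : bytes) {
		output.push_back(digits[b >> 4]);
		output.push_back(digits[b & 0x0F]);
	}

	// Post-condition: two hex digits per random byte.
	assert(output.size() == length * 2);

	return output;
}
```

With a raw `new unsigned char[length]`, every return path needs its own `delete []`; with `std::vector` there is nothing to forget.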
Updated by tgelf on 2016-01-15 10:20:41 +00:00 Created a pull request for those mentioned above: #61 |
Updated by tgelf on 2016-01-15 10:30:05 +00:00 Race condition :D |
Updated by jflach on 2016-01-15 11:42:32 +00:00 tgelf wrote:
I won :D We still have to test whether this fixes the issue |
Updated by tobiasvdk on 2016-01-15 15:56:46 +00:00 I think there is still a memory leak. Here is a diff between two
|
Updated by gbeutner on 2016-01-18 07:11:41 +00:00 While these leaks are definitely bugs, RandomString isn't used in any code paths that are reachable via the 'daemon' CLI command. Also, the changes for the base64 functions weren't introduced until after 2.4.1 was released. |
Updated by itbess on 2016-01-18 20:18:14 +00:00 We are running into the same problem on 2.3.11. In pmap, over time it generates more and more of these
@ |
Updated by gbeutner on 2016-01-19 15:26:13 +00:00
|
Updated by tgelf on 2016-01-20 12:45:28 +00:00 Hi Tobias! tobiasvdk wrote:
Could you please give the latest snapshot (version: v2.4.1-116-g55f0c58, commit: 55f0c58) a try? We are not sure why, but that one mitigated the problems for us - no more memory leak, much less CPU load. Cheers, |
Updated by tgelf on 2016-01-20 13:08:15 +00:00 STOP :) In case you are using the Icinga Agent, you had better wait a little bit. The current master introduced another issue, which will be fixed immediately... |
Updated by tobiasvdk on 2016-01-21 15:24:53 +00:00 Still leaking with r2.4.1-123-g72c3b6d:
|
Updated by mfriedrich on 2016-01-22 14:57:56 +00:00
|
Updated by mfriedrich on 2016-01-22 16:31:21 +00:00
|
Updated by gbeutner on 2016-02-04 12:00:35 +00:00 @tobiasvdk: Can you retest this with the latest snapshot? |
Updated by tobiasvdk on 2016-02-04 12:21:56 +00:00
Only my master is leaking, the satellites are ok. In our config only the satellites are connecting to the master.
@Shroud: should I run some gdb commands? |
Updated by tobiasvdk on 2016-02-04 13:04:43 +00:00 Maybe it's because the database currently cannot handle the load:
I will deactivate the ido feature and test again. |
Updated by mfriedrich on 2016-02-04 15:20:35 +00:00 The query queue holds all the remaining updates, that would explain your problem. |
Updated by tobiasvdk on 2016-02-04 20:29:16 +00:00 dnsmichi wrote:
But the queue has a length of 500000, which was already reached. Where are the other results being held? I need to have a look into the code. |
Updated by mfriedrich on 2016-02-04 22:08:05 +00:00
|
Updated by gbeutner on 2016-02-10 07:10:33 +00:00 tobiasvdk: Once the WorkQueue's size limit is reached the Enqueue() method blocks - which generally means other parts of Icinga become unresponsive. I'm not really happy with this behavior but there really are only a few options:
|
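To illustrate the blocking behaviour described above, here is a minimal sketch of a bounded queue whose `Enqueue()` waits while the queue is full. This is not Icinga's `WorkQueue`; the class name `BoundedQueue`, its members, and the `int` payload are assumptions made only for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical bounded queue; NOT Icinga's WorkQueue, just the same idea.
class BoundedQueue
{
public:
	explicit BoundedQueue(std::size_t maxItems)
		: m_MaxItems(maxItems)
	{
		assert(maxItems > 0);
	}

	// Blocks the calling thread while the queue is at its size limit.
	void Enqueue(int item)
	{
		std::unique_lock<std::mutex> lock(m_Mutex);
		m_NotFull.wait(lock, [this] { return m_Items.size() < m_MaxItems; });
		m_Items.push_back(item);
		m_NotEmpty.notify_one();
	}

	// Blocks while the queue is empty; frees a slot for a waiting producer.
	int Dequeue()
	{
		std::unique_lock<std::mutex> lock(m_Mutex);
		m_NotEmpty.wait(lock, [this] { return !m_Items.empty(); });
		int item = m_Items.front();
		m_Items.pop_front();
		m_NotFull.notify_one();
		return item;
	}

	std::size_t Size()
	{
		std::lock_guard<std::mutex> lock(m_Mutex);
		return m_Items.size();
	}

private:
	std::mutex m_Mutex;
	std::condition_variable m_NotFull;
	std::condition_variable m_NotEmpty;
	std::deque<int> m_Items;
	std::size_t m_MaxItems;
};
```

If the consumer (for example, a slow database writer) cannot keep up, every producer calling `Enqueue()` stalls at the limit, which is why other parts of the program can appear unresponsive.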
Updated by tobiasvdk on 2016-02-10 15:23:37 +00:00 gunnarbeutner wrote:
It would also be good to allow multiple connections (#10953) ;) |
Updated by vytenis on 2016-02-17 14:50:29 +00:00 We also noticed very bad behaviour with IDO queue and had to bump it to work in our setup, as 500k was not nearly enough (SSDs+mysql tuning alone is not sufficient for 100k+ object setups) - see #10731 - while blocking on Enqueue() does not lead to hard freeze as it used to be back in 2.4.0, it will still happen eventually if the DB cannot keep up beyond the initial query load. Naturally, the queries do take up a LOT more RAM than Icinga itself requires. :) TBH, the IDO could be a lot more efficient - there's like ~10 queries per monitored object that have to be executed - the recent changes in git master really reduced the runtime load, though, especially if you do not care about history. |
Updated by gbeutner on 2016-02-23 09:59:37 +00:00
|
Updated by gbeutner on 2016-02-23 09:59:53 +00:00
|
This issue has been migrated from Redmine: https://dev.icinga.com/issues/10963
Created by elabedzki on 2016-01-13 15:36:54 +00:00
Assignee: gbeutner
Status: Resolved (closed on 2016-02-23 09:59:37 +00:00)
Target Version: 2.4.2
Last Update: 2016-02-23 09:59:53 +00:00 (in Redmine)
Hi guys,
We noticed a high load and memory consumption problem with some icinga2 agents (version 2.4.1); it isn't really clear what is going on behind the scenes.
One of our customers has a huge setup, described as follows...
Does anyone have similar problems on their setup?
The CPU load is high, along with extremely high memory utilization.
Icinga2 eats up to 70% of 2GB RAM and generates a load of 12 on a one core system.
At the same time we noticed in the log file on the masters that all agents are trying to reconnect, as you can see:
[2016-01-13 11:42:59 +0100] warning/ApiListener: Removing API client for endpoint 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'. 106 API clients left.
[2016-01-13 11:42:59 +0100] information/JsonRpcConnection: No messages for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain' have been received in the last 60 seconds.
[2016-01-13 11:42:59 +0100] warning/JsonRpcConnection: API client disconnected for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:42:59 +0100] warning/ApiListener: Removing API client for endpoint 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'. 105 API clients left.
[2016-01-13 11:42:59 +0100] information/JsonRpcConnection: No messages for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain' have been received in the last 60 seconds.
[2016-01-13 11:42:59 +0100] warning/JsonRpcConnection: API client disconnected for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:42:59 +0100] warning/ApiListener: Removing API client for endpoint 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'. 104 API clients left.
[2016-01-13 11:42:59 +0100] information/JsonRpcConnection: No messages for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain' have been received in the last 60 seconds.
[2016-01-13 11:42:59 +0100] warning/JsonRpcConnection: API client disconnected for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:42:59 +0100] warning/ApiListener: Removing API client for endpoint 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'. 103 API clients left.
[2016-01-13 11:42:59 +0100] information/JsonRpcConnection: No messages for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain' have been received in the last 60 seconds.
[2016-01-13 11:42:59 +0100] warning/JsonRpcConnection: API client disconnected for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:42:59 +0100] warning/ApiListener: Removing API client for endpoint 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'. 102 API clients left.
[2016-01-13 11:42:59 +0100] information/JsonRpcConnection: No messages for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain' have been received in the last 60 seconds.
[2016-01-13 11:42:59 +0100] warning/JsonRpcConnection: API client disconnected for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:42:59 +0100] warning/ApiListener: Removing API client for endpoint 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'. 101 API clients left.
[2016-01-13 11:42:59 +0100] information/JsonRpcConnection: No messages for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain' have been received in the last 60 seconds.
[2016-01-13 11:42:59 +0100] warning/JsonRpcConnection: API client disconnected for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:42:59 +0100] warning/ApiListener: Removing API client for endpoint 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'. 100 API clients left.
[2016-01-13 11:42:59 +0100] information/JsonRpcConnection: No messages for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain' have been received in the last 60 seconds.
[2016-01-13 11:42:59 +0100] warning/JsonRpcConnection: API client disconnected for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:42:59 +0100] warning/ApiListener: Removing API client for endpoint 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'. 99 API clients left.
[2016-01-13 11:42:59 +0100] information/JsonRpcConnection: No messages for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain' have been received in the last 60 seconds.
[2016-01-13 11:42:59 +0100] warning/JsonRpcConnection: API client disconnected for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:42:59 +0100] warning/ApiListener: Removing API client for endpoint 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'. 98 API clients left.
[2016-01-13 11:43:00 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:43:00 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:43:00 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:43:00 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:43:00 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:43:00 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:43:00 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-02.lxprod.obsfucated.customer.domain'
[2016-01-13 11:43:03 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-01.lxprod.obsfucated.customer.domain'
[2016-01-13 11:43:03 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-01.lxprod.obsfucated.customer.domain'
[2016-01-13 11:43:03 +0100] information/ApiListener: New client connection for identity 'mon-icingamaster-01.lxprod.obsfucated.customer.domain'
Does anyone have an idea what's going on here?
Best
Enrico
Attachments
Changesets
2016-01-15 09:11:52 +00:00 by jflach cb70d97
2016-01-19 14:24:17 +00:00 by (unknown) d50c8e1
2016-01-19 15:24:07 +00:00 by (unknown) b1aa6cc
2016-01-19 15:24:12 +00:00 by (unknown) e4b7111
2016-01-19 15:43:46 +00:00 by (unknown) db0c6ef
2016-01-19 16:25:28 +00:00 by (unknown) 55f0c58
2016-01-20 13:07:07 +00:00 by (unknown) e48ed33
2016-01-21 09:37:47 +00:00 by (unknown) 72c3b6d
2016-01-21 12:02:53 +00:00 by (unknown) 6d88d90
2016-01-21 15:37:52 +00:00 by (unknown) 6ca054e
2016-02-12 13:15:24 +00:00 by mfriedrich 04a4049
2016-02-16 12:08:21 +00:00 by (unknown) 9e9298f
2016-02-23 08:57:40 +00:00 by jflach e80b335
2016-02-23 08:57:49 +00:00 by (unknown) abfacd9
2016-02-23 09:46:13 +00:00 by (unknown) badeea7
2016-02-23 09:46:17 +00:00 by (unknown) b227dc7
2016-02-23 09:46:17 +00:00 by (unknown) 087ad3f
2016-02-23 09:46:17 +00:00 by (unknown) 3cfa871
2016-02-23 09:46:18 +00:00 by (unknown) 80fdccc
2016-02-23 09:46:18 +00:00 by (unknown) 7985e93
2016-02-23 09:46:18 +00:00 by (unknown) c415dd3
2016-02-23 09:46:18 +00:00 by (unknown) fc90265
2016-02-23 09:46:19 +00:00 by mfriedrich f6378c9
2016-02-23 09:46:19 +00:00 by (unknown) c998665
Relations: