[dev.icinga.com #11390] Command pipe overloaded: Can't send external Icinga command to the local command file #4037
Comments
Updated by critical on 2016-03-15 17:54:36 +00:00 Versions:
Updated by critical on 2016-03-15 18:42:52 +00:00 Logs
Updated by critical on 2016-03-15 18:57:02 +00:00 I already have the patch from https://dev.icinga.org/issues/8815 applied.
Updated by critical on 2016-03-15 21:41:22 +00:00 Can someone move this to icinga2? I applied the patches from 0a6505c#diff-41f5f9b62fd89e63b82e66bbe76c0e73 to my 2.3.11 release and I am still having problems. After reviewing the code further, I see some implementation problems. Specifically, at https://github.com/Icinga/icinga2/blob/master/lib/compat/externalcommandlistener.cpp#L120 the loop breaks on every return code of sock->Read(...). The break should happen only if errno != EAGAIN, because we are using non-blocking resources.
After applying these fixes and running my tests, I have not seen the pipe close unexpectedly. Some extra work might be required in base/socket.cpp Read(...) to decrease log spam, but these messages shouldn't be critical:
I am doing more testing.
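For readers without the patch at hand, the read-loop behavior argued for in this comment can be sketched as follows. This is an illustration in Python, not the C++ patch that was applied to externalcommandlistener.cpp; `read_commands`, `max_idle_polls`, and `poll_timeout` are hypothetical names chosen for the example. The point is only that EAGAIN and a zero-byte read on a non-blocking descriptor are treated as "nothing to read right now" rather than as a reason to stop listening.

```python
import os
import select
import time


def read_commands(fd, max_idle_polls=3, poll_timeout=0.05):
    """Collect newline-terminated commands from a non-blocking descriptor.

    Hypothetical helper: a real listener would loop forever; this one gives
    up after `max_idle_polls` consecutive polls without data so that it
    can be run as a finite example.
    """
    pending = b""
    commands = []
    idle = 0
    while idle < max_idle_polls:
        ready, _, _ = select.select([fd], [], [], poll_timeout)
        if not ready:
            idle += 1
            continue
        try:
            chunk = os.read(fd, 4096)
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: no data yet. Not an error, keep going.
            idle += 1
            continue
        if not chunk:
            # Zero bytes: no writer is connected at the moment.
            # Keep the descriptor open instead of breaking out of the loop.
            idle += 1
            time.sleep(poll_timeout)  # avoid spinning while at EOF
            continue
        idle = 0
        pending += chunk
        # Everything before the last newline is complete; keep the rest.
        *lines, pending = pending.split(b"\n")
        commands.extend(line.decode() for line in lines if line)
    return commands
```

Any other `OSError` from `os.read` is left to propagate, which corresponds to the "break only if errno != EAGAIN" case.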
Updated by mfriedrich on 2016-03-16 09:16:50 +00:00
First, thanks for all the comments. I would suggest, though, that you set up a test VM with 2.4.x+ so that you can develop and test changes there. We'll happily review and test your patch then.
Updated by critical on 2016-03-16 14:59:18 +00:00 I am not familiar with CentOS or Vagrant. Can we spin up a Debian 8.1 i386 minimal install rather than using icinga-vagrant?
Updated by mfriedrich on 2016-03-16 15:08:21 +00:00 If you are developing and compiling stuff, it is entirely up to you which distribution you prefer :)
Updated by critical on 2016-03-16 17:36:50 +00:00 VM Info
Testing
OUTPUT:
LOG:
Note: It seems to fail on both rc < 0 and rc == 0 when it shouldn't [1].
LOG:
A way to verify that all data has passed through the pipe when this situation occurs.
References
Updated by mfriedrich on 2016-03-18 10:10:40 +00:00
Cool, thanks. Please let us know when you've got a final patch :)
Updated by tgelf on 2016-04-06 10:35:34 +00:00 Probably the same problem here, but still running 2.4.3. Not so funny for automation, as commands tend to "randomly fail" depending on the number of commands sent through the pipe. Cheers,
Updated by mfriedrich on 2016-04-06 15:33:56 +00:00 Do you have any updates from your side? :) Thanks.
Updated by mfriedrich on 2016-04-07 08:26:36 +00:00
Updated by elippmann on 2016-04-11 08:54:21 +00:00
Updated by gbeutner on 2016-04-20 16:35:41 +00:00
Updated by gbeutner on 2016-04-21 07:47:00 +00:00
Updated by mfriedrich on 2016-04-21 15:41:17 +00:00 Hm, I'm looking for a reasonably easy way to reproduce the issue. Putting my Icinga 2 box under stress (10k hosts, 100k services) and then firing that small script does not produce any errors at the moment. Any hints or scripts which would help tackle the issue? Thanks.
Updated by critical on 2016-04-21 16:17:08 +00:00 Unfortunately I have no updates on my end. I have been using the patch I provided to mitigate this issue. Would this bash script fail if the CMDFILE did not exist? Or would it write a regular file instead? Could you try using a C alternative?
Also, what are the specs of the machine you are using? When I execute these commands, my machine's CPU usage (icinga2 and MySQL) rises to 80-90%. Could you try these tests on a VM where you can limit the CPU to 1 or 2 cores with 1 GB of memory?
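On the question of a missing CMDFILE: a shell redirection such as `echo ... > "$CMDFILE"` creates a regular file when the path does not exist, so the test script would not notice a vanished pipe. A sender that opens the path without `O_CREAT` fails instead. The sketch below shows this in Python rather than C; `send_external_command` and its return strings are hypothetical names for this example and are not part of Icinga or of the PHP command pipe library. The `[timestamp] COMMAND;arguments` line format is the external command format; the path to pass in (for example the default `/var/run/icinga2/cmd/icinga2.cmd`) depends on the installation.

```python
import errno
import os
import stat
import time


def send_external_command(cmd_path, command, now=None):
    """Write one external command line to the command pipe.

    Hypothetical helper. Returns "ok", "missing", "no-reader",
    "not-a-fifo", "full", or "partial" instead of raising for the
    situations discussed in this thread.
    """
    timestamp = int(time.time() if now is None else now)
    line = "[{}] {}\n".format(timestamp, command).encode()
    try:
        # No O_CREAT: a missing pipe is reported, never silently created.
        # O_NONBLOCK: with no reader attached, open() fails with ENXIO
        # instead of blocking the sender.
        fd = os.open(cmd_path, os.O_WRONLY | os.O_NONBLOCK)
    except FileNotFoundError:
        return "missing"
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return "no-reader"
        raise
    try:
        if not stat.S_ISFIFO(os.fstat(fd).st_mode):
            return "not-a-fifo"
        written = os.write(fd, line)
        return "ok" if written == len(line) else "partial"
    except BlockingIOError:
        # The pipe buffer is full; the caller may retry later.
        return "full"
    finally:
        os.close(fd)
```

Distinguishing "missing" from "no-reader" and "full" would show whether the pipe really disappears under load or whether the listener has merely stopped reading for a moment.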
Updated by mfriedrich on 2016-05-11 09:10:04 +00:00
Applied in changeset a529725.
Updated by mfriedrich on 2016-05-11 09:18:17 +00:00
Updated by mfriedrich on 2016-05-11 09:34:36 +00:00 FYI: the tests were run on my MacBook Pro (early 2015, i5, 8 GB RAM), with the icinga2 and MariaDB processes taking up its resources while adding 10k hosts and 100k services. At some point the database might be blocking, but all tests were fired before that point. Your patch is reasonable (continue on EAGAIN as well as on rc == 0), so we applied it. Please test the git master / snapshot packages :)
This issue has been migrated from Redmine: https://dev.icinga.com/issues/11390
Created by critical on 2016-03-15 17:47:41 +00:00
Assignee: mfriedrich
Status: Resolved (closed on 2016-05-11 09:10:04 +00:00)
Target Version: 2.4.8
Last Update: 2016-05-11 09:34:36 +00:00 (in Redmine)
I have a custom icingaweb2 module that uses the icinga php command pipe library to schedule dynamic downtime in icinga2.
Before I schedule the downtime (above), I have to make sure that all other scheduled downtimes are cleared. Since I do not know the downtime IDs, I have to make a query:
And then use the command pipe library:
So for each host that is currently in downtime I have two commands to send (clear service downtime and clear host downtime), and if the host requires downtime scheduling I need one more. With 100 hosts I therefore have (at worst) 300 commands to send. When this happens, a few of my `$this->transport->send($cmd);` calls sometimes fail with:
But the pipe exists both before and after the crash. It seems to get overloaded with data, close, and then reopen.
Helpful info:
Changesets
2016-05-11 09:04:28 +00:00 by mfriedrich a529725
2016-05-12 09:11:02 +00:00 by mfriedrich b39634d
Relations: