[dev.icinga.com #4958] icinga 1.10 breaks usage of mod_gearman / icinga crash on startup #1373
Comments
Updated by mfriedrich on 2013-10-26 09:24:52 +00:00
looks like strdup on NULL within get_results() in mod_gearman itself. anyways, where are these packages from? i don't recall any icinga-mod_gearman package in the repos. did you only upgrade icinga 1.9 to 1.10, or was there any gearman related upgrade involved as well (check the zypp log)?
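For context on the strdup remark: strdup() is not defined for a NULL argument and typically segfaults on glibc. Below is a minimal sketch of a NULL-tolerant duplicate. The name `dup_or_null` is hypothetical; this is not code from mod_gearman or Icinga and not the fix that was applied in this issue.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from mod_gearman or Icinga.
 * Copies a string like strdup() would, but returns NULL for a
 * NULL input instead of dereferencing it. */
char *dup_or_null(const char *value)
{
    size_t len;
    char *copy;

    if (value == NULL) {
        return NULL;            /* nothing to copy, caller must check */
    }
    len = strlen(value) + 1;    /* include the terminating '\0' */
    copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, value, len);
    }
    return copy;                /* caller frees the result */
}
```

A guard like this only hides the symptom; it does not explain why the field was NULL in the first place.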
Updated by mfriedrich on 2013-10-26 09:25:31 +00:00
Updated by netmax on 2013-10-26 09:45:06 +00:00
the packages are built by myself, you can get them here:
The upgrade was only from icinga 1.9 to 1.10, gearman was not changed.
Updated by mfriedrich on 2013-10-26 11:15:53 +00:00
ok. i don't have any sles11 system around where i could just drop those packages in.
Updated by netmax on 2013-10-26 11:45:08 +00:00
I use a local open buildservice instance for managing the package builds. My icinga-mod_gearman package includes a "BuildRequires: icinga", which is the reason. To get somewhat further, I diffed the release 1.9.3 sources against 1.10.0 and found some changes in broker.c. I'm not sure whether these are related to my problem; maybe you can say something about these changes?
Updated by mfriedrich on 2013-10-26 12:15:00 +00:00
git blame or the changelog will reveal that those changes were done with #4709 and #4754, but they do not interfere with existing broker modules - the additional fields in the nebstructs/objects were added at the end, where old modules (those compiled against nagios headers) would never recognize them due to compiler tricks with object casts. event broker modules shouldn't touch or use any of the functions in broker.c - they rather subscribe themselves to callbacks using "NEBCALLBACK_" as prefix identifier. maybe it's possible for you to get an unstripped binary/module with exported symbols in order to identify the root cause when looking at the backtrace variables on the segfault.
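The "fields added at the end" argument can be illustrated with a small sketch. The two struct layouts below are hypothetical and heavily simplified (the real Icinga structs have many more members); they only show why a module compiled against the older layout still reads the shared members correctly from an object created by a newer core.

```c
#include <stddef.h>

/* Hypothetical, simplified layouts for illustration only. */
struct result_old {             /* layout an old module was compiled against */
    int         return_code;
    const char *output;
};

struct result_new {             /* same members, one field appended at the end */
    int         return_code;
    const char *output;
    const char *check_source;   /* assumed name of the appended field */
};

/* The shared members sit at identical offsets in both layouts. */
_Static_assert(offsetof(struct result_old, return_code) ==
               offsetof(struct result_new, return_code),
               "return_code offset must match");
_Static_assert(offsetof(struct result_old, output) ==
               offsetof(struct result_new, output),
               "output offset must match");

/* An old module looking at a core object through its own, older view.
 * It never sees check_source, but everything it knows about lines up. */
const char *old_module_read_output(const void *core_object)
{
    const struct result_old *view = core_object;
    return view->output;
}
```

This only covers the direction the comment describes: the core allocates the object and the module reads it.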
Updated by netmax on 2013-10-26 13:51:14 +00:00
I now got a bit further. I set up a clean system with the same packages and a basic configuration, with one host and one service to check.
# this line in icinga.cfg crashes icinga
# commenting out the broker_module line above and activating mod_gearman.cfg in the modules dir works:
I got that working because gearmand was not running when starting up icinga. So mod_gearman gets loaded on icinga startup without connecting to gearmand. After starting up gearmand, the mod_gearman neb creates a check_results worker,
If needed, I can give access to the test system.
Updated by netmax on 2013-10-26 14:07:33 +00:00
another backtrace with more information:
at neb_module/result_thread.c:177
Updated by netmax on 2013-10-26 14:09:52 +00:00
the bt seems to differ a bit on every try:
at neb_module/result_thread.c:203
Updated by mfriedrich on 2013-10-26 15:31:31 +00:00
that seems to happen only with mod_gearman. i've tested both ways of adding idomod as a neb broker module (module object and broker_module) and it's working. though i don't have any mod_gearman install here (yet) to quickly get an insight into what's going on. still, i would be interested in which mod_gearman version was working with icinga 1.9.x previously.
Updated by netmax on 2013-10-26 15:43:04 +00:00
i just edited my post above, because i noticed that gearmand needs to be running to crash the icinga process. The mod_gearman version was the same when running icinga 1.9.x
Updated by mfriedrich on 2013-10-26 16:10:20 +00:00
a quick shot on debian jessie with 1.4.10 from packages just unveils yet another problem i am not really keen on debugging (worker error: gearman_worker_grab_job(GEARMAN_UNEXPECTED_PACKET)), but in terms of starting up, both ways (module object and broker_module) do work for me, even with mod_gearman. do you have some sort of protection mechanism running, especially for copying files to /tmp and loading them from there? that was a reverted change introduced in 1.10 in order to support multiple module objects at the same time, though it does not make any sense with regard to the memory corruption at all. how does idomod perform at your place? any errors? not sure how to proceed here though. i've seen that you've enabled debugging in your spec file. it might be worth a shot to disable it to see whether the behaviour changes or not. other than that, valgrind may unveil possible memory leaks and corruption too.
Updated by netmax on 2013-10-26 16:23:16 +00:00
there are no protection mechanisms running on my system. i noticed that there are stale icinganebmod* files in /tmp. I think they are there because of the crashes:
I'm not using idomod on either system right now, but i can test this. The debug option is only enabled for now (since today) to get some more details on this problem; in general it's disabled. Which gearman version did you use in your test?
Updated by netmax on 2013-10-27 09:29:25 +00:00
I ran icinga through valgrind overnight. It keeps "running" and seems to work, but gives the following output:
Updated by mfriedrich on 2013-10-28 09:27:43 +00:00
Hm. I don't have a working mod_gearman setup, but one last thing that comes to mind - mod_gearman manipulates core memory (which is a violation of the neb api, even if the author claims otherwise) in order to merge checkresult objects into the core's checkresult list for later processing. it could be that the check source attribute causes irritations here. maybe you'll revert
Updated by netmax on 2013-10-28 11:49:34 +00:00
I reverted those patches and it seems to run stably without them. The only files i can't revert are these, but they are not relevant for testing:
So what could be the final solution?
Updated by mfriedrich on 2013-10-28 11:54:38 +00:00
thanks for the fast feedback. solution for 1.10.1 - keeping check_source within the idoutils schema and classic ui for icinga 2 only, and reverting the feature for icinga core 1.x due to mod_gearman touching the checkresult lists in memory. it was just an idea to support that feature in icinga core 1.x, but if addons prevent innovative features it's just yet another argument for icinga 2 as a rewrite from scratch. i might come up with a cleaner revert patch, i'd be happy if you can test that one (should be doable after work at home hopefully).
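One plausible reading of this diagnosis, sketched under assumptions the thread does not confirm: if a module builds its own checkresult objects using a layout without the new field and links them into the core's list, a core that expects the newer layout would look for check_source beyond the end of the module's allocation. The struct names and members below are hypothetical and simplified; the code only compares sizes and offsets and never performs the out-of-bounds read.

```c
#include <stddef.h>

/* Hypothetical, simplified layouts for illustration only. */
struct cr_old {                 /* what the module is assumed to allocate */
    int         return_code;
    const char *output;
};

struct cr_new {                 /* what the newer core is assumed to expect */
    int         return_code;
    const char *output;
    const char *check_source;   /* assumed name of the appended field */
};

/* Returns 1 if check_source starts at or past the end of an object
 * allocated with the old layout, i.e. the core would read memory the
 * module never allocated or initialised for that purpose. */
int check_source_outside_old_allocation(void)
{
    return offsetof(struct cr_new, check_source) >= sizeof(struct cr_old);
}
```

This would also fit the observation that the backtrace differs between runs, since whatever happens to sit behind the allocation changes from run to run.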
Updated by mfriedrich on 2013-10-28 19:37:40 +00:00
a quick & dirty revert done in ~30min - please test, i've squashed all the involved commits into one diff restoring the functionality for idoutils schema and classic ui. https://git.icinga.org/?p=icinga-core.git;a=shortlog;h=refs/heads/fix/mod-gearman-check-source-4958
Updated by netmax on 2013-10-28 20:33:02 +00:00
Works as expected, tested on three setups, with and without idoutils.
Updated by mfriedrich on 2013-10-28 20:54:19 +00:00
wow, you're fast, thanks. merged to support/1.10 and scheduled for 1.10.1 soon. resolving here.
Updated by mcp on 2013-10-29 20:04:36 +00:00
works for me too.
Updated by mcp on 2013-10-29 22:36:06 +00:00
Moin Michael, you forgot 2 left-over check_source things for extinfo.cgi in eca694a ;) the attached patch removes them.
Updated by mfriedrich on 2013-10-29 22:46:38 +00:00
they'll stay as icinga 2 makes use of this feature - so the leftover was intended, but thanks for having a closer look. if the backend provides the host/service status.dat objects with 'check_source', it will be read by the cgis and presented to the viewer. for 1.x there's no chance to have that feature implemented unless mod_gearman stops manipulating inner core structures. in 2.x we do have all the functionality enabled and implemented for clustering and setting the check_source attribute correctly for the instance executing the check - therefore this is a nice gimmick for everyone to play with icinga 2 clustering provided with 0.0.3
Updated by mfriedrich on 2013-10-31 11:32:09 +00:00
btw - it's not only caused by the check results themselves getting added to the core's checkresult list, but could also be affected by the fake orphaned check results generated to mark a check as orphaned. https://github.com/sni/mod_gearman/blob/master/neb_module/mod_gearman.c#L625
This issue has been migrated from Redmine: https://dev.icinga.com/issues/4958
Created by netmax on 2013-10-26 08:14:18 +00:00
Assignee: mfriedrich
Status: Resolved (closed on 2013-10-28 20:54:19 +00:00)
Target Version: 1.10.1
Last Update: 2013-10-31 11:32:09 +00:00 (in Redmine)
After upgrading my installation from 1.9 to 1.10 with mod_gearman enabled, icinga crashes on startup.
When I disable loading of mod_gearman, icinga starts up normally.
I collected this information from my system and will also open a bug report for mod_gearman with the same information:
(gdb) bt
(gdb) quit
moni:> rpm -qf /usr/lib64/libgearman.so.6
gearmand-0.25-6.2
moni:> rpm -q icinga
icinga-1.10.0-1.2
moni:> rpm -q icinga-mod_gearman
icinga-mod_gearman-1.4.10-4.7
moni:> cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2
Attachments
Changesets
2013-10-28 19:35:50 +00:00 by (unknown) eca694a