This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #4958] icinga 1.10 breaks usage of mod_gearman / icinga crash on startup #1373

Closed
icinga-migration opened this issue Oct 26, 2013 · 24 comments

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/4958

Created by netmax on 2013-10-26 08:14:18 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2013-10-28 20:54:19 +00:00)
Target Version: 1.10.1
Last Update: 2013-10-31 11:32:09 +00:00 (in Redmine)

Icinga Version: 1.10.0
OS Version: SUSE Linux Enterprise 11 SP2

After upgrading my installation from 1.9 to 1.10 with mod_gearman enabled, icinga crashes on startup.
When I disable loading of mod_gearman, icinga starts up normally.

I collected this information from my system and will also open a bug report for mod_gearman with the same information:

moni:~> /usr/sbin/icinga /etc/icinga/icinga.cfg

Icinga 1.10.0
Copyright (c) 2009-2013 Icinga Development Team (http://www.icinga.org)
Copyright (c) 2009-2013 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad                                          
Last Modified: 10-24-2013                                                      
License: GPL                                                                   

Icinga 1.10.0 starting... (PID=20930)
Local time is Sat Oct 26 10:06:31 CEST 2013
*** glibc detected *** /usr/sbin/icinga: malloc(): memory corruption: 0x000000000075ef50 ***
======= Backtrace: =========                                                                
/lib64/libc.so.6(+0x76518)[0x7fc5fc996518]                                                  
/lib64/libc.so.6(+0x794cf)[0x7fc5fc9994cf]                                                  
/lib64/libc.so.6(__libc_malloc+0x77)[0x7fc5fc99b5a7]                                        
/lib64/libc.so.6(__strdup+0x22)[0x7fc5fc9a06e2]                                             
/tmp/icinganebmodWKFAg1(get_results+0x263)[0x7fc5fc47c406]                                  
/usr/lib64/libgearman.so.6(_ZN10FunctionV18callbackEP14gearman_job_stPv+0x53)[0x7fc5fc25d933]
/usr/lib64/libgearman.so.6(gearman_worker_work+0x105)[0x7fc5fc263685]                        
/tmp/icinganebmodWKFAg1(result_worker+0x90)[0x7fc5fc47c9c8]                                  
/lib64/libpthread.so.0(+0x77b6)[0x7fc5fcea87b6]                                              
/lib64/libc.so.6(clone+0x6d)[0x7fc5fc9f9c5d]                                                 
======= Memory map: ========                                                                 
00400000-004a7000 r-xp 00000000 ca:02 1862190                            /usr/sbin/icinga    
006a6000-006a7000 r--p 000a6000 ca:02 1862190                            /usr/sbin/icinga    
006a7000-006a8000 rw-p 000a7000 ca:02 1862190                            /usr/sbin/icinga    
006a8000-00838000 rw-p 00000000 00:00 0                                  [heap]              
7fc5f4000000-7fc5f4031000 rw-p 00000000 00:00 0                                              
7fc5f4031000-7fc5f8000000 ---p 00000000 00:00 0                                              
7fc5fa921000-7fc5fa922000 ---p 00000000 00:00 0                                              
7fc5fa922000-7fc5fb122000 rw-p 00000000 00:00 0                                              
7fc5fb122000-7fc5fb123000 ---p 00000000 00:00 0                                              
7fc5fb123000-7fc5fb923000 rw-p 00000000 00:00 0                                              
7fc5fb923000-7fc5fb92b000 r-xp 00000000 ca:02 409040                     /lib64/librt-2.11.3.so
7fc5fb92b000-7fc5fbb2a000 ---p 00008000 ca:02 409040                     /lib64/librt-2.11.3.so
7fc5fbb2a000-7fc5fbb2b000 r--p 00007000 ca:02 409040                     /lib64/librt-2.11.3.so
7fc5fbb2b000-7fc5fbb2c000 rw-p 00008000 ca:02 409040                     /lib64/librt-2.11.3.so
7fc5fbb2c000-7fc5fbb30000 r-xp 00000000 ca:02 408852                     /lib64/libuuid.so.1.3.0
7fc5fbb30000-7fc5fbd2f000 ---p 00004000 ca:02 408852                     /lib64/libuuid.so.1.3.0
7fc5fbd2f000-7fc5fbd30000 r--p 00003000 ca:02 408852                     /lib64/libuuid.so.1.3.0
7fc5fbd30000-7fc5fbd31000 rw-p 00004000 ca:02 408852                     /lib64/libuuid.so.1.3.0
7fc5fbd31000-7fc5fbd46000 r-xp 00000000 ca:02 408849                     /lib64/libgcc_s.so.1   
7fc5fbd46000-7fc5fbf45000 ---p 00015000 ca:02 408849                     /lib64/libgcc_s.so.1   
7fc5fbf45000-7fc5fbf46000 r--p 00014000 ca:02 408849                     /lib64/libgcc_s.so.1   
7fc5fbf46000-7fc5fbf47000 rw-p 00015000 ca:02 408849                     /lib64/libgcc_s.so.1   
7fc5fbf47000-7fc5fc032000 r-xp 00000000 ca:02 1856587                    /usr/lib64/libstdc++.so.6.0.16
7fc5fc032000-7fc5fc232000 ---p 000eb000 ca:02 1856587                    /usr/lib64/libstdc++.so.6.0.16
7fc5fc232000-7fc5fc23a000 r--p 000eb000 ca:02 1856587                    /usr/lib64/libstdc++.so.6.0.16
7fc5fc23a000-7fc5fc23c000 rw-p 000f3000 ca:02 1856587                    /usr/lib64/libstdc++.so.6.0.16
7fc5fc23c000-7fc5fc251000 rw-p 00000000 00:00 0                                                        
7fc5fc251000-7fc5fc26b000 r-xp 00000000 ca:02 1863376                    /usr/lib64/libgearman.so.6.0.0
7fc5fc26b000-7fc5fc46a000 ---p 0001a000 ca:02 1863376                    /usr/lib64/libgearman.so.6.0.0
7fc5fc46a000-7fc5fc46b000 r--p 00019000 ca:02 1863376                    /usr/lib64/libgearman.so.6.0.0
7fc5fc46b000-7fc5fc46c000 rw-p 0001a000 ca:02 1863376                    /usr/lib64/libgearman.so.6.0.0
7fc5fc46c000-7fc5fc489000 r-xp 00000000 ca:02 1259143                    /tmp/icinganebmodWKFAg1 (deleted)
7fc5fc489000-7fc5fc688000 ---p 0001d000 ca:02 1259143                    /tmp/icinganebmodWKFAg1 (deleted)
7fc5fc688000-7fc5fc689000 r--p 0001c000 ca:02 1259143                    /tmp/icinganebmodWKFAg1 (deleted)
7fc5fc689000-7fc5fc68a000 rw-p 0001d000 ca:02 1259143                    /tmp/icinganebmodWKFAg1 (deleted)
7fc5fc68a000-7fc5fc71c000 rw-p 00000000 00:00 0                                                           
7fc5fc71c000-7fc5fc71e000 r-xp 00000000 ca:02 409004                     /lib64/libdl-2.11.3.so           
7fc5fc71e000-7fc5fc91e000 ---p 00002000 ca:02 409004                     /lib64/libdl-2.11.3.so           
7fc5fc91e000-7fc5fc91f000 r--p 00002000 ca:02 409004                     /lib64/libdl-2.11.3.so           
7fc5fc91f000-7fc5fc920000 rw-p 00003000 ca:02 409004                     /lib64/libdl-2.11.3.so           
7fc5fc920000-7fc5fca8d000 r-xp 00000000 ca:02 408842                     /lib64/libc-2.11.3.so            
7fc5fca8d000-7fc5fcc8d000 ---p 0016d000 ca:02 408842                     /lib64/libc-2.11.3.so            
7fc5fcc8d000-7fc5fcc91000 r--p 0016d000 ca:02 408842                     /lib64/libc-2.11.3.so            
7fc5fcc91000-7fc5fcc92000 rw-p 00171000 ca:02 408842                     /lib64/libc-2.11.3.so            
7fc5fcc92000-7fc5fcc97000 rw-p 00000000 00:00 0                                                           
7fc5fcc97000-7fc5fcca0000 r-xp 00000000 ca:02 1859485                    /usr/lib64/libltdl.so.7.2.0      
7fc5fcca0000-7fc5fce9f000 ---p 00009000 ca:02 1859485                    /usr/lib64/libltdl.so.7.2.0      
7fc5fce9f000-7fc5fcea0000 r--p 00008000 ca:02 1859485                    /usr/lib64/libltdl.so.7.2.0      
7fc5fcea0000-7fc5fcea1000 rw-p 00009000 ca:02 1859485                    /usr/lib64/libltdl.so.7.2.0      
7fc5fcea1000-7fc5fceb8000 r-xp 00000000 ca:02 409037                     /lib64/libpthread-2.11.3.so      
7fc5fceb8000-7fc5fd0b8000 ---p 00017000 ca:02 409037                     /lib64/libpthread-2.11.3.so      
7fc5fd0b8000-7fc5fd0b9000 r--p 00017000 ca:02 409037                     /lib64/libpthread-2.11.3.so      
7fc5fd0b9000-7fc5fd0ba000 rw-p 00018000 ca:02 409037                     /lib64/libpthread-2.11.3.so      
7fc5fd0ba000-7fc5fd0be000 rw-p 00000000 00:00 0                                                           
7fc5fd0be000-7fc5fd119000 r-xp 00000000 ca:02 409005                     /lib64/libm-2.11.3.so            
7fc5fd119000-7fc5fd318000 ---p 0005b000 ca:02 409005                     /lib64/libm-2.11.3.so            
7fc5fd318000-7fc5fd319000 r--p 0005a000 ca:02 409005                     /lib64/libm-2.11.3.so            
7fc5fd319000-7fc5fd337000 rw-p 0005b000 ca:02 409005                     /lib64/libm-2.11.3.so            
7fc5fd337000-7fc5fd356000 r-xp 00000000 ca:02 408807                     /lib64/ld-2.11.3.so              
7fc5fd544000-7fc5fd548000 rw-p 00000000 00:00 0                                                           
7fc5fd551000-7fc5fd555000 rw-p 00000000 00:00 0                                                           
7fc5fd555000-7fc5fd556000 r--p 0001e000 ca:02 408807                     /lib64/ld-2.11.3.so              
7fc5fd556000-7fc5fd557000 rw-p 0001f000 ca:02 408807                     /lib64/ld-2.11.3.so              
7fc5fd557000-7fc5fd558000 rw-p 00000000 00:00 0                                                           
7fffe37fb000-7fffe385e000 rw-p 00000000 00:00 0                          [stack]                          
7fffe3919000-7fffe391a000 r-xp 00000000 00:00 0                          [vdso]                           
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]                       
Aborted (core dumped)                                                                                     
moni:~> gdb /usr/sbin/icinga core                                                                         
GNU gdb (GDB) SUSE (7.3-0.6.1)                                                                            
Copyright (C) 2011 Free Software Foundation, Inc.                                                         
License GPLv3+: GNU GPL version 3 or later                              
This is free software: you are free to change and redistribute it.                                        
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"                                
and "show warranty" for details.                                                                          
This GDB was configured as "x86_64-suse-linux".                                                           
For bug reporting instructions, please see:                                                               
...                                                                
Reading symbols from /usr/sbin/icinga...(no debugging symbols found)...done.                              
[New LWP 20932]                                                                                           
[New LWP 20930]                                                                                           
[New LWP 20931]                                                                                           
Missing separate debuginfo for /lib64/libm.so.6                                                           
Try: zypper install -C "debuginfo(build-id)=e05f2e72f47391363a03eff3cde10ad4c007c045"                     
Missing separate debuginfo for /lib64/libpthread.so.0                                                     
Try: zypper install -C "debuginfo(build-id)=09dae90d04b1e2e43758ce58845f026b4085aec9"                     
Missing separate debuginfo for /usr/lib64/libltdl.so.7                                                    
Try: zypper install -C "debuginfo(build-id)=e6815d55401bd7a3b46768eec5c5eb36a244d44f"                     
Missing separate debuginfo for /lib64/libc.so.6                                                           
Try: zypper install -C "debuginfo(build-id)=469835eb7eeb4aa3f653537c36a72810e01fb602"                     
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2                                                
Try: zypper install -C "debuginfo(build-id)=181860a35c8e9a0456dd6675f85c2eb0f062e956"                     
Missing separate debuginfo for /lib64/libdl.so.2                                                          
Try: zypper install -C "debuginfo(build-id)=3e4f6bfee9fdf77ca975b77b8c325347d9228bb8"                     
Missing separate debuginfo for /tmp/icinganebmodWKFAg1                                                    
Try: zypper install -C "debuginfo(build-id)=753677138bcc35af19ea7f4594361195b00c92ca"                     
Missing separate debuginfo for /usr/lib64/libgearman.so.6                                                 
Try: zypper install -C "debuginfo(build-id)=7275e9bf887b5fff32cf3f8103ef7b7eac420a79"                     
Missing separate debuginfo for /usr/lib64/libstdc++.so.6                                                  
Try: zypper install -C "debuginfo(build-id)=3915e6988dbdfc8ebe704efa2e5e5d519c027f7b"                     
Missing separate debuginfo for /lib64/libgcc_s.so.1                                                       
Try: zypper install -C "debuginfo(build-id)=fe7c25bfb3e605f9d6c1cb00b3c5f96ed95be6e5"                     
Missing separate debuginfo for /lib64/libuuid.so.1                                                        
Try: zypper install -C "debuginfo(build-id)=f9998d18c497b2047b48e0702daa82204e2a944d"                     
Missing separate debuginfo for /lib64/librt.so.1                                                          
Try: zypper install -C "debuginfo(build-id)=44612b93c19e6567318299411987b113d2387081"                     
Missing separate debuginfo for                                                                            
Try: zypper install -C "debuginfo(build-id)=181e3dd540f2f7e962137d37ed678a7f63ed6189"                     
Missing separate debuginfo for /lib64/libm.so.6                                                           
Try: zypper install -C "debuginfo(build-id)=e05f2e72f47391363a03eff3cde10ad4c007c045"                     
Missing separate debuginfo for /lib64/libpthread.so.0                                                     
Try: zypper install -C "debuginfo(build-id)=09dae90d04b1e2e43758ce58845f026b4085aec9"                     
[Thread debugging using libthread_db enabled]                                                             
Missing separate debuginfo for /usr/lib64/libltdl.so.7                                                    
Try: zypper install -C "debuginfo(build-id)=e6815d55401bd7a3b46768eec5c5eb36a244d44f"                     
Missing separate debuginfo for /lib64/libc.so.6                                                           
Try: zypper install -C "debuginfo(build-id)=469835eb7eeb4aa3f653537c36a72810e01fb602"                     
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2                                                
Try: zypper install -C "debuginfo(build-id)=181860a35c8e9a0456dd6675f85c2eb0f062e956"                     
Missing separate debuginfo for /lib64/libdl.so.2                                                          
Try: zypper install -C "debuginfo(build-id)=3e4f6bfee9fdf77ca975b77b8c325347d9228bb8"                     
Missing separate debuginfo for /usr/lib64/libstdc++.so.6                                                  
Try: zypper install -C "debuginfo(build-id)=3915e6988dbdfc8ebe704efa2e5e5d519c027f7b"                     
Missing separate debuginfo for /lib64/libgcc_s.so.1                                                       
Try: zypper install -C "debuginfo(build-id)=fe7c25bfb3e605f9d6c1cb00b3c5f96ed95be6e5"
Missing separate debuginfo for /lib64/libuuid.so.1
Try: zypper install -C "debuginfo(build-id)=f9998d18c497b2047b48e0702daa82204e2a944d"
Missing separate debuginfo for /lib64/librt.so.1
Try: zypper install -C "debuginfo(build-id)=44612b93c19e6567318299411987b113d2387081"
Core was generated by `/usr/sbin/icinga /etc/icinga/icinga.cfg'.
Program terminated with signal 6, Aborted.

#0  0x00007fc5fc952b35 in raise () from /lib64/libc.so.6

(gdb) bt

#0  0x00007fc5fc952b35 in raise () from /lib64/libc.so.6
#1  0x00007fc5fc954111 in abort () from /lib64/libc.so.6
#2  0x00007fc5fc990def in __libc_message () from /lib64/libc.so.6
#3  0x00007fc5fc996518 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fc5fc9994cf in _int_malloc () from /lib64/libc.so.6
#5  0x00007fc5fc99b5a7 in malloc () from /lib64/libc.so.6
#6  0x00007fc5fc9a06e2 in strdup () from /lib64/libc.so.6
#7  0x00007fc5fc47c406 in ?? ()
#8  0x00007fc5fc485be2 in ?? ()
#9  0x00007fc5fc485bd4 in ?? ()
#10 0x00007fc5fc485bc0 in ?? ()
#11 0x00007fc5fc485bb6 in ?? ()
#12 0x00000000007ff610 in ?? ()
#13 0x000000000075ef08 in ?? ()
#14 0x000000000075ef18 in ?? ()
#15 0x0000000100000000 in ?? ()
#16 0x0000000000000000 in ?? ()

(gdb) quit
moni:~> rpm -qf /usr/lib64/libgearman.so.6
gearmand-0.25-6.2
moni:~> rpm -q icinga
icinga-1.10.0-1.2
moni:~> rpm -q icinga-mod_gearman
icinga-mod_gearman-1.4.10-4.7
moni:~> cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2

Attachments

Changesets

2013-10-28 19:35:50 +00:00 by (unknown) eca694a

core/idoutils: revert check_source attribute due to mod_gearman manipulating in-memory checkresult list

mod_gearman uses the old nagios headers where check_source is not
available. while using object compiler tricks with attribute at the end
will work on direct object casts, this does not work with a neb broker
addon which manipulates the doubly-linked check result list in memory
stashing objects of different type altogether causing memory corruption.

while the "thing" mod_gearman does in core memory remains a clear
violation of the neb api ("subscribe to a neb callback and do stuff")
it's just yet another proof that the neb api is not a safe place for
innovative core features at all.

reverting the check_source feature in 1.x for the sake of compatibility
and only using 'check_source' in Icinga 2 as additional attribute.

(been saying that for 4+ years now that core memory manipulation by addons
is a bad thing, but noone ever believed me)

refs #4958
@icinga-migration
Author

Updated by mfriedrich on 2013-10-26 09:24:52 +00:00

  • Status changed from New to Feedback
  • Priority changed from High to Normal

looks like strdup on NULL within get_results() in mod_gearman itself.
https://github.com/sni/mod_gearman/blob/master/neb_module/result_thread.c#L83

anyways, where are these packages from? i don't recall any icinga-mod_gearman package in the repos.

did you only upgrade icinga 1.9 to 1.10 or was there any gearman related upgrade involved as well (check zypp log).

@icinga-migration
Author

Updated by mfriedrich on 2013-10-26 09:25:31 +00:00

  • Description updated

@icinga-migration
Author

Updated by netmax on 2013-10-26 09:45:06 +00:00

the packages are built by myself, you can get them here:
http://download.obs.j0ke.net/server:/monitoring:/branches:/gearman:/0.25/SLE_11_SP2/

The upgrade was only from icinga 1.9 to 1.10, gearman was not changed.
The update included a version of mod_gearman rebuilt against the new icinga version.

@icinga-migration
Author

Updated by mfriedrich on 2013-10-26 11:15:53 +00:00

ok. i don't have any sles11 around where i might just stash those packages into.
why did you rebuild the mod_gearman package? i thought that mod_gearman upstream just uses the nagios header files and won't need any change when a new icinga version is installed. is there a version upgrade involved with mod_gearman?

@icinga-migration
Author

Updated by netmax on 2013-10-26 11:45:08 +00:00

I use a local open buildservice instance for managing the package builds.

My icinga-mod_gearman package includes a "BuildRequires: icinga", which is the reason
why a rebuilt version of icinga-mod_gearman gets built and installed.
This is just a clean dependency path.

To get somewhat further I diffed the release 1.9.3 sources against 1.10.0 and found some changes in broker.c. I'm not sure if these are related to my problem, maybe you can say something about these changes?

--- icinga19/icinga-1.9.3/base/broker.c 2013-07-07 17:50:50.000000000 +0200                                           
+++ icinga110/icinga-1.10.0/base/broker.c       2013-10-22 11:40:32.000000000 +0200                                   
@@ -119,6 +119,10 @@                                                                                                  

 /* send log data to broker */                                                                                        
 void broker_log_data(int type, int flags, int attr, char *data, unsigned long data_type, time_t entry_time, struct timeval *timestamp) {
+       broker_log_data_with_host_service(type, flags, attr, data, data_type, entry_time, timestamp, NULL, NULL);                        
+}                                                                                                                                       
+                                                                                                                                        
+void broker_log_data_with_host_service(int type, int flags, int attr, char *data, unsigned long data_type, time_t entry_time, struct timeval *timestamp, host *hst, service *svc) {                                                                                                    
        nebstruct_log_data ds;                                                                                                              

        if (!(event_broker_options & BROKER_LOGGED_DATA))                                                                                   
@@ -134,6 +138,19 @@                                                                                                                        
        ds.data_type = data_type;                                                                                                           
        ds.data = data;                                                                                                                     

+       if (hst != NULL && svc == NULL) {                                                                                                   
+               ds.host_name = hst->name;                                                                                                   
+               ds.service_description = NULL;                                                                                              
+       }                                                                                                                                   
+       else if (hst != NULL && svc != NULL) {
+               ds.host_name = svc->host_name;
+               ds.service_description = svc->description;
+       }
+       else {
+               ds.host_name = NULL;
+               ds.service_description = NULL;
+       }
+
        /* make callbacks */
        neb_make_callbacks(NEBCALLBACK_LOG_DATA, (void *)&ds);

@@ -241,7 +258,7 @@


 /* send host check data to broker */
-int broker_host_check(int type, int flags, int attr, host *hst, int check_type, int state, int state_type, struct timeval start_time, struct timeval end_time, char *cmd, double latency, double exectime, int timeout, int early_timeout, int retcode, char *cmdline, char *output, char *long_output, char *perfdata, struct timeval *timestamp) {
+int broker_host_check(int type, int flags, int attr, host *hst, int check_type, int state, int state_type, struct timeval start_time, struct timeval end_time, char *cmd, double latency, double exectime, int timeout, int early_timeout, int retcode, char *cmdline, char *output, char *long_output, char *perfdata, struct timeval *timestamp, char *check_source) {
        char *command_buf = NULL;
        char *command_name = NULL;
        char *command_args = NULL;
@@ -300,7 +317,7 @@


 /* send service check data to broker */
-int broker_service_check(int type, int flags, int attr, service *svc, int check_type, struct timeval start_time, struct timeval end_time, char *cmd, double latency, double exectime, int timeout, int early_timeout, int retcode, char *cmdline, struct timeval *timestamp) {
+int broker_service_check(int type, int flags, int attr, service *svc, int check_type, struct timeval start_time, struct timeval end_time, char *cmd, double latency, double exectime, int timeout, int early_timeout, int retcode, char *cmdline, struct timeval *timestamp, char *check_source) {
        char *command_buf = NULL;
        char *command_name = NULL;
        char *command_args = NULL;

@icinga-migration
Author

Updated by mfriedrich on 2013-10-26 12:15:00 +00:00

git blame or changelog will unveil that those changes were done with #4709 and #4754 but do not interfere with existing broker modules - the additional fields in the nebstructs/objects were added at the end where old modules (those compiled against nagios headers) would never recognize them due to compiler tricks with object casts.

event broker modules shouldn't touch nor use any of the functions in broker.c - they rather subscribe themselves to callbacks using "NEBCALLBACK_" as prefix identifier.

maybe it's possible for you to get an unstripped binary/module with exported symbols in order to identify the root cause when looking for backtrace variables on the segfault.

@icinga-migration
Author

Updated by netmax on 2013-10-26 13:51:14 +00:00

I now got a bit further.

I set up a clean system with the same packages and a basic configuration, with one host and one service to check.
I played around a bit with configuration options and found out that icinga only crashes when mod_gearman is loaded directly in icinga.cfg with a broker_module line.

# this line in icinga.cfg crashes icinga

broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf

# commenting out the broker_module line above and activating mod_gearman.cfg in the modules dir works:

icinga-test:/etc/icinga/modules # cat mod_gearman.cfg
define module{
        module_name     modgearman
        module_type     neb
        path            /usr/lib64/mod_gearman/mod_gearman.o
        args            config=/etc/mod_gearman/mod_gearman_neb.conf
        }

I got that working because gearmand was not running when starting up icinga.

So mod_gearman gets loaded on icinga startup without connecting to gearmand. After starting up gearmand, the mod_gearman neb creates a check_results worker,
but after rescheduling a service check icinga dies, so it seems that something is still broken:

 Queue Name         | Worker Available | Jobs Waiting | Jobs Running
---------------------------------------------------------------------
 check_results      |               0  |           5  |           0
 eventhandler       |               1  |           0  |           0
 host               |               1  |           0  |           0
 service            |               1  |           0  |           0
 worker_icinga-test |               1  |           0  |           0
---------------------------------------------------------------------

If needed i can give access to the test system.

@icinga-migration
Author

Updated by netmax on 2013-10-26 14:07:33 +00:00

another backtrace with more information:

(gdb) bt

#0  0x00007fa9136c7b35 in raise () from /lib64/libc.so.6
#1  0x00007fa9136c9111 in abort () from /lib64/libc.so.6
#2  0x00007fa91370b72d in __malloc_assert () from /lib64/libc.so.6
#3  0x00007fa91370e9b0 in _int_malloc () from /lib64/libc.so.6
#4  0x00007fa9137105a7 in malloc () from /lib64/libc.so.6
#5  0x00007fa9137156e2 in strdup () from /lib64/libc.so.6
#6  0x00007fa9131f1406 in get_results (job=0x7fa90c005360, context=, result_size=, ret_ptr=0x7fa90c005620)
    at neb_module/result_thread.c:177

#7  0x00007fa912fd2933 in FunctionV1::callback(gearman_job_st*, void*) () from /usr/lib64/libgearman.so.6
#8  0x00007fa912fd8685 in gearman_worker_work () from /usr/lib64/libgearman.so.6
#9  0x00007fa9131f19c8 in result_worker (data=) at neb_module/result_thread.c:61
#10 0x00007fa913c1d7b6 in start_thread () from /lib64/libpthread.so.0
#11 0x00007fa91376ec5d in clone () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()

@icinga-migration
Author

Updated by netmax on 2013-10-26 14:09:52 +00:00

the bt seems to differ a bit on every try:

(gdb) bt

#0  0x00007f3b6d44cb35 in raise () from /lib64/libc.so.6
#1  0x00007f3b6d44e111 in abort () from /lib64/libc.so.6
#2  0x00007f3b6d48adef in __libc_message () from /lib64/libc.so.6
#3  0x00007f3b6d490518 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f3b6d4934cf in _int_malloc () from /lib64/libc.so.6
#5  0x00007f3b6d4955a7 in malloc () from /lib64/libc.so.6
#6  0x00007f3b6d49a6e2 in strdup () from /lib64/libc.so.6
#7  0x00007f3b6cf72b20 in string2timeval (value=0xe7c , t=0xe7e) at common/utils.c:1129
#8  0x00007f3b6cf76626 in get_results (job=0x6eef30, context=, result_size=, ret_ptr=0x6ef1f0)
    at neb_module/result_thread.c:203

#9  0x00007f3b6cd57933 in FunctionV1::callback(gearman_job_st*, void*) () from /usr/lib64/libgearman.so.6
#10 0x00007f3b6cd5d685 in gearman_worker_work () from /usr/lib64/libgearman.so.6
#11 0x00007f3b6cf769c8 in result_worker (data=) at neb_module/result_thread.c:61
#12 0x00007f3b6d9a27b6 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f3b6d4f3c5d in clone () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()

@icinga-migration
Author

Updated by mfriedrich on 2013-10-26 15:31:31 +00:00

that seems to happen only with mod_gearman. i've tested both ways of adding idomod as a neb broker module (module object and broker_module) and it's working. though i don't have any mod_gearman install here (yet) to quickly get an insight what's going on.

still, i would be interested which mod_gearman was working with icinga 1.9.x previously.

@icinga-migration
Author

Updated by netmax on 2013-10-26 15:43:04 +00:00

I just edited my post above, because I noticed that gearmand needs to be running to crash the icinga process,
so the way mod_gearman is loaded may not be relevant.

The mod_gearman version was the same when running icinga 1.9.x

@icinga-migration
Author

Updated by mfriedrich on 2013-10-26 16:10:20 +00:00

a quick shot on debian jessie with 1.4.10 from packages just unveils yet another problem i am not really keen on debugging (worker error: gearman_worker_grab_job(GEARMAN_UNEXPECTED_PACKET)), but in terms of starting up, both ways (module object and broker_module) do work for me, even with mod_gearman.

do you have some sort of protection mechanism running, especially for copying files to /tmp and loading them from over there? that was a reverted change introduced in 1.10 in order to support multiple module objects at the same time. though it does not make any sense in regards of the memory corruption at all.

how does idomod perform at your place? any errors?

not sure how to proceed here though. i've seen that you've enabled debugging in your spec file. might be worth a shot to disable it to see whether the behaviour changes or not. other than that, valgrind may unveil possible memory leaks and corruption too.

@icinga-migration
Author

Updated by netmax on 2013-10-26 16:23:16 +00:00

there is no protection mechanism running on my system. I noticed that there are stale icinganebmod* files in /tmp. I think they are there because of the crashes:

icinga-test:/tmp # ls -la /tmp/icinganebmod*
-rw------- 1 icinga icinga      0 26. Okt 15:05 /tmp/icinganebmod8CVHh8
-rwxr--r-- 1 icinga users  926188 26. Okt 16:07 /tmp/icinganebmod9KWKUr
-rwxr--r-- 1 icinga users  926188 26. Okt 16:05 /tmp/icinganebmodHQf4Cp
-rwxr--r-- 1 icinga users  926188 26. Okt 16:03 /tmp/icinganebmodihijLd
-rwxr--r-- 1 icinga icinga 926188 26. Okt 16:02 /tmp/icinganebmodIqGOPt
-rwxr--r-- 1 icinga icinga 926188 26. Okt 16:02 /tmp/icinganebmodJ3QhnD
-rwxr--r-- 1 icinga users  926188 26. Okt 16:03 /tmp/icinganebmodmrm01M
-rwxr--r-- 1 icinga users  926188 26. Okt 16:03 /tmp/icinganebmodSDZP6D
-rwxr--r-- 1 icinga users  926188 26. Okt 16:03 /tmp/icinganebmodULqnZO
-rw------- 1 icinga users       0 26. Okt 15:05 /tmp/icinganebmodxrcMOu
-rwxr--r-- 1 icinga users  926188 26. Okt 16:03 /tmp/icinganebmodZDrmdi
-rwxr--r-- 1 icinga users  926188 26. Okt 16:03 /tmp/icinganebmodzi5TPE
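
The listing above fits a copy-then-load scheme in which the module is first copied to a uniquely named file under /tmp. A minimal sketch of such a copy step follows, assuming the core uses something like mkstemp() with an icinganebmodXXXXXX template. The function name copy_module_to_tmp is hypothetical and this is not the actual Icinga loader code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copies src to a fresh /tmp/icinganebmodXXXXXX file
 * and stores the chosen path in tmp_path. Returns 0 on success, -1 on error. */
int copy_module_to_tmp(const char *src, char *tmp_path, size_t tmp_len)
{
    static const char tmpl[] = "/tmp/icinganebmodXXXXXX";
    char buf[4096];
    size_t n;
    int ok = 1;
    int fd;
    FILE *in, *out;

    assert(src != NULL && tmp_path != NULL);
    if (tmp_len < sizeof(tmpl))
        return -1;
    memcpy(tmp_path, tmpl, sizeof(tmpl));

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;

    fd = mkstemp(tmp_path);   /* creates an empty file with mode 0600 */
    if (fd < 0) {
        fclose(in);
        return -1;
    }
    out = fdopen(fd, "wb");
    if (out == NULL) {
        close(fd);
        fclose(in);
        unlink(tmp_path);
        return -1;
    }

    /* Copy the module in chunks; stop on the first short write. */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    if (!ok) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A caller would presumably load the copy and unlink it afterwards. If the process dies before that cleanup, the copy stays behind, which would match the stale files here; the two 0-byte entries with mode 0600 look like files that were created but never filled.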

I'm not using idomod on either system right now, but i can test this.

The debug option is just enabled for now (enabled today) to get some more details for this problem, in general it's disabled.

Which gearman version did you use in your test?

@icinga-migration

Updated by netmax on 2013-10-27 09:29:25 +00:00

I ran icinga through valgrind overnight; it keeps "running" and seems to work, but gives the following output:

icinga-test:~> valgrind /usr/sbin/icinga /etc/icinga/icinga.cfg 
==5555== Memcheck, a memory error detector                      
==5555== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==5555== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==5555== Command: /usr/sbin/icinga /etc/icinga/icinga.cfg                 
==5555==                                                                  

Icinga 1.10.0
Copyright (c) 2009-2013 Icinga Development Team (http://www.icinga.org)
Copyright (c) 2009-2013 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad                                          
Last Modified: 10-24-2013                                                      
License: GPL                                                                   

Icinga 1.10.0 starting... (PID=5555)
Local time is Sat Oct 26 19:25:39 CEST 2013
[2013-10-26 19:25:39][5555][TRACE] parse_args_line(logfile=/var/log/mod_gearman/mod_gearman_neb.log, 1)
[2013-10-26 19:25:39][5555][TRACE] parse_args_line(server=localhost:4730, 1)                           
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(eventhandler=yes, 1)                                
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(services=yes, 1)                                    
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(hosts=yes, 1)                                       
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(do_hostchecks=yes, 1)                               
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(route_eventhandler_like_checks=no, 1)               
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(encryption=yes, 1)                                  
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(key=should_be_changed, 1)                           
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(use_uniq_jobs=on, 1)                                
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(localhostgroups=, 1)                                
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(localservicegroups=, 1)                             
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(result_workers=1, 1)                                
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(perfdata=no, 1)                                     
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(perfdata_mode=1, 1)                                 
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(orphan_host_checks=yes, 1)                          
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(orphan_service_checks=yes, 1)                       
[2013-10-26 19:25:40][5555][TRACE] parse_args_line(accept_clear_results=no, 1)                         
==5555== Thread 3:                                                                                     
==5555== Invalid write of size 8                                                                       
==5555==    at 0x444666: init_check_result (in /usr/sbin/icinga)                                       
==5555==    by 0x5E5C30B: get_results (result_thread.c:156)                                            
==5555==    by 0x6108932: FunctionV1::callback(gearman_job_st*, void*) (in /usr/lib64/libgearman.so.6.0.0)
==5555==    by 0x610E684: gearman_worker_work (in /usr/lib64/libgearman.so.6.0.0)                         
==5555==    by 0x5E5C9C7: result_worker (result_thread.c:61)                                              
==5555==    by 0x50B17B5: start_thread (in /lib64/libpthread-2.11.3.so)                                   
==5555==  Address 0x74d01d8 is 0 bytes after a block of size 136 alloc'd                                  
==5555==    at 0x4C28F09: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)               
==5555==    by 0x5E5C2EF: get_results (result_thread.c:152)                                               
==5555==    by 0x6108932: FunctionV1::callback(gearman_job_st*, void*) (in /usr/lib64/libgearman.so.6.0.0)
==5555==    by 0x610E684: gearman_worker_work (in /usr/lib64/libgearman.so.6.0.0)                         
==5555==    by 0x5E5C9C7: result_worker (result_thread.c:61)                                              
==5555==    by 0x50B17B5: start_thread (in /lib64/libpthread-2.11.3.so)                                   
==5555==                                                                                                  
==5555== Thread 1:                                                                                        
==5555== Invalid read of size 8                                                                           
==5555==    at 0x41EF0F: handle_async_host_check_result_3x (in /usr/sbin/icinga)                          
==5555==    by 0x421E04: reap_check_results (in /usr/sbin/icinga)                                         
==5555==    by 0x430E57: handle_timed_event (in /usr/sbin/icinga)                                         
==5555==    by 0x4311E1: event_execution_loop (in /usr/sbin/icinga)                                       
==5555==    by 0x415D15: main (in /usr/sbin/icinga)                                                       
==5555==  Address 0x74d01d8 is 0 bytes after a block of size 136 alloc'd                                  
==5555==    at 0x4C28F09: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)               
==5555==    by 0x5E5C2EF: get_results (result_thread.c:152)                                               
==5555==    by 0x6108932: FunctionV1::callback(gearman_job_st*, void*) (in /usr/lib64/libgearman.so.6.0.0)
==5555==    by 0x610E684: gearman_worker_work (in /usr/lib64/libgearman.so.6.0.0)                         
==5555==    by 0x5E5C9C7: result_worker (result_thread.c:61)                                              
==5555==    by 0x50B17B5: start_thread (in /lib64/libpthread-2.11.3.so)                                   
==5555==                                                                                                  
==5555== Invalid read of size 8                                                                           
==5555==    at 0x41F27D: handle_async_host_check_result_3x (in /usr/sbin/icinga)                          
==5555==    by 0x421E04: reap_check_results (in /usr/sbin/icinga)                                         
==5555==    by 0x430E57: handle_timed_event (in /usr/sbin/icinga)                                         
==5555==    by 0x4311E1: event_execution_loop (in /usr/sbin/icinga)                                       
==5555==    by 0x415D15: main (in /usr/sbin/icinga)
==5555==  Address 0x74d01d8 is 0 bytes after a block of size 136 alloc'd
==5555==    at 0x4C28F09: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5555==    by 0x5E5C2EF: get_results (result_thread.c:152)
==5555==    by 0x6108932: FunctionV1::callback(gearman_job_st*, void*) (in /usr/lib64/libgearman.so.6.0.0)
==5555==    by 0x610E684: gearman_worker_work (in /usr/lib64/libgearman.so.6.0.0)
==5555==    by 0x5E5C9C7: result_worker (result_thread.c:61)
==5555==    by 0x50B17B5: start_thread (in /lib64/libpthread-2.11.3.so)
==5555==
==5555== Invalid read of size 8
==5555==    at 0x420D4C: handle_async_service_check_result (in /usr/sbin/icinga)
==5555==    by 0x421D28: reap_check_results (in /usr/sbin/icinga)
==5555==    by 0x430E57: handle_timed_event (in /usr/sbin/icinga)
==5555==    by 0x4311E1: event_execution_loop (in /usr/sbin/icinga)
==5555==    by 0x415D15: main (in /usr/sbin/icinga)
==5555==  Address 0x7595498 is 0 bytes after a block of size 136 alloc'd
==5555==    at 0x4C28F09: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5555==    by 0x5E5C2EF: get_results (result_thread.c:152)
==5555==    by 0x6108932: FunctionV1::callback(gearman_job_st*, void*) (in /usr/lib64/libgearman.so.6.0.0)
==5555==    by 0x610E684: gearman_worker_work (in /usr/lib64/libgearman.so.6.0.0)
==5555==    by 0x5E5C9C7: result_worker (result_thread.c:61)
==5555==    by 0x50B17B5: start_thread (in /lib64/libpthread-2.11.3.so)
==5555==
==5555== Invalid read of size 8
==5555==    at 0x420A65: handle_async_service_check_result (in /usr/sbin/icinga)
==5555==    by 0x421D28: reap_check_results (in /usr/sbin/icinga)
==5555==    by 0x430E57: handle_timed_event (in /usr/sbin/icinga)
==5555==    by 0x4311E1: event_execution_loop (in /usr/sbin/icinga)
==5555==    by 0x415D15: main (in /usr/sbin/icinga)
==5555==  Address 0x7595498 is 0 bytes after a block of size 136 alloc'd
==5555==    at 0x4C28F09: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5555==    by 0x5E5C2EF: get_results (result_thread.c:152)
==5555==    by 0x6108932: FunctionV1::callback(gearman_job_st*, void*) (in /usr/lib64/libgearman.so.6.0.0)
==5555==    by 0x610E684: gearman_worker_work (in /usr/lib64/libgearman.so.6.0.0)
==5555==    by 0x5E5C9C7: result_worker (result_thread.c:61)
==5555==    by 0x50B17B5: start_thread (in /lib64/libpthread-2.11.3.so)
==5555==

@icinga-migration

Updated by mfriedrich on 2013-10-28 09:27:43 +00:00

Hm. I don't have a working mod_gearman setup but what comes to mind at last - mod_gearman manipulates core memory (which is a violation of the neb api even if the author claims otherwise) in order to merge checkresult objects into the core's checkresult list for later processing.

it could be that the check source attribute causes irritations here. maybe you could revert
66557d4 and 44b8ee7 in order to verify if that's causing your issue. maybe i'll find a time slot next weekend to fully set up mod_gearman on a test box, but until then you should test yourself anyways.
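
The valgrind report above is consistent with a size mismatch between module and core: the module allocates a 136-byte block in get_results, and the core's init_check_result then writes 8 bytes starting exactly at the end of that block. A minimal sketch of how a single appended pointer field could produce this pattern follows. The struct names and fields are hypothetical stand-ins, not the real Icinga check_result definition, and the sketch only computes sizes and offsets instead of performing the out-of-bounds write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout the module was compiled against (no check_source). */
struct result_v1 {
    int   return_code;
    char *host_name;
    char *output;
};

/* Hypothetical layout used by the newer core: one pointer appended. */
struct result_v2 {
    int   return_code;
    char *host_name;
    char *output;
    char *check_source;
};

/* Distance from the end of a module-sized buffer to the new field. */
size_t gap_after_module_buffer(void)
{
    return offsetof(struct result_v2, check_source) - sizeof(struct result_v1);
}

/* Bytes the core would touch past a buffer sized with the old layout
 * when it initialises the appended field. */
size_t bytes_past_module_buffer(void)
{
    size_t end_of_write;

    assert(sizeof(struct result_v2) > sizeof(struct result_v1));
    end_of_write = offsetof(struct result_v2, check_source) + sizeof(char *);
    return end_of_write - sizeof(struct result_v1);
}
```

On a typical 64-bit Linux ABI, gap_after_module_buffer() is 0 and bytes_past_module_buffer() is 8, which mirrors the "Invalid write of size 8" at an address "0 bytes after a block" in the report.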

@icinga-migration

Updated by netmax on 2013-10-28 11:49:34 +00:00

I reverted those patches and it seems to run stably without them.

The only files i can't revert are these, but they are not relevant for testing:
module/idoutils/db/oracle/upgrade/oracle-upgrade-1.10.0.sql
module/idoutils/db/pgsql/upgrade/pgsql-upgrade-1.10.0.sql
module/idoutils/db/mysql/upgrade/mysql-upgrade-1.10.0.sql

So what could be the final solution?

@icinga-migration

Updated by mfriedrich on 2013-10-28 11:54:38 +00:00

  • Category set to Event Broker
  • Status changed from Feedback to Assigned
  • Assigned to set to mfriedrich
  • Priority changed from Normal to High
  • Target Version set to 1.10.1

thanks for the fast feedback.

solution for 1.10.1 - keeping check_source within idoutils schema and classic ui for icinga 2 only, and reverting the feature for icinga core 1.x due to mod_gearman touching the checkresult lists in memory. it was just an idea to support that feature in icinga core 1.x but if addons prevent innovative features it's just yet another argument for icinga 2 as rewrite from scratch.

i might come up with a cleaner revert patch, i'd be happy if you can test that one (should be doable after work at home hopefully).

@icinga-migration

Updated by mfriedrich on 2013-10-28 19:37:40 +00:00

a quick & dirty revert done in ~30min - please test, i've squashed all the involved commits into one diff restoring the functionality for idoutils schema and classic ui.

https://git.icinga.org/?p=icinga-core.git;a=shortlog;h=refs/heads/fix/mod-gearman-check-source-4958

@icinga-migration

Updated by netmax on 2013-10-28 20:33:02 +00:00

Works as expected, tested on three setups, with and without idoutils.

@icinga-migration

Updated by mfriedrich on 2013-10-28 20:54:19 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

wow, you're fast, thanks.

merged to support/1.10 and scheduled for 1.10.1 soon. resolving here.

@icinga-migration

Updated by mcp on 2013-10-29 20:04:36 +00:00

works for me too.

@icinga-migration

Updated by mcp on 2013-10-29 22:36:06 +00:00

  • File added remove-leftover-check_source.patch

Moin Michael,

you forgot 2 left-over check_source things for extinfo.cgi in eca694a ;)

attached patch removes them.

@icinga-migration

Updated by mfriedrich on 2013-10-29 22:46:38 +00:00

they'll stay as icinga 2 makes use of this feature - so the leftover was intended, but thanks for having a closer look.

if the backend provides the host/service status.dat objects with 'check_source' it will be read by the cgis and presented to the viewer. for 1.x there's no chance to have that feature implemented unless mod_gearman would stop manipulating inner core structures. in 2.x we do have all the functionality enabled and implemented for clustering and setting the check_source attribute correctly for the instance executing the check - therefore this is a nice gimmick for everyone to play with icinga 2 clustering provided with 0.0.3
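
To illustrate the "read it if the backend provides it" behaviour described above, here is a minimal sketch of picking an optional check_source entry out of one status.dat line. It assumes the usual indented key=value line format. The function name read_check_source is hypothetical and this is not the actual CGI parsing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 and copies the value into out when line holds a check_source
 * entry; returns 0 for any other line or when out is too small. */
int read_check_source(const char *line, char *out, size_t outlen)
{
    static const char key[] = "check_source=";
    const size_t keylen = sizeof(key) - 1;
    size_t vallen;

    assert(line != NULL && out != NULL);

    /* Skip the indentation in front of the key. */
    while (*line == ' ' || *line == '\t')
        line++;
    if (strncmp(line, key, keylen) != 0)
        return 0;

    /* The value runs up to the end of the line. */
    line += keylen;
    vallen = strcspn(line, "\r\n");
    if (vallen >= outlen)
        return 0;
    memcpy(out, line, vallen);
    out[vallen] = '\0';
    return 1;
}
```

A backend that never writes the attribute simply yields 0 for every line, so a reader built this way has nothing to display and needs no special casing for 1.x.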

@icinga-migration

Updated by mfriedrich on 2013-10-31 11:32:09 +00:00

btw - it's not only caused by the check results themselves getting added to the core's checkresult list, but could also be affected by the fake orphaned check results generated to mark a check as orphaned.

https://github.com/sni/mod_gearman/blob/master/neb_module/mod_gearman.c#L625
