[dev.icinga.com #702] Solaris 10: Bus Error (core dumped) when starting icinga #353
Comments
Updated by mfriedrich on 2010-08-12 11:02:08 +00:00 how is this built? which compiler? |
Updated by mfriedrich on 2010-08-12 22:01:07 +00:00 ok, some things to think about. this patch adds profiler_init() in icinga.c without any check on whether it is enabled or disabled, which could be a possible leak that makes solaris dump core, although the trace points to the config verification. but as a matter of fact, the output in #572 points out that the drop_privileges function with getgid and getuid is faulty. this leads to the following ideas:
and check on what's been changing since 1.0.1 as this was working fine. |
Updated by mfriedrich on 2010-08-12 22:19:52 +00:00 and furthermore, after dropping the privileges, probably the reading of the objects fails in some way? read_object_config_data => xodtemplate. |
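For reference, a minimal sketch of the usual order for dropping privileges with the POSIX calls mentioned above. This is not Icinga's drop_privileges(); the function name drop_privileges_sketch and its return convention are assumptions made for illustration only.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, NOT the drop_privileges() from icinga.
 * Returns 0 on success, -1 on failure. */
int drop_privileges_sketch(uid_t uid, gid_t gid)
{
    /* Nothing to do if we already run as the target user and group. */
    if (getuid() == uid && geteuid() == uid &&
        getgid() == gid && getegid() == gid)
        return 0;

    /* Change the group first: once setuid() has succeeded, the process
     * may no longer be allowed to change its group. */
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;

    /* Verify that the change really took effect before continuing. */
    if (getuid() != uid || geteuid() != uid ||
        getgid() != gid || getegid() != gid)
        return -1;

    return 0;
}
```

Calling it with the current ids, e.g. drop_privileges_sketch(getuid(), getgid()), is a no-op and returns 0, which makes it easy to compare the behaviour as normal user and as root.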
Updated by antonxx on 2010-08-13 15:00:34 +00:00 GCC version. Note: I compile as normal user. As normal user I get:
as root I get
does this help? |
Updated by raindog on 2010-08-18 16:34:13 +00:00 I've encountered the same issue on Solaris 10 with Icinga 1.0.3, compiled as non root - core with cgis only. bash-3.00$ /sw_ux/scripts/icinga checkconfig bash-3.00$ gcc --version Also another issue when compiling that's been around for a while. I know the workaround to copy the sprintf.o to the common directory. bash-3.00$ make all ***** Error code 1 ***** Error code 1 |
Updated by mfriedrich on 2010-08-18 16:57:47 +00:00
the snprintf target was an attempt for solaris in this issue. it has not been touched since, as there was no further feedback. https://dev.icinga.org/issues/521 changes remain in https://git.icinga.org/?p=icinga-core.git;a=shortlog;h=refs/heads/mfriedrich/sun it would be great if you can test that branch, and report feedback on this. besides - are there any ready-to-use solaris vm's available? |
Updated by raindog on 2010-08-18 17:50:23 +00:00 Tried your changes for the snprintf issue ... ***** Error code 1 ***** Error code 1 |
Updated by antonxx on 2010-08-18 20:38:18 +00:00 dnsmichi wrote:
After registration you can go to: http://www.oracle.com/technetwork/server-storage/solaris/downloads/index.html and here you can grab a virtualbox appliance (get it from www.virtualbox.org). After unzipping the zip file, start your virtualbox and go to: File -> import appliance You can use this vm for free, but as I understand, only for development purposes, (By the way oracle just announced they would stop OpenSolaris!) |
Updated by Meier on 2010-09-10 18:01:42 +00:00 It is already known that the change in question was from 1.0.1 to 1.0.2: https://dev.icinga.org/issues/572#note-11 Why is this not a duplicate of https://dev.icinga.org/issues/572 ? |
Updated by Meier on 2010-09-10 18:06:42 +00:00 antonxx wrote:
And they just released Solaris 10u9. Also there are some plans about Solaris Express. |
Updated by LarsEngels on 2010-09-14 14:03:24 +00:00 FWIW: I got the same error on a SPARC machine (Solaris 10 Update 7, Compiler: gcc 3.4.6). |
Updated by LarsEngels on 2010-09-14 15:37:49 +00:00 gdb shows: gdb ./icinga /var/core/core_ecpmon01_icinga_0_0_1284475992_11339
(gdb) bt
(gdb) |
Updated by LarsEngels on 2010-09-14 15:40:04 +00:00 common/macros.c line 2509 add_macrox_name(HOSTNAME); Macro: |
Updated by mfriedrich on 2010-09-14 16:05:41 +00:00
i consider gcc3 as root of all evil, and as a matter of fact that #define trick does not work with gcc3 then. the strdup cannot duplicate the string as there is no source address in memory - best guess so far. in order to remove this bug, I'll revert the commit d60c8af but leave the notificationsescalated macrofix in place. |
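To make the "#define trick" easier to follow, here is a minimal sketch of how a macro can paste a name into an array index and duplicate the stringized name. The actual macro from common/macros.c is not shown in this thread, so everything below (ADD_MACRO_NAME, macro_names, the MACRO_* indices, dup_string as a stand-in for strdup) is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical indices; the real list in icinga is much longer. */
enum { MACRO_HOSTNAME, MACRO_HOSTALIAS, MACRO_COUNT };

char *macro_names[MACRO_COUNT];

/* Portable stand-in for strdup(): copies s into freshly allocated memory. */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* "##" pastes the argument onto MACRO_ to build the index,
 * "#" turns the argument into the string literal that gets copied. */
#define ADD_MACRO_NAME(name) \
    (macro_names[MACRO_##name] = dup_string(#name))

/* Returns 0 if every name could be duplicated, -1 otherwise. */
int init_macro_names(void)
{
    int i;

    ADD_MACRO_NAME(HOSTNAME);
    ADD_MACRO_NAME(HOSTALIAS);

    for (i = 0; i < MACRO_COUNT; i++) {
        if (macro_names[i] == NULL)
            return -1;
    }
    return 0;
}
```

After init_macro_names() returns 0, macro_names[MACRO_HOSTNAME] holds a heap copy of the string "HOSTNAME". Checking the result of the duplication, as done here, at least turns a failed copy into an error code instead of a crash later on.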
Updated by mfriedrich on 2010-09-23 18:22:05 +00:00 ok. taking current git master from 23-09-2010 18:00 dbe4749 gcc version 3.4.6 installed like this, with some ssl configure hacks: https://dev.icinga.org/projects/icinga-core/wiki/Setup_Solaris_VM 40b98f2 in mfriedrich/solaris compiled as user, installed via sudo into /usr/local/icinga run as daemon, root: fine -bash-3.00# /usr/local/icinga/bin/icinga /usr/local/icinga/etc/icinga.cfg run via init-script -bash-3.00# /etc/init.d/icinga start but with -d it does not on the shell.
next hackup - echo the initscript output of chkconfig, where the segfault happens. -bash-3.00# /usr/local/icinga/bin/icinga -v /usr/local/icinga/etc/icinga.cfg > /usr/local/icinga/var/icinga.chk 2>&1 -bash-3.00# /usr/local/icinga/bin/icinga -v /usr/local/icinga/etc/icinga.cfg > /dev/null 2>&1 taken old init-script from 1.0.1 - same dump. so it has to do something with the > ... param somehow. opened pipe while opening files? prohibited by some mechanism like selinux? the segfault clearly shows an access violation in mmap. which leads into the shared.c directive introduced after 1.0.1 |
Updated by mfriedrich on 2010-09-24 08:43:14 +00:00 regarding memory allocation. i've now done reversed quicksort. got commit sha1 from 1.0.2 and 1.0.1 and stepped half way down, running gdb all the time on the checked out branches (iirc i was at test25 then)
point is, that WORKS OK SEGFAULT this is when the eventprofiler steps in. running through what it does. icinga.c: profiler_init(); is called, even if event_profiling is disabled. profiler.c: within profiler_init() there are several profiler_add() calls. profiler_add() allocates memory like this
afterwards, nothing special happens. realloc: http://opensolaris.org/jive/message.jspa?messageID=89269 profiler_item is int, int, double, char* ok, so it just re-allocates more memory. what if it allocates too much for the current process? ok, man pages.
what can be resolved => comment profiler_init(); call in icinga.c - everything works fine (x86 and sparc tested).
|
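The realloc pattern discussed above can be sketched as follows. This is not the code from profiler.c (which is not shown in this thread); the field order follows the description "int, int, double, char*", while the field names, the profiler_table type and both functions are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Field types as described in the thread; the names are made up. */
typedef struct {
    int enabled;
    int counter;
    double elapsed;
    char *name;
} profiler_item_sketch;

/* Hypothetical container, not a type from icinga. */
typedef struct {
    profiler_item_sketch *items;
    size_t count;
} profiler_table;

/* Appends one item; returns its index, or -1 on failure. */
int profiler_table_add(profiler_table *t, const char *name)
{
    size_t len, index;
    char *copy;
    profiler_item_sketch *grown;

    if (t == NULL || name == NULL)
        return -1;

    len = strlen(name) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);

    /* realloc into a temporary, so the old block is not lost on failure. */
    grown = realloc(t->items, (t->count + 1) * sizeof *grown);
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    t->items = grown;

    /* realloc does not zero the new slot, so clear it explicitly. */
    index = t->count;
    memset(&t->items[index], 0, sizeof t->items[index]);
    t->items[index].name = copy;
    t->count = index + 1;

    return (int)index;
}

/* Releases all names and the array itself. */
void profiler_table_free(profiler_table *t)
{
    size_t i;

    if (t == NULL)
        return;
    for (i = 0; i < t->count; i++)
        free(t->items[i].name);
    free(t->items);
    t->items = NULL;
    t->count = 0;
}
```

A table starts out as { NULL, 0 }; realloc on a NULL pointer behaves like malloc, so no separate first allocation is needed. Any pointer into the array taken before a call to profiler_table_add() must be considered stale afterwards, since realloc may move the block.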
Updated by mfriedrich on 2010-09-24 10:25:11 +00:00
http://www.totalviewtech.com/support/documentation/tips/realloc_issue.html needed to debug if dangling pointers might happen. |
Updated by mfriedrich on 2010-09-24 17:06:49 +00:00
tests for longer runs needed, til monday :) |
Updated by mfriedrich on 2010-09-27 08:06:37 +00:00
runs fine on x86 and sparc. x86 gdb session over the weekend did not throw anything special. re-open if you encounter any other error. |
This issue has been migrated from Redmine: https://dev.icinga.com/issues/702
Created by antonxx on 2010-08-11 07:33:53 +00:00
Assignee: mfriedrich
Status: Resolved (closed on 2010-09-27 08:06:37 +00:00)
Target Version: 1.2 (Stable)
Last Update: 2010-09-27 08:06:37 +00:00 (in Redmine)
Hi,
I have now done the same steps on solaris that I did when compiling on linux.
My actual status:
icinga with classical web interface works on suse linux 11.1 64 bit.
On solaris 10 (sparc) I stumble over the step in
the quickstart documentation:
When looking at the dump, I see:
Note: I just compiled nagios 3.2.1 + the nagios plugins 1.4.15 (the same used with icinga)
and the system works on solaris ... so it must be a difference since the fork...
Changesets
2010-09-24 12:03:12 +00:00 by mfriedrich 69d5fab
2010-09-24 16:38:26 +00:00 by mfriedrich 8ca33ed
Relations: