[dev.icinga.com #2271] unsafe null ptr passed to debug log, segfault on solaris #850
Comments
Updated by Tommi on 2012-01-28 17:12:34 +00:00 Its the same as already logged in #1253 and happens with syslog call as well. |
Updated by mfriedrich on 2012-01-29 00:38:48 +00:00 i'd rather fix the small places where null strings are being passed by accident than writing a wrapper. |
Updated by crfriend on 2012-02-05 15:23:15 +00:00 dnsmichi wrote:
Respectfully, I concur with dnsmichi. Where a null-pointer gets accidentally passed -- a clear error -- I would prefer a proper segfault to let the programmer know that he's committed a blunder in pointer-handling. "Safing" a routine by way of assigning something "harmless" to replace a null-pointer could have unintended consequences downstream. This is not so harmful in simple logging/debugging calls, but if it gets into downstream data the consequences could be serious indeed (e.g. "What happens if you send the string "null" to another piece of code or to a hardware device of some sort?"). A segfault is the safe way to handle this situation. By the way, I was the guy who noticed this and put it onto the mailling-list, and it's absolutely related to #2292, in which I provided a patch to catch it. |
Updated by mfriedrich on 2012-04-03 11:42:29 +00:00
so, are these commits in dev/ido now finally tested on solaris and working? |
Updated by mfriedrich on 2012-04-04 11:13:39 +00:00 does not work. causes the symptons debugged in #2500 getting FUTEX_WAIT_PRIVATE on the parant, waiting for the child to terminate, which got a lock never free'd on the debuglog caused by the vasprintf implementation change. so rather check the string containing a NULL value before passing to log_debug_info as stated. |
Updated by Tommi on 2012-04-04 17:41:59 +00:00 currently i dont understood the relationship between a buffer in a function and the lock you detected. Maybe optimiser was too strong and assigned the same memory region for each thread. Is there no parameter to get such buffer really private? |
Updated by mfriedrich on 2012-04-04 22:28:54 +00:00 try to debug it yourself why the futex happens and have a reading hour about possible faulty conditions. your fix does harm non-solaris systems so it' s not the way to go. define dev guidelines where strings passed to logging functions must be checked if null. that enforces awareness of possible errors as well. |
Updated by Tommi on 2012-04-08 10:15:04 +00:00
looks like after changing back to vfprintf with #2500 the error targeted in this ticket is back again and caused #2503. This prevents idoutils to start on solaris |
Updated by Tommi on 2012-04-09 09:10:17 +00:00
added several null pointer checks for #2503, which maybe solves this issue as well. |
Updated by mfriedrich on 2012-04-09 17:16:20 +00:00
the variable args list must be checked upon null string, and continue/bail early, if you really want to sneak into the function itsself. did i mention that i "love" solaris for being that dumb? http://bugs.mysql.com/bug.php?id=37850 describes a similar issue see the test program in http://bugs.mysql.com/bug.php?id=54478
the discussion leads to
as said previously, i'd go with the "clean the code on %s and log_debug_info". this will keeo the debug on solaris safe - because it seems that without debug logging enabled this is not a problem and does not cause issues. marking this low priority as not release critical imho. |
Updated by crfriend on 2012-04-09 21:01:55 +00:00 dnsmichi wrote:
Sometimes "dumb" is its own reward. It puts the onus on the programmer to "do the right thing" and helps keep downstream stuff (somewhat) clean.
The only caveat to this is that the first reaction that somebody will likely have upon getting into trouble will be to increase the verbosity of debugging, and that, in this particular case, made things vastly worse for me -- to the point where it was a code-dive, and not everybody is going to be up to that.
The only aspect to this is that it should be documented somewhere that upping the debug level may cause additional problems on non-Linux platforms. It is especially worth noting that if there hasn't been a successful definition of the instance name in the database (yet) ido2db will segfault (at least on Solaris) every time with very verbose logging turned on. Ideally, the safety harness should go on every call, but likely that's overkill. |
Updated by mfriedrich on 2012-04-10 09:55:24 +00:00 well don't get me wrong, i have full respect for such an intention. it just drives me crazy as it limits the scope of the real problems to be touched and solved. your argument with debug enabled is true, but as a matter of fact i don't have any solaris or bsd vm around to test that now - i'll try within the next days. such a safety place should be considered when working on the irpe logging in #2489 |
Updated by crfriend on 2012-04-10 20:45:40 +00:00 dnsmichi wrote:
I didn't read you wrongly at all. I work for a software house, so I completely understand the pressures to "get stuff out the door"; however there is the matter of compatibility when it comes to distributing source-code as the "finished product" to be built by folks who wish to use said product. Thanks, by the way, for the test-case above. I'm a bit light on systems at the moment, but put that through IRIX (on MIPS) 6.5 with the SGI compilers and it produced an output of "(null)" instead of a segfault.
Identifying the potential areas where tweaks need to be made can likely be partially automated -- at least for the logging bits. I'll see what I can find in git and take a look. |
Updated by mfriedrich on 2012-04-25 16:43:06 +00:00 can this be resolved, or does it need to be postponed? |
Updated by crfriend on 2012-04-25 22:04:55 +00:00 On Wed, 25 Apr 2012, development@icinga.org wrote:
Am I on the hook for this, or is somebody else? As far as IDOUtils go, I have not experienced a single segfault Cheers! Sincerely, ---------------------------------------------------------------------+ ---------------------------------------------------------------------+ |
Updated by Tommi on 2012-04-26 19:06:47 +00:00 On my system i dont have it anymore with 1.7.0dev. For me its resolved. |
Updated by mfriedrich on 2012-04-30 16:11:04 +00:00
ok thanks for testing! |
Updated by mfriedrich on 2014-12-08 14:46:33 +00:00
|
This issue has been migrated from Redmine: https://dev.icinga.com/issues/2271
Created by mfriedrich on 2012-01-16 12:33:59 +00:00
Assignee: mfriedrich
Status: Resolved (closed on 2012-04-30 16:11:04 +00:00)
Target Version: 1.7
Last Update: 2014-12-08 14:46:33 +00:00 (in Redmine)
some coding issue moving the debug out before verifying that the idi as well as instance_name is not a NULLed char ptr.
the debug log expects a fully checked variable string already
on gnu we are safe (vfprintf will allocate a "(null)" buffer itsself and return that), but on solaris we are screwed leading into a segfault.
(gdb) x/32 0xffbff760
0xffbff760: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff770: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff780: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff790: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff7a0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff7b0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff7c0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff7d0: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb)
My reading of the above is that the idi structure should be
present at 0xffbff760 but the contents therein are all zero,
including several char pointers including the "instance_name"
which is making the routines in libc crash (and rightly so).
Changesets
2012-01-16 12:40:53 +00:00 by mfriedrich 6ccc98e
2012-02-03 19:32:51 +00:00 by Tommi 2256b95
2012-02-05 08:49:05 +00:00 by Tommi 76ecd02
2012-02-05 18:26:28 +00:00 by mfriedrich d8dd9b1
2012-04-04 11:21:20 +00:00 by mfriedrich 07d2f27
2012-04-09 08:48:29 +00:00 by Tommi 83671d3
2012-04-28 09:18:01 +00:00 by mfriedrich 93cd6d8
2012-04-28 09:20:08 +00:00 by Tommi 3fe913f
Relations:
The text was updated successfully, but these errors were encountered: