Skip to content
This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #2271] unsafe null ptr passed to debug log, segfault on solaris #850

Closed
icinga-migration opened this issue Jan 16, 2012 · 18 comments

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/2271

Created by mfriedrich on 2012-01-16 12:33:59 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2012-04-30 16:11:04 +00:00)
Target Version: 1.7
Last Update: 2014-12-08 14:46:33 +00:00 (in Redmine)

Icinga Version: 1.10.0
OS Version: any

some coding issue moving the debug out before verifying that the idi as well as instance_name is not a NULLed char ptr.
the debug log expects a fully checked variable string already

on gnu we are safe (vfprintf will allocate a "(null)" buffer itsself and return that), but on solaris we are screwed leading into a segfault.

Program received signal SIGSEGV, Segmentation fault.
0xff032150 in strlen () from /lib/libc.so.1
(gdb) bt

#0  0xff032150 in strlen () from /lib/libc.so.1
#1  0xff09d704 in _ndoprnt () from /lib/libc.so.1
#2  0xff09fbe0 in vfprintf () from /lib/libc.so.1
#3  0x2d190 in ido2db_log_debug_info (level=1, verbosity=2, fmt=0x32bf8 "ido2db_handle_client_input(instance_name=%s) start\n")    at logging.c:99
#4  0x17070 in ido2db_handle_client_input (idi=0xffbff760, buf=0x57340 "") at ido2db.c:1512
#5  0x16fc8 in ido2db_check_for_client_input (idi=0xffbff760) at ido2db.c:1476
#6  0x16dc4 in ido2db_handle_client_connection (sd=10) at ido2db.c:1354
#7  0x16c1c in ido2db_wait_for_connections () at ido2db.c:1156
#8  0x15194 in main (argc=0, argv=0xffbffb44) at ido2db.c:303

(gdb) x/32 0xffbff760
0xffbff760: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff770: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff780: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff790: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff7a0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff7b0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff7c0: 0x00000000 0x00000000 0x00000000 0x00000000
0xffbff7d0: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb)

My reading of the above is that the idi structure should be
present at 0xffbff760 but the contents therein are all zero,
including several char pointers including the "instance_name"
which is making the routines in libc crash (and rightly so).

Changesets

2012-01-16 12:40:53 +00:00 by mfriedrich 6ccc98e

fix unsafe null ptr passed to debug log, segfault on solaris #2271

refs #2271

2012-02-03 19:32:51 +00:00 by Tommi 2256b95

idoutils: replace direct write with vfprintf with gnu vasprint/fprintf, which take care about null pointer arguments causing SEGV in Solaris #2271
refs #2271

2012-02-05 08:49:05 +00:00 by Tommi 76ecd02

idoutils: change snprintf prototype assignments, fix unwanted delete of snprintf.h #2271
refs #2271

2012-02-05 18:26:28 +00:00 by mfriedrich d8dd9b1

fix unsafe null ptr passed to debug log, segfault on solaris #2271

refs #2271

2012-04-04 11:21:20 +00:00 by mfriedrich 07d2f27

partly revert #2271, add soft_lock around debug log, use reentrant localtime_r #2500

the recent change in #2271 introduced a change of vfprintf
to vasprintf afterwards. while at first sight the usage of
the non-reentrant / non-threadsafe localtime() seemed to be
the root cause (fixed in this commit to use localtime_r())
the real cause is the change of writing to that file
descriptor.

the observed behaviour sets the parent process to FUTEX_WAIT_PRIVATE
waiting for the child to terminate which is not possible
still having a lock on the file descriptor.

adding a soft_lock on the debug log not to interfere with
locks inherited from parent to child is good & added, but
does not solve the root cause either.

the reason for that changed behaviour is the usage of vasprintf
and a local buffer rather than vfprintf on fmt directly.

passing a NULL ptr to log_debug_info should be checked upon
call to that function by the caller himself.

this is a partly revert of 2256b95874b5eead76b7aef2a68d9f7b90858a68
and solves #2500

refs #2271
refs #2500

2012-04-09 08:48:29 +00:00 by Tommi 83671d3

idoutils: add several null pointer checks before handover strings to debuglog #2503
refs #2503, #2271

2012-04-28 09:18:01 +00:00 by mfriedrich 93cd6d8

fix unsafe null ptr passed to debug log, segfault on solaris #2271

refs #2271

2012-04-28 09:20:08 +00:00 by Tommi 3fe913f

idoutils: add several null pointer checks before handover strings to debuglog #2503
refs #2503, #2271

Conflicts:

	Changelog

Relations:

@icinga-migration
Copy link
Author

Updated by Tommi on 2012-01-28 17:12:34 +00:00

Its the same as already logged in #1253 and happens with syslog call as well.
I suggested use the included snprintf code at all (undef HAVE__SNPRINTF) instead of the solaris natives, and create a wrapper function which maps 'null' to a fixed empty string (safestring). But this was rejected.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-01-29 00:38:48 +00:00

i'd rather fix the small places where null strings are being passed by accident than writing a wrapper.

@icinga-migration
Copy link
Author

Updated by crfriend on 2012-02-05 15:23:15 +00:00

dnsmichi wrote:

i'd rather fix the small places where null strings are being passed by accident than writing a wrapper.

Respectfully, I concur with dnsmichi. Where a null-pointer gets accidentally passed -- a clear error -- I would prefer a proper segfault to let the programmer know that he's committed a blunder in pointer-handling.

"Safing" a routine by way of assigning something "harmless" to replace a null-pointer could have unintended consequences downstream. This is not so harmful in simple logging/debugging calls, but if it gets into downstream data the consequences could be serious indeed (e.g. "What happens if you send the string "null" to another piece of code or to a hardware device of some sort?"). A segfault is the safe way to handle this situation.

By the way, I was the guy who noticed this and put it onto the mailling-list, and it's absolutely related to #2292, in which I provided a patch to catch it.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-03 11:42:29 +00:00

  • Status changed from Assigned to Feedback
  • Assigned to changed from mfriedrich to Tommi
  • Done % changed from 0 to 90

so, are these commits in dev/ido now finally tested on solaris and working?

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-04 11:13:39 +00:00

does not work. causes the symptons debugged in #2500 getting FUTEX_WAIT_PRIVATE on the parant, waiting for the child to terminate, which got a lock never free'd on the debuglog caused by the vasprintf implementation change.

so rather check the string containing a NULL value before passing to log_debug_info as stated.

@icinga-migration
Copy link
Author

Updated by Tommi on 2012-04-04 17:41:59 +00:00

currently i dont understood the relationship between a buffer in a function and the lock you detected. Maybe optimiser was too strong and assigned the same memory region for each thread. Is there no parameter to get such buffer really private?
Checking each string befor passing it appears to be not feasible because of variable argument. This is exactly, what the local sprintf functions are doing.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-04 22:28:54 +00:00

try to debug it yourself why the futex happens and have a reading hour about possible faulty conditions. your fix does harm non-solaris systems so it' s not the way to go.

define dev guidelines where strings passed to logging functions must be checked if null. that enforces awareness of possible errors as well.

@icinga-migration
Copy link
Author

Updated by Tommi on 2012-04-08 10:15:04 +00:00

  • Status changed from Feedback to Assigned
  • Assigned to changed from Tommi to mfriedrich
  • Priority changed from Low to Normal
  • Done % changed from 90 to 0

looks like after changing back to vfprintf with #2500 the error targeted in this ticket is back again and caused #2503. This prevents idoutils to start on solaris

@icinga-migration
Copy link
Author

Updated by Tommi on 2012-04-09 09:10:17 +00:00

  • Done % changed from 0 to 90

added several null pointer checks for #2503, which maybe solves this issue as well.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-09 17:16:20 +00:00

  • Priority changed from Normal to Low

the variable args list must be checked upon null string, and continue/bail early, if you really want to sneak into the function itsself.

did i mention that i "love" solaris for being that dumb?

http://bugs.mysql.com/bug.php?id=37850 describes a similar issue

see the test program in http://bugs.mysql.com/bug.php?id=54478

#include 
#include 


void print(const char *format,...)
{
  va_list args;

  char cvtbuf[1024];

  va_start(args,format);

  vsnprintf(cvtbuf, sizeof(cvtbuf), format, args);
  printf("%s\n", cvtbuf);

  va_end(args);
}


int main()
{
  print("vsnprintf-test: %s", "Testing");
  print("vsnprintf-test: %s", NULL);

  return 0;
}

the discussion leads to

as said previously, i'd go with the "clean the code on %s and log_debug_info". this will keeo the debug on solaris safe - because it seems that without debug logging enabled this is not a problem and does not cause issues.

marking this low priority as not release critical imho.

@icinga-migration
Copy link
Author

Updated by crfriend on 2012-04-09 21:01:55 +00:00

dnsmichi wrote:

did i mention that i "love" solaris for being that dumb?

Sometimes "dumb" is its own reward. It puts the onus on the programmer to "do the right thing" and helps keep downstream stuff (somewhat) clean.

as said previously, i'd go with the "clean the code on %s and log_debug_info". this will keeo the debug on solaris safe - because it seems that without debug logging enabled this is not a problem and does not cause issues.

The only caveat to this is that the first reaction that somebody will likely have upon getting into trouble will be to increase the verbosity of debugging, and that, in this particular case, made things vastly worse for me -- to the point where it was a code-dive, and not everybody is going to be up to that.

marking this low priority as not release critical imho.

The only aspect to this is that it should be documented somewhere that upping the debug level may cause additional problems on non-Linux platforms. It is especially worth noting that if there hasn't been a successful definition of the instance name in the database (yet) ido2db will segfault (at least on Solaris) every time with very verbose logging turned on.

Ideally, the safety harness should go on every call, but likely that's overkill.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-10 09:55:24 +00:00

well don't get me wrong, i have full respect for such an intention. it just drives me crazy as it limits the scope of the real problems to be touched and solved.

your argument with debug enabled is true, but as a matter of fact i don't have any solaris or bsd vm around to test that now - i'll try within the next days.

such a safety place should be considered when working on the irpe logging in #2489

@icinga-migration
Copy link
Author

Updated by crfriend on 2012-04-10 20:45:40 +00:00

dnsmichi wrote:

well don't get me wrong, i have full respect for such an intention. it just drives me crazy as it limits the scope of the real problems to be touched and solved.

I didn't read you wrongly at all. I work for a software house, so I completely understand the pressures to "get stuff out the door"; however there is the matter of compatibility when it comes to distributing source-code as the "finished product" to be built by folks who wish to use said product.

Thanks, by the way, for the test-case above. I'm a bit light on systems at the moment, but put that through IRIX (on MIPS) 6.5 with the SGI compilers and it produced an output of "(null)" instead of a segfault.

such a safety place should be considered when working on the irpe logging in #2489

Identifying the potential areas where tweaks need to be made can likely be partially automated -- at least for the logging bits. I'll see what I can find in git and take a look.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-25 16:43:06 +00:00

can this be resolved, or does it need to be postponed?

@icinga-migration
Copy link
Author

Updated by crfriend on 2012-04-25 22:04:55 +00:00

On Wed, 25 Apr 2012, development@icinga.org wrote:

Issue #2271 has been updated by dnsmichi.

can this be resolved, or does it need to be postponed?

Am I on the hook for this, or is somebody else?

As far as IDOUtils go, I have not experienced a single segfault
with the patches I applied (and submitted). This does not mean that
it's fixed; all it means is that I slew the demons that were chasing
me in my own case.

Cheers!

Sincerely,

---------------------------------------------------------------------+
| Carl Richard Friend (UNIX Sysadmin) | West Boylston |
| Minicomputer Collector / Enthusiast | Massachusetts, USA |
| mailto:crfriend@rcn.com ---------------------
| http://users.rcn.com/crfriend/museum | ICBM: 42:22N 71:47W |

---------------------------------------------------------------------+

@icinga-migration
Copy link
Author

Updated by Tommi on 2012-04-26 19:06:47 +00:00

On my system i dont have it anymore with 1.7.0dev. For me its resolved.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-30 16:11:04 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 90 to 100

ok thanks for testing!

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-12-08 14:46:33 +00:00

  • Project changed from 18 to Core, Classic UI, IDOUtils
  • Category changed from 71 to IDOUtils
  • Icinga Version set to 1
  • OS Version set to any

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant