Skip to content
This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #2500] idoutils oracle start hangs #926

Closed
icinga-migration opened this issue Apr 4, 2012 · 9 comments
Closed

[dev.icinga.com #2500] idoutils oracle start hangs #926

icinga-migration opened this issue Apr 4, 2012 · 9 comments
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/2500

Created by mfriedrich on 2012-04-04 08:04:52 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2012-04-19 13:07:17 +00:00)
Target Version: 1.7
Last Update: 2014-12-08 14:35:58 +00:00 (in Redmine)

Icinga Version: 1.10.0
OS Version: any

rhel 5.8 x64

for some reason, starting ido2db via initscript hangs, but ido2db gets idomod connected and gets operational. it seems that on fork, it does not cleanly free ressources.

this is a new behaviour. did not have that with 1.6.x

a previously fully killed ido2db brings that on startup

# service ido2db restart
Stopping ido2db: /etc/init.d/ido2db: line 114: kill: (18544) - No such process
done.
Starting ido2db:



# ps aux | grep ido
root     18575  0.0  0.0  64016  1352 pts/2    S+   09:31   0:00 /bin/sh /sbin/service ido2db restart
root     18580  0.0  0.0  66108  1444 pts/2    S+   09:31   0:00 /bin/sh /etc/init.d/ido2db restart
root     18590  0.0  0.0  66112  1460 pts/2    S+   09:31   0:00 /bin/sh /etc/init.d/ido2db start
nagios   18599  0.0  0.3 188436  6992 pts/2    S+   09:31   0:00 /usr/bin/ido2db -c /etc/nagios/ido2db.cfg
nagios   18600  0.0  0.0      0     0 pts/2    Z+   09:31   0:00 [ido2db] 
nagios   18601  0.0  0.0 188436  1992 ?        Ss   09:31   0:00 /usr/bin/ido2db -c /etc/nagios/ido2db.cfg
nagios   18674  0.7  0.5 203172 12112 ?        Sl   09:32   0:04 /usr/bin/ido2db -c /etc/nagios/ido2db.cfg
root     20268  0.0  0.0  61300   764 pts/3    S+   09:41   0:00 grep ido

possible that this has something to do with the oci pre-initializer to get the version info?

doing a killall -9 ido2db results in this.

# service ido2db restart
Stopping ido2db: /etc/init.d/ido2db: line 114: kill: (18544) - No such process
done.
Starting ido2db:



/sbin/service: line 66: 18580 Killed                  env -i LANG="$LANG" PATH="$PATH" TERM="$TERM" "${SERVICEDIR}/${SERVICE}" ${OPTIONS}

that /sbin/service output can be remove by just invoking the init script, where the exact same happens then.

# /etc/init.d/ido2db restart
Stopping ido2db: /etc/init.d/ido2db: line 114: kill: (18601) - No such process
done.
Starting ido2db:Killed

pinned down to manual start

# rm -f /var/nagios/ido.sock /var/nagios/ido2db.lock ;  /usr/bin/ido2db -c /etc/nagios/ido2db.cfg

hangs ...

nagios   21645  0.0  0.3 188436  6980 pts/2    S+   09:59   0:00 /usr/bin/ido2db -c /etc/nagios/ido2db.cfg
nagios   21646  0.0  0.0      0     0 pts/2    Z+   09:59   0:00 [ido2db] 
nagios   21647  0.0  0.0 188436  2004 ?        Ss   09:59   0:00 /usr/bin/ido2db -c /etc/nagios/ido2db.cfg
nagios   21653  1.8  0.4 201304 10224 ?        Sl   09:59   0:00 /usr/bin/ido2db -c /etc/nagios/ido2db.cfg
root     21729  0.0  0.0  61296   760 pts/3    S+   10:00   0:00 grep ido

# strace -p 21645
Process 21645 attached - interrupt to quit
futex(0x3ee67529e0, FUTEX_WAIT_PRIVATE, 2, NULL

# strace -p 21646
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

# strace -p 21647
Process 21647 attached - interrupt to quit
accept(6, 

# strace -p 21653
Process 21653 attached - interrupt to quit
read(8, "\n203:\n1=400\n2=0\n3=0\n4=1333526618"..., 511) = 214
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .091718"..., 120) = 120
lseek(5, 0, SEEK_CUR)                   = 55420121
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .092098"..., 107) = 107
lseek(5, 0, SEEK_CUR)                   = 55420228
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .092408"..., 130) = 130
lseek(5, 0, SEEK_CUR)                   = 55420358
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .092743"..., 107) = 107
lseek(5, 0, SEEK_CUR)                   = 55420465
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .093055"..., 130) = 130
lseek(5, 0, SEEK_CUR)                   = 55420595
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .093342"..., 117) = 117
lseek(5, 0, SEEK_CUR)                   = 55420712
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .093651"..., 106) = 106
lseek(5, 0, SEEK_CUR)                   = 55420818
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .093962"..., 107) = 107
lseek(5, 0, SEEK_CUR)                   = 55420925
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .094250"..., 105) = 105
lseek(5, 0, SEEK_CUR)                   = 55421030
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .094559"..., 104) = 104
lseek(5, 0, SEEK_CUR)                   = 55421134
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .094867"..., 107) = 107
lseek(5, 0, SEEK_CUR)                   = 55421241
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .095155"..., 107) = 107
lseek(5, 0, SEEK_CUR)                   = 55421348
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .095437"..., 130) = 130
lseek(5, 0, SEEK_CUR)                   = 55421478
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2211, ...}) = 0
write(5, "Wed Apr  4 10:03:38 2012 .095768"..., 117) = 117

debug log looks rather normal in that case.

Wed Apr  4 09:29:36 2012 .805278 [001.2] [pid=16637] [tid=47373143706560] ido2db_open_debug_log() end
Wed Apr  4 09:29:36 2012 .805343 [001.2] [pid=16637] [tid=47373143706560] processing signal '15'
Wed Apr  4 09:29:36 2012 .805362 [001.2] [pid=16637] [tid=47373143706560] forward signal to all members of this group of processes
Wed Apr  4 09:29:36 2012 .808643 [001.2] [pid=16637] [tid=47373143706560] ido2db_cleanup_socket() start
Wed Apr  4 09:29:36 2012 .809585 [001.2] [pid=16637] [tid=47373143706560] ido2db_cleanup_socket() end
Wed Apr  4 09:29:36 2012 .809749 [001.2] [pid=16637] [tid=47373143706560] ido2db_free_program_memory() start
Wed Apr  4 09:29:36 2012 .809779 [001.2] [pid=16637] [tid=47373143706560] ido2db_free_program_memory() end
Wed Apr  4 09:29:36 2012 .809798 [001.2] [pid=16637] [tid=47373143706560] ido2db_parent_sighandler() end
Wed Apr  4 09:29:37 2012 .803533 [001.2] [pid=18216] [tid=47627644107712] ido2db_open_debug_log() end
Wed Apr  4 09:29:37 2012 .803583 [001.2] [pid=18216] [tid=47627644107712] ido2db_wait_for_connections() start
Wed Apr  4 09:29:37 2012 .803633 [001.2] [pid=18216] [tid=47627644107712] ido2db_daemonize() start
Wed Apr  4 09:29:37 2012 .804045 [001.2] [pid=18217] [tid=47627644107712] ido2db_daemonize() child forks again
Wed Apr  4 09:29:37 2012 .804461 [001.2] [pid=18218] [tid=47627644107712] ido2db_daemonize() grandchild continues and  becomes session leader
Wed Apr  4 09:29:37 2012 .804694 [001.2] [pid=18218] [tid=47627644107712] ido2db_daemonize() end
Wed Apr  4 09:29:37 2012 .804786 [001.2] [pid=18217] [tid=47627644107712] ido2db_daemonize() first child process goes away
Wed Apr  4 09:29:37 2012 .804826 [001.2] [pid=18217] [tid=47627644107712] ido2db_free_program_memory() start
Wed Apr  4 09:29:37 2012 .804854 [001.2] [pid=18217] [tid=47627644107712] ido2db_free_program_memory() end
Wed Apr  4 09:29:37 2012 .805412 [001.2] [pid=18216] [tid=47627644107712] ido2db_parent_sighandler() start
Wed Apr  4 09:29:37 2012 .805491 [001.2] [pid=18216] [tid=47627644107712] processing signal '17'
Wed Apr  4 09:29:54 2012 .362904 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_connection() start
Wed Apr  4 09:29:54 2012 .363262 [001.2] [pid=18305] [tid=47627644107712] ido2db_open_debug_log() end
Wed Apr  4 09:29:54 2012 .363446 [001.2] [pid=18305] [tid=47627644107712] ido2db_idi_init() start
Wed Apr  4 09:29:54 2012 .363492 [001.2] [pid=18305] [tid=47627644107712] ido2db_idi_init() end
Wed Apr  4 09:29:54 2012 .363537 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_init() start
Wed Apr  4 09:29:54 2012 .367041 [001.2] [pid=18305] [tid=47627644107712] OCI_Initialize() ok
Wed Apr  4 09:29:54 2012 .367086 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_init() end
Wed Apr  4 09:29:54 2012 .367123 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_connect() start
Wed Apr  4 09:29:54 2012 .411842 [001.2] [pid=18305] [tid=47627644107712] Successfully connected to database 'ido2db'@'//host.com/ICINGADB' (server version 11.2.11, client version [1110], ocilib version 3.9.2) 
Wed Apr  4 09:29:54 2012 .411884 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_connect() prepare statements start
Wed Apr  4 09:29:54 2012 .411904 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_prepared_statement_objects_insert() start
Wed Apr  4 09:29:54 2012 .412042 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_prepared_statement_objects_insert() end
Wed Apr  4 09:29:54 2012 .416026 [001.2] [pid=18305] [tid=47627644107712] oci_prepared_statement_sequence_lastid() start
Wed Apr  4 09:29:54 2012 .416062 [002.2] [pid=18305] [tid=47627644107712] oci_prepared_statement_sequence_lastid query: SELECT last_number-1 from user_sequences where sequence_name=upper(:X1)
Wed Apr  4 09:29:54 2012 .416103 [001.2] [pid=18305] [tid=47627644107712] oci_prepared_statement_sequence_lastid end
Wed Apr  4 09:29:54 2012 .416125 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_prepared_statement_sequence_select prepare statements end
Wed Apr  4 09:29:54 2012 .416346 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_prepared_statement_dbversion_select() prepare statements end
Wed Apr  4 09:29:54 2012 .416392 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_set_trace_event:set trace level event to 0
Wed Apr  4 09:29:54 2012 .416481 [002.2] [pid=18305] [tid=47627644107712] ido2db_oci_set_trace_event Binds:1 BindVars -->[Name:':level',Type:uInt,Val:'0']
Wed Apr  4 09:29:54 2012 .419737 [002.2] [pid=18305] [tid=47627644107712] ido2db_oci_set_trace_event DBMSOUT:Error: No "Alter session" right
Wed Apr  4 09:29:54 2012 .420482 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_statement_free:Statement ido2db_oci_set_trace_event freed
Wed Apr  4 09:29:54 2012 .420517 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_set_trace_event end
Wed Apr  4 09:29:54 2012 .420536 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_connect(0) end
Wed Apr  4 09:29:54 2012 .420581 [001.2] [pid=18305] [tid=47627644107712] ido2db_check_for_client_input() dbuf.size=511
Wed Apr  4 09:29:54 2012 .420617 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .420634 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .420655 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .420671 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .420686 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .420701 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .420726 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .420743 [001.2] [pid=18305] [tid=47627644107712] ido2db_free_connection_memory() start
Wed Apr  4 09:29:54 2012 .420758 [001.2] [pid=18305] [tid=47627644107712] ido2db_free_connection_memory() end
Wed Apr  4 09:29:54 2012 .420772 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() end
Wed Apr  4 09:29:54 2012 .420787 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .420802 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .420816 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .420852 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() end
Wed Apr  4 09:29:54 2012 .420868 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .420883 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .420897 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .420913 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() end
Wed Apr  4 09:29:54 2012 .420928 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .420943 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .420957 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .420972 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() end
Wed Apr  4 09:29:54 2012 .420987 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .421002 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .421016 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .421032 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() end
Wed Apr  4 09:29:54 2012 .421047 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .421061 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .421087 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .421104 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() end
Wed Apr  4 09:29:54 2012 .421119 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .421134 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .421148 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .421163 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() end
Wed Apr  4 09:29:54 2012 .421178 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .421210 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .421226 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .421241 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() end
Wed Apr  4 09:29:54 2012 .421256 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .421270 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=(null)
Wed Apr  4 09:29:54 2012 .421284 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .421299 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() end
Wed Apr  4 09:29:54 2012 .421314 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input start
Wed Apr  4 09:29:54 2012 .421328 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input instance_name=blub
Wed Apr  4 09:29:54 2012 .421343 [001.2] [pid=18305] [tid=47627644107712] ido2db_handle_client_input() input_section
Wed Apr  4 09:29:54 2012 .421361 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_hello() start
Wed Apr  4 09:29:54 2012 .421379 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_version_check () start 
Wed Apr  4 09:29:54 2012 .423127 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_version_check() query ok
Wed Apr  4 09:29:54 2012 .423212 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_version_check() fetchnext ok
Wed Apr  4 09:29:54 2012 .423244 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_hello(version=1.7.0)
Wed Apr  4 09:29:54 2012 .423263 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_version_check () end
Wed Apr  4 09:29:54 2012 .424823 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_hello() query ok
Wed Apr  4 09:29:54 2012 .424878 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_hello() fetchnext ok
Wed Apr  4 09:29:54 2012 .424932 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_hello(instance_id=2)
Wed Apr  4 09:29:54 2012 .433250 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_hello(552) conninfo_id
Wed Apr  4 09:29:54 2012 .433286 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_set_session_info:Set Session Info to IDOMOD
Wed Apr  4 09:29:54 2012 .433386 [002.2] [pid=18305] [tid=47627644107712] ido2db_oci_set_session_info Binds:2 BindVars -->[Name:':module',Type:Text Size:6,Val:'IDO2DB'][Name:':action',Type:Text Size:6,Val:'IDOMOD']
Wed Apr  4 09:29:54 2012 .435779 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_statement_free:Statement ido2db_oci_set_session_info freed
Wed Apr  4 09:29:54 2012 .436569 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_set_session_info: TimeZone Set to UTC OK
Wed Apr  4 09:29:54 2012 .436617 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_statement_free:Statement ido2db_oci_set_session_info freed
Wed Apr  4 09:29:54 2012 .436639 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_set_session_info end
Wed Apr  4 09:29:54 2012 .436655 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_hello() get cached object ids
Wed Apr  4 09:29:54 2012 .436674 [001.2] [pid=18305] [tid=47627644107712] ido2db_get_cached_object_ids() start
Wed Apr  4 09:29:54 2012 .438451 [001.2] [pid=18305] [tid=47627644107712] ido2db_get_cached_object_ids() fetchnext ok
Wed Apr  4 09:29:54 2012 .438508 [001.2] [pid=18305] [tid=47627644107712] ido2db_add_cached_object_id() start
Wed Apr  4 09:29:54 2012 .438810 [001.2] [pid=18305] [tid=47627644107712] ido2db_object_hashfunc() start
Wed Apr  4 09:29:54 2012 .438849 [001.2] [pid=18305] [tid=47627644107712] ido2db_object_hashfunc() end
Wed Apr  4 09:29:54 2012 .438871 [001.2] [pid=18305] [tid=47627644107712] ido2db_add_cached_object_id() end
Wed Apr  4 09:29:54 2012 .438886 [001.2] [pid=18305] [tid=47627644107712] ido2db_get_cached_object_ids(693) end
Wed Apr  4 09:29:54 2012 .438902 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() start
Wed Apr  4 09:29:54 2012 .438949 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() SQL [ select max(localts2unixts(status_update_time)) from programstatus where instance_id=:ID ] prepared
Wed Apr  4 09:29:54 2012 .439007 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time Binds:1 BindVars -->[Name:':ID',Type:uInt,Val:'2']
Wed Apr  4 09:29:54 2012 .446108 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time query ok
Wed Apr  4 09:29:54 2012 .446156 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() latest=1333524572
Wed Apr  4 09:29:54 2012 .446195 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_statement_free:Statement ido2db_db_get_latest_data_time freed
Wed Apr  4 09:29:54 2012 .446215 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() end
Wed Apr  4 09:29:54 2012 .446231 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() start
Wed Apr  4 09:29:54 2012 .446285 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() SQL [ select max(localts2unixts(status_update_time)) from hoststatus where instance_id=:ID ] prepared
Wed Apr  4 09:29:54 2012 .446331 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time Binds:1 BindVars -->[Name:':ID',Type:uInt,Val:'2']
Wed Apr  4 09:29:54 2012 .454831 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time query ok
Wed Apr  4 09:29:54 2012 .454880 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() latest=1333524554
Wed Apr  4 09:29:54 2012 .454917 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_statement_free:Statement ido2db_db_get_latest_data_time freed
Wed Apr  4 09:29:54 2012 .454937 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() end
Wed Apr  4 09:29:54 2012 .454953 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() start
Wed Apr  4 09:29:54 2012 .454989 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() SQL [ select max(localts2unixts(status_update_time)) from servicestatus where instance_id=:ID ] prepared
Wed Apr  4 09:29:54 2012 .455027 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time Binds:1 BindVars -->[Name:':ID',Type:uInt,Val:'2']
Wed Apr  4 09:29:54 2012 .556393 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time query ok
Wed Apr  4 09:29:54 2012 .556475 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() latest=1333524567
Wed Apr  4 09:29:54 2012 .556524 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_statement_free:Statement ido2db_db_get_latest_data_time freed
Wed Apr  4 09:29:54 2012 .556545 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() end
Wed Apr  4 09:29:54 2012 .556561 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() start
Wed Apr  4 09:29:54 2012 .556608 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() SQL [ select max(localts2unixts(status_update_time)) from contactstatus where instance_id=:ID ] prepared
Wed Apr  4 09:29:54 2012 .556652 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time Binds:1 BindVars -->[Name:':ID',Type:uInt,Val:'2']
Wed Apr  4 09:29:54 2012 .559262 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time query ok
Wed Apr  4 09:29:54 2012 .559314 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() latest=1333452032
Wed Apr  4 09:29:54 2012 .559352 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_statement_free:Statement ido2db_db_get_latest_data_time freed
Wed Apr  4 09:29:54 2012 .559373 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() end
Wed Apr  4 09:29:54 2012 .559388 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() start
Wed Apr  4 09:29:54 2012 .559426 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() SQL [ select max(localts2unixts(queued_time)) from timedeventqueue where instance_id=:ID ] prepared
Wed Apr  4 09:29:54 2012 .559464 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time Binds:1 BindVars -->[Name:':ID',Type:uInt,Val:'2']
Wed Apr  4 09:29:54 2012 .561832 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time query ok
Wed Apr  4 09:29:54 2012 .561877 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() latest=0
Wed Apr  4 09:29:54 2012 .561923 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_statement_free:Statement ido2db_db_get_latest_data_time freed
Wed Apr  4 09:29:54 2012 .561960 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() end
Wed Apr  4 09:29:54 2012 .561978 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() start
Wed Apr  4 09:29:54 2012 .562021 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() SQL [ select max(localts2unixts(entry_time)) from comments where instance_id=:ID ] prepared
Wed Apr  4 09:29:54 2012 .562059 [002.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time Binds:1 BindVars -->[Name:':ID',Type:uInt,Val:'2']
Wed Apr  4 09:29:54 2012 .565038 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time query ok
Wed Apr  4 09:29:54 2012 .565091 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() latest=1333521750
Wed Apr  4 09:29:54 2012 .565126 [001.2] [pid=18305] [tid=47627644107712] ido2db_oci_statement_free:Statement ido2db_db_get_latest_data_time freed
Wed Apr  4 09:29:54 2012 .565176 [001.2] [pid=18305] [tid=47627644107712] ido2db_db_get_latest_data_time() end

Changesets

2012-04-04 11:21:20 +00:00 by mfriedrich 07d2f27

partly revert #2271, add soft_lock around debug log, use reentrant localtime_r #2500

the recent change in #2271 introduced a change of vfprintf
to vasprintf afterwards. while at first sight the usage of
the non-reentrant / non-threadsafe localtime() seemed to be
the root cause (fixed in this commit to use localtime_r())
the real cause is the change of writing to that file
descriptor.

the observed behaviour sets the parent process to FUTEX_WAIT_PRIVATE
waiting for the child to terminate which is not possible
still having a lock on the file descriptor.

adding a soft_lock on the debug log not to interfere with
locks inherited from parent to child is good & added, but
does not solve the root cause either.

the reason for that changed behaviour is the usage of vasprintf
and a local buffer rather than vfprintf on fmt directly.

passing a NULL ptr to log_debug_info should be checked upon
call to that function by the caller himself.

this is a partly revert of 2256b95874b5eead76b7aef2a68d9f7b90858a68
and solves #2500

refs #2271
refs #2500

Relations:

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-04 09:15:29 +00:00

the program does not react to SIGTERM anymore, SIGKILL is the only way to stop it.

# gdb /home/xxx/icinga/icinga-core/module/idoutils/src/ido2db
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-42.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /home/xxx/icinga/icinga-core/module/idoutils/src/ido2db...done.
(gdb) set follow-fork-mode child
(gdb) set args -c /etc/nagios/ido2db.cfg
(gdb) run
Starting program: /home/xxx/icinga/icinga-core/module/idoutils/src/ido2db -c /etc/nagios/ido2db.cfg
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
[Thread debugging using libthread_db enabled]
[New process 9269]
[Thread debugging using libthread_db enabled]
[New process 9270]
[Thread debugging using libthread_db enabled]


Program terminated with signal SIGKILL, Killed.
The program no longer exists.

could be an os problem as well, as this box was recently updated to rhel 5.8 too.

we had that futex problem a while back with livestatus on the core itsself.
https://dev.icinga.org/issues/786

the problem is the parent waiting for a child to terminate, which hangs somewhere - e.g. in localtime, possibly not being threadsafe???

the only occurence of localtime is in module/idoutils/src/logging.c which could possibly lead into a lock on the logfile not being returned properly? or is this the wrong direction to go?

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-04 11:11:28 +00:00

more of that, with some kind words as well
http://lists.op5.com/pipermail/op5-users/2010-November/001554.html

using localtime_r instead in debug logging solves it not entirely. reverting #2272 solves the problem, seems that vasprintf implementation does not work.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-04 11:19:04 +00:00

  • Category set to 71
  • Status changed from New to Feedback
  • Assigned to set to mfriedrich
  • Target Version set to 1.7
  • Done % changed from 0 to 50

reverting the change of #2271 does work. but in order to stay even more safe, we'll re-use the soft locking method of the core in base/logging.c trying to get a lock on the debug file ptr and bail out early if not possible.

this will prevent us from further locks in there, and drop the futex on the parent waiting for the first child to die at first stage (grandchild as session leader - that's the key after the 2 forks on startup).

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-04 11:46:01 +00:00

vasprintf and local char* buf is the root cause, re-changing to vfprintf makes it work again.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-04 12:24:07 +00:00

  • Done % changed from 50 to 100

@icinga-migration
Copy link
Author

Updated by Tommi on 2012-04-04 18:36:48 +00:00

i never saw similar on my centos x64 box or in solaris, which is both running for weeks with vasprintf enabled. Looks like not to be a general issue. concurrent file access might be really a issue, but the buffer things... , hmm. Hope i can test this new version on solaris before my vacations next week (and the general replacement of v/a/snprintf functions on solaris will work as expected)

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-04 22:34:17 +00:00

as stated in the other issue, changing this again is a nogo. look into the core logging.c where the exact same is done with vfprintf. why? because code is safe not passing null pointers to log functions. maybe idoutils will use a shared icinga library with logging scaffolds - not really the goal to keep things different in common projects.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2012-04-19 13:07:17 +00:00

  • Status changed from Feedback to Resolved

for now this works here.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-12-08 14:35:58 +00:00

  • Project changed from 18 to Core, Classic UI, IDOUtils
  • Category changed from 71 to IDOUtils
  • Icinga Version set to 1
  • OS Version set to any

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant