Skip to content
This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #4357] after sig 11 kill ido2db has no activity and stop icinga core #1297

Closed
icinga-migration opened this issue Jun 28, 2013 · 15 comments
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/4357

Created by volkerg on 2013-06-28 07:35:32 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2014-03-03 18:54:38 +00:00)
Target Version: 1.11
Last Update: 2014-12-08 14:38:08 +00:00 (in Redmine)

Icinga Version: 1.10.0
OS Version: any

ido2db reports the following in the debug log
After that, there is no ido2db activity (starce -p PID) except DB maintenance which will not work, icinga core will sopping work later on ...

stopping the ido2db with /etc/init.d/ido2db stop kills only one ido2db process, one keeps still alive, kill -1 PID stops it immediatly

hu Jun 27 17:06:17 2013 .714706 [001.2] [pid=5338] [tid=140296096589568] ido2db_db_escape_string oracle changed string ('win_service_restart')
Thu Jun 27 17:06:17 2013 .714720 [001.2] [pid=5338] [tid=140296096589568] ido2db_oci_prepared_statement_bind_null_param() param=:X4
Thu Jun 27 17:06:17 2013 .716276 [001.2] [pid=5338] [tid=140296096589568] ido2db_query_objects_select_name1_name2(win_service_restart,(null)) object id=3470 selected
Thu Jun 27 17:06:17 2013 .716292 [001.2] [pid=5338] [tid=140296096589568] ido2db_get_object_id(3470) end
Thu Jun 27 17:06:17 2013 .716301 [001.2] [pid=5338] [tid=140296096589568] ido2db_query_insert_or_update_eventhandlerdata_add() start
Thu Jun 27 17:06:17 2013 .718503 [001.2] [pid=5338] [tid=140296096589568] ido2db_query_insert_or_update_eventhandlerdata() bind clob
Thu Jun 27 17:06:17 2013 .718527 [002.2] [pid=5338] [tid=140296096589568] ido2db_oci_bind_clob:Bind :X17i,Null or empty
Thu Jun 27 17:06:17 2013 .718554 [002.2] [pid=5338] [tid=140296096589568] ido2db_oci_bind_clob:Bind :X17u,Null or empty
Thu Jun 27 17:06:17 2013 .718564 [002.2] [pid=5338] [tid=140296096589568] ido2db_oci_bind_clob:Bind :X18i,Null or empty
Thu Jun 27 17:06:17 2013 .718574 [002.2] [pid=5338] [tid=140296096589568] ido2db_oci_bind_clob:Bind :X18u,Null or empty
Thu Jun 27 17:06:17 2013 .847688 [001.2] [pid=5338] [tid=140296096589568] ido2db_query_insert_or_update_eventhandlerdata() executed
Thu Jun 27 17:06:17 2013 .850571 [001.2] [pid=5338] [tid=140296096589568] OCIERROR - MSG ORA-01403: no data found

Thu Jun 27 17:06:17 2013 .850584 [001.2] [pid=5338] [tid=140296096589568] OCIERROR - MSG ORA-01403: no data found

Thu Jun 27 17:06:17 2013 .850591 [001.2] [pid=5338] [tid=140296096589568] ido2db_query_insert_or_update_eventhandlerdata_add() end
Thu Jun 27 17:06:17 2013 .850598 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_eventhandlerdata() end
Thu Jun 27 17:06:17 2013 .850603 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() start
Thu Jun 27 17:06:17 2013 .850611 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() end
Thu Jun 27 17:06:17 2013 .850616 [001.2] [pid=5338] [tid=140296096589568] ido2db_end_input_data() end
Thu Jun 27 17:06:17 2013 .850621 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input() end
Thu Jun 27 17:06:17 2013 .850629 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input start
Thu Jun 27 17:06:17 2013 .850634 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input instance_name=a7329001
Thu Jun 27 17:06:17 2013 .850640 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input start
Thu Jun 27 17:06:17 2013 .850645 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input instance_name=a7329001
Thu Jun 27 17:06:17 2013 .850659 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input start
Thu Jun 27 17:06:17 2013 .850664 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input instance_name=a7329001
Thu Jun 27 17:06:17 2013 .850670 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input() input_section
Thu Jun 27 17:06:17 2013 .850676 [001.2] [pid=5338] [tid=140296096589568] ido2db_start_input_data() start
Thu Jun 27 17:06:17 2013 .850681 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() start
Thu Jun 27 17:06:17 2013 .850686 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() end
Thu Jun 27 17:06:17 2013 .850714 [001.2] [pid=5338] [tid=140296096589568] ido2db_child_sighandler() start
Thu Jun 27 17:06:17 2013 .850723 [001.2] [pid=5338] [tid=140296096589568] Child caught signal '11' exiting

Changesets

2013-06-29 10:44:28 +00:00 by Tommi 82a87cd

ido2db: Oracle fix wrong freed lob variables #4357
refs #4357

2013-06-29 14:46:19 +00:00 by (unknown) 2ae8b74

update Changelog from #4357

refs #4357
@icinga-migration
Copy link
Author

Updated by mfriedrich on 2013-06-28 07:41:31 +00:00

  • Category set to 24
  • Status changed from New to Assigned
  • Assigned to set to Tommi
  • Priority changed from High to Normal

@icinga-migration
Copy link
Author

Updated by Tommi on 2013-06-29 10:51:37 +00:00

first time analysis:
looks like we have different issues here

  1. "ORA 01403" in handle eventhandlerdata. this error means, a select doestnt returned any row.
 Thu Jun 27 17:06:17 2013 .847688 [001.2] [pid=5338] [tid=140296096589568] ido2db_query_insert_or_update_eventhandlerdata() executed
 Thu Jun 27 17:06:17 2013 .850571 [001.2] [pid=5338] [tid=140296096589568] OCIERROR - MSG ORA-01403: no data found
 Thu Jun 27 17:06:17 2013 .850584 [001.2] [pid=5338] [tid=140296096589568] OCIERROR - MSG ORA-01403: no data found

But this is not causing sig11, because function closed ok

 Thu Jun 27 17:06:17 2013 .850591 [001.2] [pid=5338] [tid=140296096589568] ido2db_query_insert_or_update_eventhandlerdata_add() end
Thu Jun 27 17:06:17 2013 .850598 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_eventhandlerdata() end
Thu Jun 27 17:06:17 2013 .850603 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() start
Thu Jun 27 17:06:17 2013 .850611 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() end
Thu Jun 27 17:06:17 2013 .850616 [001.2] [pid=5338] [tid=140296096589568] ido2db_end_input_data() end
Thu Jun 27 17:06:17 2013 .850621 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input() end

reason for the two ORA- might be a wrong variable within the lob free section

if (lob_oi !=NULL) OCI_LobFree(lob_loi);
if (lob_ou !=NULL) OCI_LobFree(lob_lou);

fixed in changeset 82a87cd

# afterwards, there is new data, which is running into SEGV. there are 3 handle_client_input start lines, but only one input section line

Thu Jun 27 17:06:17 2013 .850629 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input start
Thu Jun 27 17:06:17 2013 .850634 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input instance_name=a7329001
Thu Jun 27 17:06:17 2013 .850640 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input start
Thu Jun 27 17:06:17 2013 .850645 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input instance_name=a7329001
Thu Jun 27 17:06:17 2013 .850659 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input start
Thu Jun 27 17:06:17 2013 .850664 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input instance_name=a7329001
Thu Jun 27 17:06:17 2013 .850670 [001.2] [pid=5338] [tid=140296096589568] ido2db_handle_client_input() input_section
Thu Jun 27 17:06:17 2013 .850676 [001.2] [pid=5338] [tid=140296096589568] ido2db_start_input_data() start
Thu Jun 27 17:06:17 2013 .850681 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() start
Thu Jun 27 17:06:17 2013 .850686 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() end
Thu Jun 27 17:06:17 2013 .850714 [001.2] [pid=5338] [tid=140296096589568] ido2db_child_sighandler() start
Thu Jun 27 17:06:17 2013 .850723 [001.2] [pid=5338] [tid=140296096589568] Child caught signal '11' exiting

# trimming task tries to use invalid statement handles

Mon Jun 24 19:27:34 2013 .010960 [001.2] [pid=30152] [tid=140408479090432] ido2db_oci_statement_free:Statement oci_statement_sla_services_select freed
Mon Jun 24 19:27:34 2013 .010969 [001.2] [pid=30152] [tid=140408479090432] ido2db_oci_statement_free:Statement oci_statement_sla_downtime_select freed
Mon Jun 24 19:27:34 2013 .010977 [001.2] [pid=30152] [tid=140408479090432] ido2db_oci_statement_free:Statement oci_statement_sla_history_select freed
Mon Jun 24 19:27:34 2013 .010989 [001.2] [pid=30152] [tid=140408479090432] ido2db_oci_statement_free:Statement oci_statement_sla_history_merge freed
Mon Jun 24 19:27:34 2013 .010997 [001.2] [pid=30152] [tid=140408479090432] ido2db_oci_statement_free:Statement oci_statement_sla_history_delete freed
Mon Jun 24 19:41:12 2013 .844631 [001.2] [pid=30152] [tid=140408355215104] ido2db_db_perform_maintenance() start
Mon Jun 24 19:41:12 2013 .844877 [001.2] [pid=30152] [tid=140408355215104] ido2db_db_trim_data_table() start, time=1372066872
Mon Jun 24 19:41:12 2013 .845281 [001.2] [pid=30152] [tid=140408355215104] OCILIB - MSG The statement must be prepared to perform this operation at pos 0 in QUERY '(null)'

looks like the statements are cleaned after SIG11,but trimming process tries to reuse these already freed handles.

the last two points i currently dont know how to fix

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2013-06-29 10:58:33 +00:00

  • Target Version set to 1.9.2
  1. please rebase your fix into the branch "fix/sig11-ido2db-4357" and remove the one below tdressler/* following the branching model
  2. enable_sla is deprecated and will be removed in 1.10 so no need to care here anyways

@icinga-migration
Copy link
Author

Updated by Tommi on 2013-06-29 11:14:52 +00:00

i suspect the sig11 in start_input_data, because there is a start, but no end within the logs. free_input_memory writes its end task, so looks like the sig11 may only occur in the buffer allocation part. But cant see an error within this.

Thu Jun 27 17:06:17 2013 .850676 [001.2] [pid=5338] [tid=140296096589568] ido2db_start_input_data() start
Thu Jun 27 17:06:17 2013 .850681 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() start
Thu Jun 27 17:06:17 2013 .850686 [001.2] [pid=5338] [tid=140296096589568] ido2db_free_input_memory() end
Thu Jun 27 17:06:17 2013 .850714 [001.2] [pid=5338] [tid=140296096589568] ido2db_child_sighandler() start
Thu Jun 27 17:06:17 2013 .850723 [001.2] [pid=5338] [tid=140296096589568] Child caught signal '11' exiting

int ido2db_start_input_data(ido2db_idi *idi) {
    int x;

    ido2db_log_debug_info(IDO2DB_DEBUGL_PROCESSINFO, 2, "ido2db_start_input_data() start\n");

    if (idi == NULL)
        return IDO_ERROR;

    /* sometimes ido2db_end_input_data() isn't called, so free memory if we find it */
    ido2db_free_input_memory(idi);

    /* allocate memory for holding buffered input */
    if ((idi->buffered_input = (char **)malloc(sizeof(char *) * IDO_MAX_DATA_TYPES)) == NULL)
        return IDO_ERROR;

    /* initialize buffered input slots */
    for (x = 0; x < IDO_MAX_DATA_TYPES; x++)
        idi->buffered_input[x] = NULL;

    ido2db_log_debug_info(IDO2DB_DEBUGL_PROCESSINFO, 2, "ido2db_start_input_data() end\n");

    return IDO_OK;
}

@icinga-migration
Copy link
Author

Updated by Tommi on 2013-06-29 12:12:58 +00:00

  • Done % changed from 0 to 30

branch is moved as suggested. dont got the point its allowed now to write direct into fixes/features branches without using my prefix

trimming tries to use oci_statement_instances_delete_time prepared statement, which is deallocated in ido2db_db_disconnect called from id2db_terminate_threads in child_signal_handler

ido2db_oci_statement_free(idi->dbinfo.oci_statement_instances_delete_time, "oci_statement_instances_delete_time");

therefore trimming must fail if its using the same idi structure. it would be consequent to stop the trimming task in case of a child signal or connect seperately with its own connection and statement handlers. which way is the prefered one?

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2013-07-20 12:18:57 +00:00

  • Target Version changed from 1.9.2 to 1.10

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2013-09-14 15:26:04 +00:00

fix is in next, anything else to be done here?

@icinga-migration
Copy link
Author

Updated by Tommi on 2013-09-15 07:23:31 +00:00

There is an open point, which came up along this issue. Trimming task tries to use a prepared handle, which has been already deinitialized by the signal handler of the main task. Maybe it's an option to defer the statement deinitializing after the trimming task is really terminated or give the trimming task its own handlers. But up to now i dont have any glue which is the right solution nor how to implement it.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2013-09-21 16:30:55 +00:00

  • Target Version changed from 1.10 to 1.11

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-01-25 16:26:41 +00:00

Any updates?

@icinga-migration
Copy link
Author

Updated by Tommi on 2014-01-26 10:36:33 +00:00

There was no further complains about it so i tend not to touch the threading things i had in mind. Looks like the already implemented patch is sufficient. I suggest to close this issue.

@icinga-migration
Copy link
Author

Updated by Tommi on 2014-02-21 17:34:48 +00:00

  • Status changed from Assigned to Feedback
  • Assigned to changed from Tommi to mfriedrich

looks like the already provided patch solves the situation in real. Maybe there is a potential problem with the order of statement termination for the threads. Anyway, this is hard to debug for me and it results only in a obsolete message.
I suggest to close it for now

@icinga-migration
Copy link
Author

Updated by volkerg on 2014-02-24 06:08:55 +00:00

Yes, the patch solved it, thx
never happened again.

regards

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-03-03 18:54:38 +00:00

  • Status changed from Feedback to Resolved
  • Done % changed from 30 to 100

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-12-08 14:38:09 +00:00

  • Project changed from 18 to Core, Classic UI, IDOUtils
  • Category changed from 24 to IDOUtils
  • Icinga Version changed from 1 to 1
  • OS Version set to any

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant