This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #1934] [RFC] change order of data dumping in idoutils #746

Closed
icinga-migration opened this issue Sep 25, 2011 · 16 comments

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/1934

Created by mfriedrich on 2011-09-25 11:31:57 +00:00

Assignee: mfriedrich
Status: Closed (closed on 2013-09-21 17:34:25 +00:00)
Target Version: (none)
Last Update: 2014-12-08 14:37:24 +00:00 (in Redmine)


currently, idomod dumps various data based on different callbacks happening.

#1259 already introduced a change, moving the config dump into the event loop (previously, the retained config dump happened immediately after the retention data was read).

moving the various possible manual dumps out of the core callbacks that deliver actual data could save even more time.

there are the following types of dumps in idomod, handled by idomod_broker_data - see the wiki entry.

one overall question now: the object config dump happens only in NEBTYPE_PROCESS_EVENTLOOPSTART and no longer in NEBCALLBACK_RETENTION_DATA. in effect, there is no difference in the resulting data, because idomod_write_config dumps it straight from memory.

        /****** dump command config ******/
        for (temp_command = command_list; temp_command != NULL; temp_command = temp_command->next) {

the original retained data would be called here

                /* retained config was just read, so dump it */
                if (rdata->type == NEBTYPE_RETENTIONDATA_ENDLOAD)
                        idomod_write_config(IDOMOD_CONFIG_DUMP_RETAINED);

this is invoked in

/* reads in initial host and state information */
int read_initial_state_information(void) {
        int result = OK;

        if (retain_state_information == FALSE)
                return OK;

#ifdef USE_EVENT_BROKER
        /* send data to event broker */
        broker_retention_data(NEBTYPE_RETENTIONDATA_STARTLOAD, NEBFLAG_NONE, NEBATTR_NONE, NULL);
#endif  

        /********* IMPLEMENTATION-SPECIFIC INPUT FUNCTION ********/
#ifdef USE_XRDDEFAULT
        result = xrddefault_read_state_information();
#endif

#ifdef USE_EVENT_BROKER
        /* send data to event broker */
        broker_retention_data(NEBTYPE_RETENTIONDATA_ENDLOAD, NEBFLAG_NONE, NEBATTR_NONE, NULL);
#endif  

        if (result == ERROR)
                return ERROR;

        return OK;
}

within this procedure after daemonizing

                        /* initialize status data unless we're starting */
                        if (sigrestart == FALSE)
                                initialize_status_data(config_file);

                        /* read initial service and host state information  */
                        initialize_retention_data(config_file);
                        read_initial_state_information();
                        sync_state_information();

                        /* initialize comment data */ 
                        initialize_comment_data(config_file);

                        /* initialize scheduled downtime data */
                        initialize_downtime_data(config_file);

                        /* initialize performance data */
                        initialize_performance_data(config_file);

                        /* initialize the event timing loop */
                        init_timing_loop();

                        /* initialize check statistics */
                        init_check_stats();

                        /* update all status data (with retained information) */
                        update_all_status_data();

                        /* log initial host and service state */
                        log_host_states(INITIAL_STATES, NULL);
                        log_service_states(INITIAL_STATES, NULL);

                        /* reset the restart flag */
                        sigrestart = FALSE;

#ifdef USE_EVENT_BROKER
                        /* send program data to broker */
                        broker_program_state(NEBTYPE_PROCESS_EVENTLOOPSTART, NEBFLAG_NONE, NEBATTR_NONE, NULL);
#endif

                        /* get event start time and save as macro */
                        event_start = time(NULL);
                        my_free(mac->x[MACRO_EVENTSTARTTIME]);
                        dummy = asprintf(&mac->x[MACRO_EVENTSTARTTIME], "%lu", (unsigned long)event_start);

                        /***** start monitoring all services *****/
                        /* (doesn't return until a restart or shutdown signal is encountered) */
                        event_execution_loop();


        case NEBCALLBACK_PROCESS_DATA:

                procdata = (nebstruct_process_data *)data;

                /* process has passed pre-launch config verification, so dump original config */
                if (procdata->type == NEBTYPE_PROCESS_START) {
                        idomod_write_config_files();
                        idomod_write_config(IDOMOD_CONFIG_DUMP_ORIGINAL);
                }

                /* process is starting the event loop, so dump runtime vars */
                if (procdata->type == NEBTYPE_PROCESS_EVENTLOOPSTART) {
                        idomod_write_runtime_variables();
                }

meaning that the original config plus the main config file would be read before daemonizing.

#ifdef USE_EVENT_BROKER
                        /* send program data to broker */
                        broker_program_state(NEBTYPE_PROCESS_START, NEBFLAG_NONE, NEBATTR_NONE, NULL);
#endif

                        /* enter daemon mode (unless we're restarting...) */
                        if (daemon_mode == TRUE && sigrestart == FALSE) {

so the main process runs in the following order

  • start
  • dump original configs
    (...block...)
  • daemonize
  • dump retained configs
    (...block...)
  • update all status data
    (...block...)
  • write status.dat
  • start event loop

the overall question is - who actually needs the configs at that stage? the database should only contain the configs the core is actually using, including retained information.

so changing this to

  • start
  • daemonize
  • update all status data
    (...block...)
  • write status.dat
  • dump (retained) configs
    (...block...)
  • start event loop

will remove the difference between original and retained configs, allow status data to be written sooner, and remove the blocking if using the queue threading with circular buffer.

  • start
  • daemonize
  • update all status data
    (...fill buffer...)
  • write status.dat
  • dump (retained) configs
    (...fill buffer...)
  • start event loop

so to clear this up - there is no actual difference between original and retained configs; it's just the timing of the dumps that causes trouble for overall core performance / blocking.

the final question remains - how to handle the config_type attribute being sent to the socket then ... (0 and 1 would be the same configs, but for compatibility reasons this will stay as it is).

Changesets

2011-09-27 18:31:10 +00:00 by mfriedrich 9dfb19f

idoutils: move thread start + origcfg dump to new NEBTYPE_PROCESS_INITSTART and retained back to normal #1934, make logging threadsafe #1962, correct overflow log #1931

this changes the thread to no longer be started in the event loop (the new neb type makes it happen sooner).
the original config gets dumped immediately, but after daemonizing, which NEBTYPE_PROCESS_INITSTART
indicates for now. the retained config dump stays where it was; its place in the timeline is perfectly
fine when using a circular buffer to cache these things.
while the consumer queue thread is reading and dumping the configs to the database, the core can start
the event loop, pushing status updates and current checks to the queue as well.

the main disadvantage of a non-blocking neb module is that the buffer holds the data and needs
exactly as much time as the core would have been blocked to finally dump the whole config. so this
remains in testing mode and needs possible enhancements, mainly around the sleep time of the threads and so on.

the debug logging not being threadsafe was fixed with the previous commit; holding a file pointer
and writing to it simultaneously is not really a good idea. the main logging functions used a core internal
function which would invoke a neb callback to self() if something got written to the icinga log file.
this has been removed for now; syslog will be used as done in ido2db, and further logging must be done
side by side with the demanded ido2db logging and its syslog facilities.
previously, the newly added warning on buffer overflow when the retry was hit resulted in a segfault,
so this fix was urgent.

refs #1931
refs #1934
refs #1962

2011-10-24 11:27:28 +00:00 by mfriedrich 776ecc4e4872898fdb7afb40c275327e3ad250b8

ido2db dbqueue threaded, working state 13:27

refs #1934

2011-10-24 13:23:06 +00:00 by mfriedrich 4a354b9fcb08f5674d953ad3f1c1fd9a49baed25

idoutils: correct memory pointer allocation for buffered_input and mbuf in dbqueue push and pop

TODO
- only one dbqueue thread, needs multiple workers
- freeing of the dbqueue_item object (initial calloc in push and pop)
- reset pointers for buffered_input x members

push

from idi to dbqueue_item

buffered_input
1) allocate memory for buffered_input
2) initialize x members of buffered_input array (char* ptrs)
3) assign pointer to chars in loop
dbqueue_item->buffered_input[x] = idi->buffered_input[x];

mbuf
1) copy all x slots array pointers
dbqueue_item->mbuf[x].buffer = idi->mbuf[x].buffer;
2) copy all x slot y line to char* pointers
dbqueue_item->mbuf[x].buffer[y] = idi->mbuf[x].buffer[y];
3) reset all x slot y line pointers in idi
idi->mbuf[x].buffer[y] = NULL; /* line pointer to char* */
4) reset all x slots array pointers
idi->mbuf[x].buffer = NULL; /* slot pointer to array of lines*/
5) reset all counters on used|allocated_lines
idi->mbuf[x].used_lines = 0;
idi->mbuf[x].allocated_lines = 0;

pop

from dbqueue_item to idi

exact reverse assignments and resets are now for dbqueue_item

free_input_memory will be called ONLY after ido2db_handle_* has been run,
meaning that a db query has happened.

refs #1934

2011-10-24 14:09:28 +00:00 by mfriedrich 444ff5eaad73fa6478cc1ff191cf115b854b0c9b

idoutils: do not reset char* pointers just pointers to arrays

refs #1934

2011-10-24 14:37:14 +00:00 by mfriedrich 1774cbb097bbe84824860b4bf9d80b7a8a14c916

idoutils: enabled 10 dbqueue threads

refs #1934

2011-10-24 15:15:41 +00:00 by mfriedrich bc1d93edae17df064c9afdff50f22823626be672

cgis: don't be so aggressive on determining the last status update interval

many users complained about this, and the root cause
has always been event broker modules blocking the
core while they dump configs and so on. now only
daemons that are really not running / fully blocked
will be detected, not short outages during dumps

refs #1934

2011-10-24 15:44:06 +00:00 by mfriedrich ee60702e91001b7291aac3c05aeb51f46bc71422

idoutils: rework debug logging a bit, move to compile defines

refs #1934

2011-10-24 16:40:03 +00:00 by mfriedrich dd5de2e0c33524c4ec302fa1e2a83ad4e6dbdafe

idoutils: fix logging.h inclusion, let dbqueue threads loop until main idi has read an instance_name

refs #1934

Relations:

@icinga-migration
Author

Updated by mfriedrich on 2011-09-25 11:32:34 +00:00

the wiki entry for callback types - https://wiki.icinga.org/display/Dev/idomod+callback+types

@icinga-migration
Author

Updated by mfriedrich on 2011-09-27 18:32:39 +00:00

  • Category set to 70
  • Status changed from Feedback to Assigned
  • Assigned to set to mfriedrich

in order to reflect the state "i have daemonized if called like that and will now start with the initialization", i've added 2 new callbacks to the core routine.

this will come in handy for any neb module wanting to start its threads before the event loop actually starts and before config data or similar is read.

  • NEBTYPE_PROCESS_INITSTART
  • NEBTYPE_PROCESS_INITEND

idomod now starts the routine idomod_init_post() (renamed idomod_start_event_loop()) directly if the callback type matches NEBTYPE_PROCESS_INITSTART

this is also the point where the original config gets dumped (normally this would happen before daemonizing on PROCESS_START), so this is the actual config order change.
dumping retained config data happens between INITSTART and INITEND, so this is reverted back to normal.

the reason i am choosing this way is that the buffer needs to be worked on with config data, saving space for future status updates. sometimes the config dump takes rather long and the queue is stuffed, so the status updates won't reach the database that soon (even though the old status data still remains in there).
furthermore, not dumping the config during event loop start prevents it from interfering with anything else happening in between.

@icinga-migration
Author

Updated by mfriedrich on 2011-09-27 18:53:55 +00:00

  • Done % changed from 0 to 50

btw - a reload and/or restart will always invoke the exact same config dumps. the only difference on a restart/reload is the handling in ido2db itself - it will detect the restart and truncate all config tables and various status tables (not host/service status and downtime).

we might remove the config deletion, because afterwards all objects are flagged as inactive anyway. even if there is no config, all other data will trigger an insert into the objects table; only the config dump itself marks objects 'is_active', allowing an external application to determine whether the data is current. an update is also cheaper on the tables than a plain insert-else-update construct (like it's done within postgresql).

        /* if process is starting up, clear status data, event queue, etc. */
        if (type == NEBTYPE_PROCESS_PRELAUNCH && tstamp.tv_sec >= idi->dbinfo.latest_realtime_data_time) {

this is where is_active=0 is set on the objects table.

                /* flag all objects as being inactive */
                /* if the core starts up, the fresh config is being pushed
                           into ido2db, marking actual config object ids as active. */
                ido2db_set_all_objects_as_inactive(idi);

the flagging as being active happens, when the configs are dumped, one by one.

e.g. a new host reaches the db

int ido2db_handle_hostdefinition(ido2db_idi *idi) {

        /* get the object id */
        result = ido2db_get_object_id_with_insert(idi, IDO2DB_OBJECTTYPE_HOST, idi->buffered_input[IDO_DATA_HOSTNAME], NULL, &object_id);

        /* flag the object as being active */
        ido2db_set_object_as_active(idi, IDO2DB_OBJECTTYPE_HOST, object_id);

so if the object exists, it's just selected and returned to be updated with is_active=1 afterwards. otherwise it's inserted, fetching its id and then flagging it as is_active=1.

so keeping the config tables filled would make the update process easier (saving the insert on innodb in mysql, for example), but would still require the configs to be dumped into the database eventually.

this is to be discussed.

@icinga-migration
Author

Updated by mfriedrich on 2011-10-09 18:59:36 +00:00

  • Subject changed from change order of data dumping in idomod to change order of data dumping in idoutils

a better attempt would be to change ido2db and insert the configs within their own ido2db thread - so to speak, having a dedicated database connection each for

  • config
  • status
  • historical

and 3 workers for that (main thread is read socket, write buffer). this will need the objects hash list to be threadsafe too.

depending on the type being found, the 3 thread functions will decide which ido2db_handle_* to be called then, each working on their db connection for themselves.

this will probably allow the config data to be dumped on core (re)start, processing all data in blocked mode, and if the buffer is shared with status data (retained status data!), the status worker thread can work on it asynchronously, independent of the config dump itself. the icinga web query selects will of course only work once the tables are filled properly.

if working with 3 db connections (plus one for the housekeeping thread), this requires a plain copy of the *idi object, but changing the descriptor and reopening a connection of its own.

one main problem could be the read from the buffer by one worker blocking the rest. but this is only the read itself, after which the mutex lock is released; the main processing of data then happens independently of the buffer locking.

this could be adapted from what is done now, adding 2 more worker threads and extending the connection *idi object. the function which decides which data to handle needs to be split then.
keep in mind that duplicated *idi objects will hold a lot of prepared statements for oracle in memory - decide if some could be freed when not needed by this type of connection.
the overall object_id hashlist and selection logic needs to be made threadsafe, same goes for the debug log functions. furthermore, the db connection handling needs to be tested properly.

@icinga-migration
Author

Updated by mfriedrich on 2011-10-13 22:51:27 +00:00

the idea with multiple worker threads is pretty good, but the location of the buffer is not: it sits where the plain data from the socket is read, acting as a socket buffer during blockings.

so keeping the queue buffer as is, having a main producer and threaded consumer for RAW data is good either way.

the data runs through various processing steps. there are 2 different types of data

  • single data
  • multiple data, written to mbuf

we can't interrupt during this method, but we need to wait until all data is written to those mbufs.

main: ido2db_check_for_client_input calls ido2db_write_to_sink_queue
worker: ido2db_read_from_sink_queue calls ido2db_handle_client_input

the client input arrives in RAW format, and is then split and analyzed by type.

current_input_section, IDO2DB_INPUT_SECTION_DATA: (working as a state machine, with flagging)

if IDO_API_ENDDATA is recognized, it is being signalled that the data can be further processed. otherwise, data is being added with ido2db_add_input_data_item

this data is saved within the *idi object.

single: idi->buffered_input[type] = newbuf;
multiple: idi->mbuf[mbuf_slot].buffer[idi->mbuf[mbuf_slot].used_lines] = buf;

if the end is reached, the actual handling of the data read happens within ido2db_end_input_data

so to speak the buffers do not remain global, but within the idi object. the question is, what queue could be put in between, to allow one producer and e.g. 10 threads having 10 database connections, doing the actual calls to ido2db_handle_*

the overall problem with config data is the need for a chronological insert; otherwise the multiple lines of data do not make sense.

all other realtime data remains

  • get a callback, dump the gotten information
  • single message object

dumping e.g. a host definition does the following

  • loop through the host list
  • dump one host
  • dump all parent hosts of this host
  • dump contactgroups of this host
  • dump individual contacts of this host
  • dump customvars
  • signal end with IDO_API_ENDDATA

so when a host starts being dumped, you get a long list of

  • a single line with the host
  • parents, contactgroups, contacts, customvars as multiple lines

all of this is being read from buffered_input and mbuf array.

so to summarize

  • commands are single
  • timeperiods include multiple timeranges
  • contacts include multiple address, host notification commands, service notification commands, customvars
  • contactgroups include multiple members
  • hosts include multiple parents, contactgroups, contacts, customvars
  • hostgroups include multiple members
  • services include multiple contactgroups, contacts, customvars
  • servicegroups include multiple members
  • hostescalations include multiple contactgroups, contacts
  • serviceescalations include multiple contactgroups, contacts
  • hostdependencies are single
  • servicedependencies are single
  • main config - configfilevariables are multi

so to speak - instead of creating a generic solution, it must be split between config and realtime data.

so the realtime data gets its own queue, where the buffered_input array is put on and the realtimedata thread reads that, assigns to its idi object and then calls the appropriate handle function.

the config data also gets its own queue, where the buffered_input array and the mbuf slot-item array is being pushed. this holds a complete transaction of a config object being dumped, signalled by IDO_API_ENDDATA, and for what it's worth, having exactly that read into the 2 buffers, a config thread can read that from the queue, assign to its idi object and then call the appropriate handle function.

the char* bufs added are plain copies through the function calls themselves, so the idi object points to their location and can free them afterwards. if we don't free them, but reset the addresses to NULL and hand the addresses in the queue object over to the reading thread, they can be freed when freeing the buffer - or better, after the consumer has processed the data. the idi copies should be freed either way.

realtime data thread counts should be configurable at runtime, as should config thread counts. the only difference: config threads should be created on configdumpstart and joined after configdumpend, so as not to harm the overall realtime threads on the scheduler.
the db_hello being used must be evaluated - the selection of the instance in the database must be locked to a single first one (the main thread?).

this needs a deeper evaluation on the actual code, and performance tests. furthermore, users must be aware of multiple database connections per thread then needed.

@icinga-migration
Author

Updated by mfriedrich on 2011-10-18 20:12:57 +00:00

the problem remains with the 2 arrays and their actual copying into a queue.

looping through those and copying each cell is not the wanted behaviour; happily, memcpy is the function we are looking for. since C keeps the addresses of n-dimensional arrays contiguous, we can keep the memory layout by just copying it.

due to the fact that we have the following things to copy, define a new struct having these members

  • int current_input_type (to know what is being added to the database)
  • char *buffered_input[];
  • mbuf (own struct, but also memcpy possible - ido2db_mbuf_struct)

memcpy(destination, source, sizeof(destination));

using that when pushing to and popping from the queue with the new datatype containing that struct, the various threads will keep their own working copy of the data and not interfere on free.

furthermore, the registered consumers will assign the destination to their idi->... buffers, and just call the handle functions with their own db connection type.

since handling data is only invoked when a dump has finished, signalled by IDO_API_ENDDATA, the buffers will hold all valuable information "in a row" and can be copied around in memory.

possible threats:

socket - main thread (producer) - queue - queue thread (consumer) - extract data from RAW, fill buffers (producer) - dbqueue - n db threads (consumer)

requirements - start n threads with n idi db connections before the queue thread starts.

@icinga-migration
Author

Updated by mfriedrich on 2011-10-22 19:50:38 +00:00

one for idomod again.

i've been reading a lot in the past days, also how other neb modules handle the amount of data on startup - everyone needs the configs and even the status updates.

merlin does it like this

  • register only one hook on nebmodule_init, calling post_init
  • this makes sure that startup events like status_updates from retention.dat and such are not even registered or handled
  • then it sends the paths of the config, objects.cache and status.dat to merlind, which processes that data independently, not even over a socket itself
  • while merlind is processing the data, the merlin module keeps stalling, and when the reaper thread gets a ctrl_resume event, the stalling is stopped and the desired neb_callbacks are registered

this idea is pretty awesome, but we just can't get ido2db to behave like that. if ido2db supported only unix sockets and no tcp, we could just fork another child, reading objects.cache and status.dat and then dumping that data independently of the core to the database, while idomod does nothing / does normal operation.

so another idea.

  • start 2 threads, one for config, one for status data (we will be using the config thread in the first place)
  • add a config_dump_lock mutex
  • add a signal_config_dump_start, signal_config_dump_end
  • add a signal_data_dump_start; while it is FALSE, return OK for all status and check updates to the core
  • if a nebcallback indicates that a config dump can happen, set signal_config_dump_start=TRUE
  • the config thread will catch on that signal, starting to dump the config with the provided functions
    • it also sets a lock on the mutex to allow hard locking for the main thread
  • during the time the mutex is locked and signal_config_dump_start==TRUE, all realtime data will be dropped, meaning that idomod is stalling, waiting for the config dump to finish
  • when the config thread sends the signal_config_dump_end, it exits normally - we just need to make sure the thread is started on each reload/restart
  • when the main thread gets the signal_config_dump_end, it starts to accept data provided by the neb_callbacks

this method could be enhanced by looping through the host and service status lists in memory and dumping the actual status while the core is doing checks, if needed.

contra - everything on realtime during config dump is lost
pro - the core checks and alerts during config dump

overall, ido2db must be enhanced to process more than just one packet either way. so it's a two-sided story again.

revisiting the contra - not that much is lost anymore. the configs are the essential part of the story in the first place.

@icinga-migration
Author

Updated by mfriedrich on 2011-10-23 12:21:29 +00:00

possible race condition on ido2db dbqueue workers - db_hello with have_instance. the solution might be to loop until an instance_name is found. see #1111

@icinga-migration
Author

Updated by mfriedrich on 2011-10-23 12:54:36 +00:00

should be ok, because the initial startup does not invoke any data handling, but processes

  • IDO_API_HELLO
    =>
  • IDO2DB_INPUT_SECTION_HEADER
    =>
    • IDO_API_PROTOCOL idi->protocol_version
    • IDO_API_INSTANCENAME idi->instance_name
    • IDO_API_AGENT idi->agent_name
    • IDO_API_AGENTVERSION idi->agent_version
    • IDO_API_DISPOSITION idi->disposition
    • IDO_API_CONNECTION idi->connect_source
    • IDO_API_CONNECTTYPE idi->connect_type
    • IDO_API_STARTTIME idi->data_start_time
      =>
  • IDO_API_STARTDATADUMP
    =>
  • ido2db_db_hello

note: we need to copy that exact information after idi->instance_id is not 0 and/or instance_name is not NULL.

note2: multiple idomod connections are not a problem, as another child will be forked to handle that type of information. the unique identifier remains the instance_id ...

[1319373302.278850] [001.2] [pid=5197] ido2db_open_debug_log() end
[1319373302.278916] [001.2] [pid=5197] ido2db_wait_for_connections() start
[1319373302.278967] [001.2] [pid=5197] ido2db_daemonize() start
[1319373302.279310] [001.2] [pid=5197] ido2db_daemonize() parent process goes away
[1319373302.279347] [001.2] [pid=5197] ido2db_free_program_memory() start
[1319373302.279373] [001.2] [pid=5197] ido2db_free_program_memory() end
[1319373302.279398] [001.2] [pid=5198] ido2db_daemonize() child forks again
[1319373302.279861] [001.2] [pid=5198] ido2db_daemonize() first child process goes away
[1319373302.279901] [001.2] [pid=5198] ido2db_free_program_memory() start
[1319373302.279941] [001.2] [pid=5198] ido2db_free_program_memory() end
[1319373302.279953] [001.2] [pid=5199] ido2db_daemonize() grandchild continues and  becomes session leader
[1319373302.280321] [001.2] [pid=5199] ido2db_daemonize() end
[1319373364.253734] [001.2] [pid=10505] ido2db_handle_client_connection() start
[1319373364.254150] [001.2] [pid=10505] ido2db_open_debug_log() end
[1319373364.254447] [001.2] [pid=10505] ido2db_idi_init() start
[1319373364.254473] [001.2] [pid=10505] ido2db_idi_init() end
[1319373364.254489] [001.2] [pid=10505] ido2db_db_init() start
[1319373364.254772] [001.2] [pid=10505] ido2db_db_init() end
[1319373364.254817] [001.2] [pid=10505] ido2db_db_connect() start
[1319373364.278728] [001.2] [pid=10505] ido2db_db_connect(0) end
[1319373364.278864] [001.2] [pid=10505] ido2db_check_for_client_input() dbuf.size=511
[1319373364.278883] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.278899] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.278911] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.278921] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.278933] [001.2] [pid=10505] ido2db_free_connection_memory() start
[1319373364.278943] [001.2] [pid=10505] ido2db_free_connection_memory() end
[1319373364.278953] [001.2] [pid=10505] ido2db_handle_client_input() end
[1319373364.278963] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.278974] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.278986] [001.2] [pid=10505] ido2db_handle_client_input() end
[1319373364.278997] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.279006] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.279016] [001.2] [pid=10505] ido2db_handle_client_input() end
[1319373364.279027] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.279037] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.279048] [001.2] [pid=10505] ido2db_handle_client_input() end
[1319373364.279059] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.279069] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.279080] [001.2] [pid=10505] ido2db_handle_client_input() end
[1319373364.279090] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.279100] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.279111] [001.2] [pid=10505] ido2db_handle_client_input() end
[1319373364.279121] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.279131] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.279142] [001.2] [pid=10505] ido2db_handle_client_input() end
[1319373364.279152] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.279162] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.279172] [001.2] [pid=10505] ido2db_handle_client_input() end
[1319373364.279183] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=(null)) start
[1319373364.279192] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.279202] [001.2] [pid=10505] ido2db_handle_client_input() end
[1319373364.279242] [001.2] [pid=10505] ido2db_handle_client_input(instance_name=default) start
[1319373364.279252] [001.2] [pid=10505] ido2db_handle_client_input() input_section
[1319373364.279260] [001.2] [pid=10505] ido2db_db_hello() start
[1319373364.279269] [001.2] [pid=10505] ido2db_db_version_check () start 
[1319373364.279283] [001.2] [pid=10505] ido2db_db_query() start
[1319373364.279422] [002.0] [pid=10505] SELECT version FROM icinga_dbversion WHERE name='idoutils'
[1319373364.280509] [001.2] [pid=10505] ido2db_db_query(0) end
[1319373364.280574] [001.2] [pid=10505] ido2db_db_version_check () end
[1319373364.280589] [001.2] [pid=10505] ido2db_db_query() start
[1319373364.280753] [002.0] [pid=10505] SELECT instance_id FROM icinga_instances WHERE instance_name='default'
[1319373364.281342] [001.2] [pid=10505] ido2db_db_query(0) end
[1319373364.281409] [001.2] [pid=10505] ido2db_db_hello(instance_id=1)

Updated by mfriedrich on 2011-10-23 22:10:03 +00:00

  • Category changed from 70 to 82

it seems to work by copying the buffer pointers and reassigning the idi struct. for testing purposes, only one dbqueue thread is running.

Sun Oct 23 23:53:28 2011 .623072 [001.2] [pid=1374] [tid=140737294735104] ido2db_dbqueue_buf_pop() start
Sun Oct 23 23:53:28 2011 .623191 [001.2] [pid=1374] [tid=140737294735104] ido2db_dbqueue_buf_pop() end
Sun Oct 23 23:53:28 2011 .623223 [001.2] [pid=1374] [tid=140737294735104] ido2db_handle_configfilevariables() start
Sun Oct 23 23:53:28 2011 .623248 [002.0] [pid=1374] [tid=140737294735104] HANDLE_CONFIGFILEVARS [1]
Sun Oct 23 23:53:28 2011 .623265 [001.2] [pid=1374] [tid=140737294735104] ido2db_convert_standard_data_elements() start
Sun Oct 23 23:53:28 2011 .623289 [002.0] [pid=1374] [tid=140737294735104] HANDLE_CONFIGFILEVARS [2]
Sun Oct 23 23:53:28 2011 .623307 [002.0] [pid=1374] [tid=140737294735104] TSTAMP: 1319406808   LATEST: 1319400312
Sun Oct 23 23:53:28 2011 .623325 [002.0] [pid=1374] [tid=140737294735104] HANDLE_CONFIGFILEVARS [3]
Sun Oct 23 23:53:28 2011 .623343 [001.2] [pid=1374] [tid=140737294735104] ido2db_db_escape_string('/etc/icinga/icinga.cfg') start
Sun Oct 23 23:53:28 2011 .623363 [001.2] [pid=1374] [tid=140737294735104] ido2db_db_escape_string changed string ('/etc/icinga/icinga\.cfg')
Sun Oct 23 23:53:28 2011 .623380 [001.2] [pid=1374] [tid=140737294735104] ido2db_db_escape_string end
Sun Oct 23 23:53:28 2011 .623401 [001.2] [pid=1374] [tid=140737294735104] ido2db_query_insert_or_update_configfilevariables_add() start
Sun Oct 23 23:53:28 2011 .623432 [001.2] [pid=1374] [tid=140737294735104] ido2db_db_query() start
Sun Oct 23 23:53:28 2011 .623616 [001.2] [pid=1374] [tid=140737354012416] ido2db_write_to_sink_queue() start
Sun Oct 23 23:53:28 2011 .623667 [001.2] [pid=1374] [tid=140737354012416] ido2db_write_to_sink_queue() buf: 
Sun Oct 23 23:53:28 2011 .623686 [001.2] [pid=1374] [tid=140737354012416] ido2db_write_to_sink_queue() buffer items: 0/500000 head: 249 tail: 249
Sun Oct 23 23:53:28 2011 .623707 [001.2] [pid=1374] [tid=140737354012416] ido2db_sink_buffer_push() start
Sun Oct 23 23:53:28 2011 .623725 [001.2] [pid=1374] [tid=140737354012416] ido2db_sink_buffer_push() end
Sun Oct 23 23:53:28 2011 .623742 [001.2] [pid=1374] [tid=140737354012416] ido2db_write_to_sink_queue() success
Sun Oct 23 23:53:28 2011 .623775 [002.0] [pid=1374] [tid=140737294735104] INSERT INTO icinga_configfiles (instance_id, configfile_type, configfile_path) VALUES (1, 0, '/etc/icinga/icinga\.cfg') ON DUPLICATE KEY UPDATE instance_id=1, configfile_type=0, configfile_path='/etc/icinga/icinga\.cfg'

but there's still a problem with ido2db_free_input_memory which causes a segfault.

(gdb) run -f -c /etc/icinga/ido2db.cfg
Starting program: /home/dnsmichi/coding/icinga/icinga-core/module/idoutils/src/ido2db -f -c /etc/icinga/ido2db.cfg
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff5efd700 (LWP 1392)]
[New Thread 0x7ffff532b700 (LWP 1393)]
[New Thread 0x7ffff4759700 (LWP 1394)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4759700 (LWP 1394)]
*__GI___libc_free (mem=0x30) at malloc.c:3709
3709    malloc.c: No such file or directory.
        in malloc.c
(gdb) bt full

#0  *__GI___libc_free (mem=0x30) at malloc.c:3709

ar_ptr = <optimized out>
p = <optimized out>

#1  0x0000000000403ce4 in ido2db_free_input_memory (idi=0x63cac0) at ido2db.c:2804

x = <optimized out>
y = 0

#2  0x0000000000403e00 in ido2db_handle_input_data (idi=0x63cac0) at ido2db.c:2776

result = 0

#3  0x0000000000405a76 in ido2db_dbqueue_handle (data=<optimized out>) at ido2db.c:2572

__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140737294735104, -3471334635555649017, 140737347592608, 140737294735808, 140737354125376, 3,
3471359837929081351, -3471335183974528505}, __mask_was_saved = 0}}, __pad = {0x7ffff4758f80, 0x0, 0x0, 0x0}}
__cancel_routine = 0x404130 <ido2db_thread_dbqueue_exit_handler>
__not_first_call = <optimized out>
delay = {tv_sec = 0, tv_nsec = 5000}
temp_buffer = 0x0
result = <optimized out>
idi_thread_id = 0
thread_data = <optimized out>
idi = 0x7fffffffe060

#4  0x00007ffff79b7b40 in start_thread (arg=<optimized out>) at pthread_create.c:304

__res = <optimized out>
pd = 0x7ffff4759700
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737294735104, -3471334635555649017, 140737347592608, 140737294735808, 140737354125376, 3, 3471359837920692743,
3471352532557498887}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
freesize = <optimized out>

__PRETTY_FUNCTION__ = "start_thread"

#5  0x00007ffff748036d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

No locals.

#6  0x0000000000000000 in ?? ()

No symbol table info available.

Updated by mfriedrich on 2011-10-24 13:51:02 +00:00

*** glibc detected *** corrupted double-linked list

I think it is referring to an internal linked list of memory blocks that
memory managers often use. There is no way for the compiler to know that
your data structures implement a linked list. Anyway, if you overwrite
the list pointers of the memory manager's list (be it by writing beyond
the end of an array, or to memory that was already freed or not yet
properly allocated), the memory manager's state gets corrupted. This is a
common source of crashes that occur not where the bug is, but at some
seemingly unrelated part of the program (often the next time you
(de-)allocate memory).

Updated by mfriedrich on 2011-10-24 17:12:18 +00:00

  • Subject changed from change order of data dumping in idoutils to [RFC] change order of data dumping in idoutils
  • Status changed from Assigned to Feedback
  • Target Version deleted 1.6

well. proof of concept. saved in mfriedrich/idomulti

  • leaks memory in push/pop
  • libdbi still leaks in dbi and mysql; non-threadsafe functions are linked in
  • due to the libdbi problems and the single connection handling, detecting a valid instance_name can't be done within the threads, but must be moved into the main initialisation - which then causes problems determining the correct instance_id

the main problems i see:

  • even with multiple workers, each handling the data with its own connection and idi data struct, you just don't see a startup benefit: startup took 300 seconds before, and with multiple workers each holding their own libdbi connection it's nearly ~300s again.
  • dumping configs and status data (which we actually need in the db) at the same time while waiting for the eventloop start is hell
  • an external application reading objects.cache and status.dat and dumping that initial information into the db might be a workaround
  • another workaround would be to register the various nebcallbacks only once the eventloop has started, to prevent retained state updates
  • after all, an on-demand daemon like icingamq could resolve these problems easily: fetch different data from the core, buffer it itself, and then manage the database updates itself (not with libdbi, but with native multithreaded connection pools) - getting the configs (like idomod does now), getting the status (looping through the host and status lists), and finally registering hooks to push realtime data forward.

getting idomod and/or ido2db into multithreaded worker environments isn't the way to go.

but for now, i'll leave that as is and focus on 1.6

Updated by mfriedrich on 2011-10-24 17:34:16 +00:00

and just for the record - because of libdbi's non-threadsafety, we need to lock each db operation among the multiple threads:

  • conn_ping when checking if we are connected
  • conn_query when actually issuing a query

so even if we have multiple threads working on the db, we fail because libdbi's non-threadsafety forces us to serialize everything ourselves. seen from another pov, we effectively have one connection that 10 workers want to write to.

Updated by mfriedrich on 2013-04-15 15:27:43 +00:00

this will be partly addressed with #3533 (adding a socket queue) and #3527 (wrapping config objects into transactions).

Updated by mfriedrich on 2013-09-21 17:34:26 +00:00

  • Status changed from Feedback to Closed

Updated by mfriedrich on 2014-12-08 14:37:24 +00:00

  • Project changed from 18 to Core, Classic UI, IDOUtils
  • Category changed from 82 to IDOUtils
