[dev.icinga.com #1934] [RFC] change order of data dumping in idoutils #746
Comments
Updated by mfriedrich on 2011-09-25 11:32:34 +00:00
See the wiki entry for callback types: https://wiki.icinga.org/display/Dev/idomod+callback+types
Updated by mfriedrich on 2011-09-27 18:32:39 +00:00
In order to reflect the state "I have daemonized (if called like that) and will now start with the initialization", I've added 2 new callbacks to the core routine. This will come in handy for any NEB module wanting to start its threads before the event loop actually starts and before config data or similar is read.
idomod now starts the routine idomod_init_post() (renamed to idomod_start_event_loop()) directly if the callback type matches NEBTYPE_PROCESS_INITSTART. This is also the point where the original config gets dumped (normally this would happen before daemonizing, on PROCESS_START), so this is by far the config order change. The reason I chose this approach is that the buffer needs to be worked on with config data first, saving space for future status updates. Sometimes the config dump takes rather long and the queue is stuffed, so the status updates won't reach the database that soon (even though the old status data still remains in there).
Updated by mfriedrich on 2011-09-27 18:53:55 +00:00
Btw, a reload and/or restart will always invoke the exact same config dumps. The only difference on a restart/reload is the handling in ido2db itself: it detects the restart and truncates all config tables and various status tables (not host/service status and downtimes). We might remove the config deletion, because afterwards all objects are flagged as inactive anyway. Even if there is no config, all other data will trigger an insert into the objects table; only the config dump itself would mark 'is_active', thereby allowing an external application to determine whether the data is current. An update is also cheaper on the tables than a plain insert, or an insert-if-missing-then-update approach (like it's done for PostgreSQL).
This is where is_active=0 is set on the objects table.
The flagging as active happens when the configs are dumped, one by one, e.g. when a new host reaches the db.
So if the object exists, it's just selected and returned, to be updated with is_active=1 afterwards. Otherwise it's inserted, its id is fetched, and it's then flagged with is_active=1. Keeping the config tables filled would make the update process easier (saving the insert on InnoDB in MySQL, for example), but it would still require the configs to be dumped into the database in the end. This is to be discussed.
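The select-or-insert-then-flag flow described above can be sketched as a tiny in-memory model. This is a hypothetical illustration, not the real ido2db code; the names (`objects`, `mark_all_inactive`, `activate_object`) are made up for this sketch.

```c
/* Sketch (not real ido2db code): simplified model of the objects-table
 * is_active flow described above. All names here are illustrative. */
#include <string.h>

#define MAX_OBJECTS 16

struct obj_row {
    char name[64];
    int  is_active;
};

static struct obj_row objects[MAX_OBJECTS];
static int object_count = 0;

/* On restart, ido2db first marks everything inactive. */
void mark_all_inactive(void) {
    for (int i = 0; i < object_count; i++)
        objects[i].is_active = 0;
}

/* When a config object arrives: select it if present, otherwise insert
 * it, then flag is_active = 1 either way (an update is cheaper than a
 * delete + reinsert on InnoDB). Returns the row index, -1 if full. */
int activate_object(const char *name) {
    for (int i = 0; i < object_count; i++) {
        if (strcmp(objects[i].name, name) == 0) {
            objects[i].is_active = 1;   /* existing row: just update */
            return i;
        }
    }
    if (object_count >= MAX_OBJECTS)
        return -1;
    /* not found: insert, fetch its id (index here), then flag active */
    strncpy(objects[object_count].name, name, sizeof(objects[0].name) - 1);
    objects[object_count].is_active = 1;
    return object_count++;
}
```

The point of the sketch: after a restart every row survives with is_active=0, and re-dumping the config only flips flags back on, never re-inserts existing rows.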
Updated by mfriedrich on 2011-10-09 18:59:36 +00:00
A better attempt would be to change ido2db and insert the configs within their own ido2db thread, i.e. with a dedicated database connection,
and 3 workers for that (the main thread does the socket reads and buffer writes). This will require the objects hash list to be thread-safe too. Depending on the type found, the 3 thread functions decide which ido2db_handle_* to call, each working on its own db connection. This will probably allow the config data to be dumped on core (re)start, processing all data in blocking mode, and if the buffer is shared with status data (retained status data!), the status worker thread can work on it asynchronously, independent of the config dump itself. The Icinga Web query selects will of course only work once the tables are filled properly. Working with 3 db connections (plus one for the housekeeping thread) requires a plain copy of the *idi object, but with a changed descriptor and its own connection reopened. One main problem could be that a read from the buffer by one worker blocks the rest; but this is only the read itself, after which the mutex lock is released, and the main processing of the data then happens independently of the buffer locking. This could be adapted from what is done now, adding 2 more worker threads and extending the connection *idi object. The function which decides which data to handle needs to be split then.
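The "lock only the read, process independently" worker model described above can be sketched with pthreads. This is a minimal sketch under assumed names (`run_workers`, `processed`, a plain int queue standing in for the *idi buffer), not the actual ido2db implementation.

```c
/* Sketch of the proposed worker model (hypothetical names, not the
 * real ido2db source): each consumer takes one item from the shared
 * buffer under a short mutex hold, releases the lock, and only then
 * does the per-connection processing. */
#include <pthread.h>

#define N_ITEMS   100
#define N_WORKERS 3

static int queue[N_ITEMS];
static int queue_head = 0;                 /* next item to pop */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

static int processed[N_WORKERS];           /* per-"connection" counters */

static void *worker(void *arg) {
    int id = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        if (queue_head >= N_ITEMS) {       /* queue drained */
            pthread_mutex_unlock(&queue_lock);
            return NULL;
        }
        int item = queue[queue_head++];    /* only the read is locked */
        pthread_mutex_unlock(&queue_lock);

        (void)item;                        /* the ido2db_handle_* work   */
        processed[id]++;                   /* would run here, on this    */
    }                                      /* worker's own db connection */
}

/* Fill the queue, run the workers, return the total items handled. */
int run_workers(void) {
    pthread_t tids[N_WORKERS];
    int ids[N_WORKERS];
    for (int i = 0; i < N_ITEMS; i++) queue[i] = i;
    for (int i = 0; i < N_WORKERS; i++) {
        ids[i] = i;
        pthread_create(&tids[i], NULL, worker, &ids[i]);
    }
    int total = 0;
    for (int i = 0; i < N_WORKERS; i++) {
        pthread_join(tids[i], NULL);
        total += processed[i];
    }
    return total;
}
```

Because the mutex only covers the pop, the expensive handle calls run concurrently; this is the property the comment relies on for the status thread not being blocked by a slow config dump.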
Updated by mfriedrich on 2011-10-13 22:51:27 +00:00
The idea with multiple worker threads is pretty good, but the location of the buffer is not: it's where the plain data from the socket is read, acting as a socket buffer to absorb blocking. So keeping the queue buffer as is, with one main producer and a threaded consumer for RAW data, is good either way. The data runs through various processing steps. There are 2 different types of data.
We can't interrupt during this method; we need to wait until all data is written to those mbufs. Main: ido2db_check_for_client_input calls ido2db_write_to_sink_queue. The client input is put in RAW format and then split and analyzed by type (current_input_section, IDO2DB_INPUT_SECTION_DATA), working as a state machine with flagging. If IDO_API_ENDDATA is recognized, it is signalled that the data can be processed further; otherwise, data is added with ido2db_add_input_data_item and saved within the *idi object (single: idi->buffered_input[type] = newbuf;). Once the end is reached, the actual handling of the read data happens within ido2db_end_input_data. So to speak, the buffers do not remain global but live within the idi object. The question is what queue could be put in between to allow one producer and e.g. 10 threads with 10 database connections doing the actual calls to ido2db_handle_*. The overall problem with config data is that it needs a chronological insert, otherwise the multiple lines of data do not make sense. All other realtime data remains
Dumping e.g. a host definition does the following:
So when a host starts being dumped, you get a long list of
All of this is read from buffered_input and the mbuf array. So, to summarize:
So to speak, instead of creating a generic solution, it must be split between config and realtime data. The realtime data gets its own queue, onto which the buffered_input array is put; the realtime-data thread reads it, assigns it to its idi object, and then calls the appropriate handle function. The config data also gets its own queue, onto which the buffered_input array and the mbuf slot-item array are pushed. This holds a complete transaction of a config object being dumped, delimited by IDO_API_ENDDATA; with exactly that read into the 2 buffers, a config thread can read it from the queue, assign it to its idi object, and then call the appropriate handle function. The char* bufs added are plain copies through the function calls themselves, so the idi object points to their location and can free them afterwards. If we don't free them, but reset the address to NULL and reassign the address in the queue object to the actual reading thread, they can be freed when freeing the buffer; or better, after the client has processed the data, the idi copies should be freed either way. The number of realtime-data threads should be configurable at runtime, as should the number of config threads. The only difference: config threads should be created on configdumpstart and joined after configdumpend, so as not to harm the overall realtime threads on the scheduler. This needs a deeper evaluation on the actual code, plus performance tests. Furthermore, users must then be aware of multiple database connections, one per thread.
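The IDO_API_ENDDATA-delimited buffering that both comments above rely on can be sketched as a small state machine. This is an illustrative model only; `add_input_line`, `end_input_data`, and the `"999"` marker are stand-ins, not the real ido2db functions or protocol values.

```c
/* Minimal sketch of the ENDDATA state machine described above (names
 * are illustrative, not the real ido2db API): input lines are buffered
 * per transaction and only handed off once an ENDDATA marker arrives,
 * so one complete config object is always processed "in a row". */
#include <string.h>

#define ENDDATA_MARKER "999"   /* stand-in for IDO_API_ENDDATA */
#define MAX_LINES 32

static const char *buffered_input[MAX_LINES];  /* idi->buffered_input analogue */
static int buffered_count = 0;
static int transactions_handled = 0;

static void end_input_data(void) {   /* ido2db_end_input_data analogue */
    transactions_handled++;          /* handle/queue the whole object  */
    buffered_count = 0;              /* buffers live only per object   */
}

/* Feed one line; returns 1 when a full transaction was processed. */
int add_input_line(const char *line) {
    if (strcmp(line, ENDDATA_MARKER) == 0) {
        end_input_data();
        return 1;
    }
    if (buffered_count < MAX_LINES)  /* ido2db_add_input_data_item analogue */
        buffered_input[buffered_count++] = line;
    return 0;
}
```

The design point: because nothing is handed to a worker before the marker, a config queue item always carries a chronologically complete object, which is what makes the per-object queue push safe.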
Updated by mfriedrich on 2011-10-18 20:12:57 +00:00
The problem remains with the 2 arrays and their actual copying into a queue. Looping through them and copying each cell is not a wanted behaviour; happily, memcpy is the function we are looking for. Since C keeps the addresses of n-dimensional arrays contiguous, we can preserve the memory layout by just copying it. Given the following things to copy, define a new struct with these members:
memcpy(destination, source, sizeof(*destination)); (note: with pointers, the operand of sizeof must be the struct, not the pointer). Using that when pushing to and popping from the queue with the new struct datatype, the various threads will keep their own working copy of the data and not interfere on free. Furthermore, the registered consumers will assign the destination to their idi->... buffers and just call the handle functions with their own db connection type. Since handling data is only invoked when a dump has finished, signalled by IDO_API_ENDDATA, the buffers will hold all valuable information "in a row" and can be copied around in memory. Possible threats: socket - main thread (producer) - queue - queue thread (consumer) - extract data from RAW, fill buffers (producer) - dbqueue - n db threads (consumer). Requirements: start n threads with n idi db connections before the queue thread starts.
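The single-memcpy copy of a contiguous 2D array can be sketched as follows. The struct and function names are hypothetical (the real struct members are the ones listed in the comment above); the sketch also demonstrates the sizeof pitfall mentioned there.

```c
/* Sketch of the queue-item copy described above (hypothetical struct,
 * not the real ido2db type). Because a flat 2D char array keeps its
 * cells contiguous, one memcpy of the whole struct preserves the
 * layout; no per-cell loop is needed. */
#include <string.h>

#define IDO_MAX_BUFTYPES 4

struct queue_item {
    char buffered_input[IDO_MAX_BUFTYPES][64]; /* contiguous 2D array  */
    int  type;
};

/* Each consumer keeps its own working copy of the pushed data, so
 * freeing on one side cannot interfere with the other. */
void copy_item(struct queue_item *dst, const struct queue_item *src) {
    memcpy(dst, src, sizeof(*dst));  /* sizeof(*dst), NOT sizeof(dst): */
}                                    /* the latter is pointer-size only */
```

After the copy, mutating the source must leave the consumer's copy untouched, which is exactly the "own working copy, no interference on free" property the comment asks for.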
Updated by mfriedrich on 2011-10-22 19:50:38 +00:00
One for idomod again. I've been reading a lot in the past days, including how other NEB modules handle the amount of data on startup: everyone needs the configs and even the status updates. Merlin does it like this:
This idea is pretty awesome, but we just can't get ido2db to behave like that. If ido2db supported only unix sockets, and no tcp, we would just fork another child, reading objects.cache and status.dat and then dumping that data to the database independently of the core, while idomod does nothing / does its normal operation. So, another idea.
This method could be enhanced by looping through the host and service status lists in memory and also dumping the actual status while the core is doing checks, if needed. Contra: everything happening in realtime during the config dump is lost. Overall, ido2db must be enhanced to process more than just one packet either way, so it's a 2-times story again. Revamping contra: not that much is lost anymore; the configs are the essential part of the story in the first place.
Updated by mfriedrich on 2011-10-23 12:21:29 +00:00
Possible race condition on the ido2db dbqueue workers: db_hello with have_instance. A solution might be to loop until an instance_name is found. See #1111
Updated by mfriedrich on 2011-10-23 12:54:36 +00:00
Should be ok, because the initial startup does not invoke any data handling, but processes
Note: we need to copy that exact information only after idi->instance_id is not 0 and/or instance_name is not NULL. Note 2: multiple idomod connections are not a problem, as another child will be forked to handle that type of information. The unique identifier remains the instance_id ...
Updated by mfriedrich on 2011-10-23 22:10:03 +00:00
It seems that it works with copying the buffer pointers and the idi reassignment. For testing purposes, only one dbqueue thread is running.
But there's still a problem with free_input_memory which causes a segfault.
(gdb backtrace excerpt, truncated: ar_ptr and x locals in the free path, result = 0, pthread _cancel_buf jump-buffer state, __PRETTY_FUNCTION__ = "start_thread"; no locals / no symbol table info available for the remaining frames)
Updated by mfriedrich on 2011-10-24 13:51:02 +00:00
Updated by mfriedrich on 2011-10-24 17:12:18 +00:00
Well, proof of concept. Saved in mfriedrich/idomulti.
The main problem I see:
Getting idomod and/or ido2db into multithreaded worker environments isn't the way to go. But for now, I'll leave it as is and focus on 1.6.
Updated by mfriedrich on 2011-10-24 17:34:16 +00:00
And just for the record: because of libdbi's non-threadsafety, we need to lock each db operation among the multiple threads.
So even if we have multiple threads working on the db, we fail because libdbi's non-threadsafety requires us to lock ourselves out. Seen from another pov, we effectively have one connection to which 10 workers want to write.
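The "one connection, 10 writers" effect can be sketched with a single global mutex around the driver calls. This is an illustrative model, not libdbi itself: `db_query` stands in for a dbi_conn_query-style call, and the lock shows why the worker threads cannot actually run db work in parallel.

```c
/* Sketch of the libdbi locking problem described above: because the
 * library is not thread-safe, every driver call must be wrapped in one
 * global mutex, so N workers effectively share a single connection's
 * throughput. Names are illustrative. */
#include <pthread.h>

static pthread_mutex_t dbi_lock = PTHREAD_MUTEX_INITIALIZER;
static long queries_run = 0;           /* stands in for shared dbi state */

static void db_query(void) {
    pthread_mutex_lock(&dbi_lock);     /* ALL threads serialize here;   */
    queries_run++;                     /* the real dbi call would go    */
    pthread_mutex_unlock(&dbi_lock);   /* inside this critical section  */
}

static void *db_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++)
        db_query();
    return NULL;
}

/* Run 10 workers; despite the parallel threads, every query funnels
 * through the single lock. Returns the total query count. */
long run_db_workers(void) {
    pthread_t tids[10];
    for (int i = 0; i < 10; i++)
        pthread_create(&tids[i], NULL, db_worker, NULL);
    for (int i = 0; i < 10; i++)
        pthread_join(tids[i], NULL);
    return queries_run;
}
```

The count comes out exact only because of the lock; that same lock is what throttles the 10 workers back down to single-connection throughput, which is the conclusion drawn above.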
Updated by mfriedrich on 2013-09-21 17:34:26 +00:00
Updated by mfriedrich on 2014-12-08 14:37:24 +00:00
This issue has been migrated from Redmine: https://dev.icinga.com/issues/1934
Created by mfriedrich on 2011-09-25 11:31:57 +00:00
Assignee: mfriedrich
Status: Closed (closed on 2013-09-21 17:34:25 +00:00)
Target Version: (none)
Last Update: 2014-12-08 14:37:24 +00:00 (in Redmine)
Currently, idomod dumps various data based on the different callbacks happening.
By moving various possible manual dumps, independent from the actual core callbacks dumping actual data, it could be even more of a time saver.
There are the following types of dumps in idomod, handled by idomod_broker_data - see the wiki entry.
One overall question now: the object config dump happens only in NEBTYPE_PROCESS_EVENTLOOPSTART and no longer in NEBCALLBACK_RETENTION_DATA. So actually there is no difference in the data we get, because idomod_write_config dumps it straight from memory.
the original retained data would be called here
this is invoked in
within this procedure after daemonizing
Meaning that the original config plus the main config file would be read before daemonizing.
So the main process does the following, in this order:
(...block...)
(...block...)
(...block...)
The overall question is: who actually needs the configs at that stage? The database should only contain the configs the core is actually using, including retained information.
so changing this to
(...block...)
(...block...)
will remove the difference between original and retained configs, allow status data to be written sooner, and remove the blocking when using the queue threading with a circular buffer.
(...fill buffer...)
(...fill buffer...)
So to clear this up: there is no actual difference between original and retained configs; it's just the timing that causes trouble for the overall core performance / blocking.
The final question remains: how to handle the config_type attribute being sent to the socket then ... (or 0 and 1 will be the same configs; but for compatibility reasons this will stay as it is).
Changesets
2011-09-27 18:31:10 +00:00 by mfriedrich 9dfb19f
2011-10-24 11:27:28 +00:00 by mfriedrich 776ecc4e4872898fdb7afb40c275327e3ad250b8
2011-10-24 13:23:06 +00:00 by mfriedrich 4a354b9fcb08f5674d953ad3f1c1fd9a49baed25
2011-10-24 14:09:28 +00:00 by mfriedrich 444ff5eaad73fa6478cc1ff191cf115b854b0c9b
2011-10-24 14:37:14 +00:00 by mfriedrich 1774cbb097bbe84824860b4bf9d80b7a8a14c916
2011-10-24 15:15:41 +00:00 by mfriedrich bc1d93edae17df064c9afdff50f22823626be672
2011-10-24 15:44:06 +00:00 by mfriedrich ee60702e91001b7291aac3c05aeb51f46bc71422
2011-10-24 16:40:03 +00:00 by mfriedrich dd5de2e0c33524c4ec302fa1e2a83ad4e6dbdafe
Relations: