This repository has been archived by the owner on Jan 15, 2019. It is now read-only.
Assignee: (none)
Status: New
Target Version: Backlog
Last Update: 2015-05-18 12:17:42 +00:00 (in Redmine)
Icinga Version: 1.8.4
OS Version: Ubuntu 10.04.4 LTS
prerequisites:
external_command_buffer_slots is set to a value other than the compiled-in default (4096)
one or more processes are writing to the command file (icinga.cmd), i.e. a command is being queued (this is where we have seen the error) or, i guess, being taken from the command buffer
a reload is performed
symptoms: right after a reload, we can see (using debug options and strace) that the external commands being executed lag behind those being read from the fifo (in our case by a lot, e.g. 15 minutes).
e.g.: when 5 external commands enter the buffer, exactly 5 are read from it, but those 5 are 15 minutes old, as if the ring buffer pointers were invalid
analysis: we analysed the problem by adding a validate_external_command_buffer() function to utils.c (see attachment) and calling it on every modification of the ring buffer in submit_external_command() and check_for_external_commands() (in commands.c).
after a few (or more) reloads via SIGHUP, the validation fails.
Example:
we had configured external_command_buffer_slots=32768.
we found out that during a reload, for a very brief time, external_command_buffer_slots is set to the default of 4096:
(from icinga.log): validate_external_command_buffer(submit), strange buffer size. items=91/4096, head(write)=2578, tail(read)=18871
it seems that if a command is submitted to the buffer during the reload, the head adjustment ( head = (head+1)%slots ) uses 4096 for the modulo operation where it should have used 32768. alternatively, submitting to the buffer should be prevented entirely during that window.
after the reload, the validation failed with:
validate_external_command_buffer(submit), strange buffer size. items=92/32768, head(write)=2579, tail(read)=18871
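the two log lines above are consistent with a single head adjustment that used the wrong modulus: with tail=18871 and 90 queued items the head would sit at 18961, and (18961+1)%4096 is 2578, which is the logged head. a minimal C sketch of that arithmetic follows. advance_head(), items_between(), buffer_is_consistent() and replay_reported_values() are hypothetical helpers written for illustration; they are not functions from the icinga source or from the attached patch.

```c
#include <assert.h>

/* Hypothetical helper mirroring the head adjustment quoted in the report:
 * head = (head + 1) % slots */
static int advance_head(int head, int slots)
{
    return (head + 1) % slots;
}

/* Number of queued items implied by the two pointers for a given buffer
 * size. Note: a completely full buffer is indistinguishable from an empty
 * one here, which is why a separate item counter is normally kept. */
static int items_between(int head, int tail, int slots)
{
    return ((head - tail) % slots + slots) % slots;
}

/* Returns 1 if both pointers are in range and agree with the item count. */
static int buffer_is_consistent(int head, int tail, int items, int slots)
{
    return head >= 0 && head < slots
        && tail >= 0 && tail < slots
        && items_between(head, tail, slots) == items;
}

/* Replays the values from the log lines in the report. */
static void replay_reported_values(void)
{
    const int configured = 32768;  /* external_command_buffer_slots */
    const int fallback   = 4096;   /* compiled-in default */
    const int tail       = 18871;  /* tail(read) from the log */

    /* Assumed state just before the faulty submit: 90 items queued. */
    int head = 18961;
    assert(buffer_is_consistent(head, tail, 90, configured));

    /* Correct modulus: the head simply moves one slot forward. */
    assert(advance_head(head, configured) == 18962);

    /* Wrong modulus during the reload window: the head jumps to 2578. */
    head = advance_head(head, fallback);
    assert(head == 2578);

    /* After the reload the counter says 91 items, the pointers disagree. */
    assert(!buffer_is_consistent(head, tail, 91, configured));
}
```

under these assumptions, the reader side still starts at tail=18871 and has to walk most of the ring before it reaches the new head, which would match commands showing up long after they were submitted.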
workaround:
leave external_command_buffer_slots at the default value
suggested solution:
prevent the external_command_buffer from being resized, even during reload, i.e. make external_command_buffer_slots a setting that is NOT re-read during reload. most modifications of this value will lead to one error or another.
Updated by mfriedrich on 2013-04-10 20:09:58 +00:00
this could happen for a short time between the reset_variables() call and the reading of the main configuration file, which sets the value back to the configured one.
base/icinga.c
/* keep monitoring things until we get a shutdown command */
do {
    /* reset program variables */
    reset_variables();

    /* get PID */
    nagios_pid = (int)getpid();

    /* read in the configuration files (main and resource config files) */
    result = read_main_config_file(config_file);
thing is, exempting external_command_buffer_slots from the reset would leave it uninitialized on a core start or restart, and the proper signal handling for HUP happens elsewhere.
maybe checking for sigrestart == TRUE when resetting the external command buffer could help. though i did not look closely into the code; that will require a longer debug session when i get more time.
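a rough sketch of that idea, as a standalone model rather than a patch: sigrestart and external_command_buffer_slots are the names used in this issue, while DEFAULT_SLOTS and reset_variables_sketch() are hypothetical stand-ins. it assumes the restart flag is still set at the moment the reset runs, which has not been checked against the real reset_variables().

```c
#define TRUE  1
#define FALSE 0

/* Hypothetical stand-in for the compiled-in default mentioned in the report. */
#define DEFAULT_SLOTS 4096

/* Stand-ins for the core's globals; the names are taken from this issue. */
static int sigrestart = FALSE;
static int external_command_buffer_slots = DEFAULT_SLOTS;

/* Hypothetical model of the reset step: fall back to the default only on a
 * fresh start, and keep the current size when a SIGHUP reload is in progress,
 * so a concurrent submit never sees the smaller modulus. */
static void reset_variables_sketch(void)
{
    if (sigrestart != TRUE)
        external_command_buffer_slots = DEFAULT_SLOTS;

    /* ... all other variables would be reset here as before ... */
}
```

this only closes the window described above. a reload that really changes the configured value would still need the buffer to be drained or reallocated before the new size is used.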
This issue has been migrated from Redmine: https://dev.icinga.com/issues/3899
Created by ctolkmit on 2013-03-27 09:03:17 +00:00