This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #3899] Occasional error in external command buffer handling on reload with non-default external_command_buffer_slots setting #1244

Closed
icinga-migration opened this issue Mar 27, 2013 · 6 comments

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/3899

Created by ctolkmit on 2013-03-27 09:03:17 +00:00

Assignee: (none)
Status: New
Target Version: Backlog
Last Update: 2015-05-18 12:17:42 +00:00 (in Redmine)

Icinga Version: 1.8.4
OS Version: Ubuntu 10.04.4 LTS

prerequisites:

  • external_command_buffer_slots setting other than compiled-in default (4096)
  • one or more processes writing to the command file (icinga.cmd), i.e. a command is being queued (this is where we have seen the error) or (i guess) a command is being taken from the command buffer
  • reload is performed

symptoms: right after a reload, we can see (using debug options and strace) that the external commands being executed lag behind those that are read from the fifo (in our case by a lot, e.g. 15 minutes).
e.g.: when 5 external commands enter the buffer, exactly 5 are read from it, but those 5 are 15 minutes old, as if the ring buffer pointers were invalid

analysis: we analysed the problem by adding a validate_external_command_buffer() function to utils.c (see attachment) and calling it on every modification of the ring buffer in submit_external_command() and check_for_external_commands() (in commands.c).
after a few (or more) reloads via SIGHUP, the validation fails.
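
The attached validate_external_command_buffer() is not reproduced in this thread. As a rough illustration of the kind of invariant such a check could test, here is a minimal sketch. The struct `cmd_ring_state` and the function `ring_state_is_valid()` are hypothetical names invented for this example, not the attachment's code or the core's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ring buffer bookkeeping (not Icinga's struct). */
typedef struct {
    int head;   /* next write index */
    int tail;   /* next read index */
    int items;  /* number of queued commands */
} cmd_ring_state;

/* Returns true when head, tail and items are consistent with a ring of
 * `slots` entries, i.e. head == (tail + items) % slots and all values
 * are in range. */
bool ring_state_is_valid(const cmd_ring_state *s, int slots) {
    assert(s != NULL && slots > 0);
    if (s->items < 0 || s->items > slots) return false;
    if (s->head < 0 || s->head >= slots) return false;
    if (s->tail < 0 || s->tail >= slots) return false;
    return s->head == (s->tail + s->items) % slots;
}
```

Calling such a check after every enqueue and dequeue, with the currently configured slot count, would flag the state as soon as the indices stop agreeing with each other.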

Example:
we had configured external_command_buffer_slots=32768.
we found out that during reload, for a very brief period, external_command_buffer_slots is set to the default of 4096:
(from icinga.log): validate_external_command_buffer(submit), strange buffer size. items=91/4096, head(write)=2578, tail(read)=18871
it seems that if a command is submitted to the buffer during reload, the head adjustment ( head = (head+1)%slots ) uses 4096 for the modulo operation where it should have used 32768. alternatively, submitting to the buffer should be prevented entirely during reload.
after the reload, the validation failed with:
validate_external_command_buffer(submit), strange buffer size. items=92/32768, head(write)=2579, tail(read)=18871
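
The two log lines are consistent with the modulo explanation. With tail=18871 and 90 queued items in a 32768-slot ring, head would be 18961. A submit that wraps with 4096 gives (18961+1) % 4096 = 2578, and the next submit after the reload gives 2579. The starting state (head=18961, items=90) is an assumption derived from the logged values, not something stated in the report, and `ring_idx`, `ring_submit()` and `replay_reload_race()` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ring buffer indices (not Icinga's actual struct). */
typedef struct {
    int head;   /* next write index */
    int tail;   /* next read index */
    int items;  /* number of queued commands */
} ring_idx;

/* Advance the write index as described in the report:
 * head = (head + 1) % slots. */
void ring_submit(ring_idx *r, int slots) {
    assert(r != NULL && slots > 0);
    r->head = (r->head + 1) % slots;
    r->items++;
}

/* Replays the suspected race: one submit while slots is briefly the
 * default 4096, then one submit after it is back at 32768. */
ring_idx replay_reload_race(void) {
    ring_idx r = { 18961, 18871, 90 };  /* assumed state just before the reload */
    ring_submit(&r, 4096);   /* during reload: head wraps to 2578, items = 91 */
    ring_submit(&r, 32768);  /* after reload: head = 2579, items = 92 */
    return r;
}
```

Once head has jumped back to 2578 while tail stays at 18871, newly queued commands are written to slots the reader will not reach for a long time, which would match the stale commands described under symptoms.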

workaround:

  • leave external_command_buffer_slots at the default value

suggested solution:

  • prevent the external_command_buffer from being resized at all, even during reload, i.e. make external_command_buffer_slots a setting that is NOT re-read during reload. most modifications of this value will lead to one error or another.
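
One way to read this suggestion is "apply the setting once, then freeze it". The following is a minimal sketch under that assumption; `cmd_buffer_cfg`, `apply_buffer_slots()` and `DEFAULT_SLOTS` are hypothetical names for illustration and do not correspond to the core's actual config parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFAULT_SLOTS 4096  /* compiled-in default mentioned in the report */

/* Hypothetical holder for the effective buffer size. */
typedef struct {
    int slots;       /* effective number of slots */
    bool allocated;  /* true once the buffer has been sized */
} cmd_buffer_cfg;

/* Applies a configured slot count only the first time it is called.
 * Later calls (e.g. on reload) leave the size untouched.
 * Returns the effective slot count. */
int apply_buffer_slots(cmd_buffer_cfg *cfg, int configured) {
    assert(cfg != NULL);
    if (cfg->allocated)
        return cfg->slots;  /* frozen: ignore values re-read on reload */
    cfg->slots = configured > 0 ? configured : DEFAULT_SLOTS;
    cfg->allocated = true;
    return cfg->slots;
}
```

With this shape, a changed value in the config file would only take effect after a full restart, which is the trade-off the suggestion implies.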

Attachments


Relations:

@icinga-migration

Updated by mfriedrich on 2013-04-10 20:09:58 +00:00

this could happen for a short time between the call to reset_variables() and reading the main configuration file, which sets the value back to the configured one.

base/icinga.c

        /* keep monitoring things until we get a shutdown command */
        do {

            /* reset program variables */
            reset_variables();

            /* get PID */
            nagios_pid = (int)getpid();

            /* read in the configuration files (main and resource config files) */
            result = read_main_config_file(config_file);

thing is, exempting external_command_buffer_slots from the reset would leave it uninitialized on a core start or restart, and the proper signal handling for HUP happens elsewhere.

maybe checking for sigrestart == TRUE when resetting the external command buffer could help. though i did not look closely into the code; that will require a longer debug session when i get more time.
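
The sigrestart idea can be sketched as follows. This is only an illustration of the guard, not a tested patch: `core_state` and `reset_variables_sketch()` are hypothetical stand-ins, and whether the restart flag is actually set at the point where the real reset_variables() runs has not been verified here.

```c
#include <assert.h>
#include <stddef.h>

#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif

#define DEFAULT_SLOTS 4096  /* compiled-in default mentioned in the report */

/* Hypothetical stand-in for the relevant global state. */
typedef struct {
    int sigrestart;    /* TRUE while a SIGHUP-triggered reload is in progress */
    int buffer_slots;  /* current external command buffer size */
} core_state;

/* Resets the slot count to the default only on a fresh start.
 * During a reload the existing value is kept, so the modulus used for
 * head/tail arithmetic never changes under a running writer. */
void reset_variables_sketch(core_state *st) {
    assert(st != NULL);
    if (st->sigrestart == TRUE)
        return;  /* reload: keep the size the buffer was created with */
    st->buffer_slots = DEFAULT_SLOTS;
}
```

On a fresh start the flag is FALSE, so the value is still initialized to the default before the config file is read, which addresses the initialization concern raised above.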

@icinga-migration

Updated by mfriedrich on 2013-04-21 10:14:01 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich

@icinga-migration

Updated by mfriedrich on 2014-06-20 09:09:05 +00:00

  • Status changed from Assigned to New
  • Assigned to deleted mfriedrich

@icinga-migration

Updated by mfriedrich on 2014-07-19 13:02:51 +00:00

  • Priority changed from High to Normal

Patches welcome.

@icinga-migration

Updated by mfriedrich on 2014-10-24 22:26:24 +00:00

  • Relates set to 7277

@icinga-migration

Updated by berk on 2015-05-18 12:17:42 +00:00

  • Target Version set to Backlog

@icinga-migration icinga-migration added this to the Backlog milestone Jan 17, 2017
@dnsmichi dnsmichi removed this from the Backlog milestone Dec 19, 2017