[dev.icinga.com #13445] concurrent_checks in CheckerComponent not working when using command_endpoint #4841

Closed
icinga-migration opened this issue Dec 7, 2016 · 3 comments
Labels: area/checks (Check execution and results), bug (Something isn't working)

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/13445

Created by thisismyname on 2016-12-07 08:52:37 +00:00

Assignee: (none)
Status: New
Target Version: (none)
Last Update: 2016-12-07 08:52:37 +00:00 (in Redmine)

Icinga Version: 2.5.4
Backport?: Not yet backported
Include in Changelog: 1

It seems that the limit on simultaneously running checks, which is set with the "concurrent_checks" parameter, is not enforced when using command_endpoint.

You can find a sketch of the setup attached.

The problem is the high load on the satellite server (mon3), which executes checks on network devices, whenever we reload the master.

Attachments

@icinga-migration icinga-migration added bug Something isn't working Checker labels Jan 17, 2017
@gunnarbeutner gunnarbeutner added area/checks Check execution and results and removed Checker labels Feb 7, 2017
@iptizer

iptizer commented Jun 19, 2017

The problem still exists. Is there anything we can do to help with debugging?

@bezdek

bezdek commented Sep 11, 2017

We are running a similar Icinga 2 topology, and the "concurrent_checks" parameter doesn't work for us either:

  • 1 master, which executes checks on the satellites through command_endpoint
  • 2 satellites

On all nodes:
Version: r2.6.3-1
System information:
Platform: Debian GNU/Linux
Platform version: 8 (jessie)
Kernel: Linux
Kernel version: 3.16.0-4-amd64
Architecture: x86_64

Build information:
Compiler: GNU 4.9.2
Build host: smithers

Config

  • 900 host checks, 11000 service checks
  • zones: global (on every node), master (on master only)

Example of service:

  apply Service "HTTPS" {
          import "https"

          assign where host.vars.http_uri != ""
          groups = [ "HTTPS" ]
          command_endpoint = host.command_endpoint
          vars.http_uri = host.vars.http_uri
          vars.http_vhost = "$host.name$"
  }

We're using concurrent_checks = 50 on the master and on the satellites. Every reload of the master pushes load averages above 100 on the satellites (VPS with 6 vCPUs and 4 GB RAM), no matter which value concurrent_checks is set to.

-pb

@bezdek

bezdek commented Sep 15, 2017

Update: all nodes now run r2.7.0-1 and the problem still exists. A while after every reload of the master, I can see over 400 checks per second on the satellites. If we try to run all checks (about 12000), it ends with "Remote Icinga instance 'satelliteX' is not connected to 'master'", and the master node starts sending false-positive notifications. In production, we have had to disable some checks to avoid overloading the satellite nodes.
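To put the reported figures in perspective, here is a rough back-of-the-envelope sketch using Little's law (average checks in flight = arrival rate × mean check duration). The rate of 400 checks per second and the limit of 50 come from this thread; the mean check duration is not reported anywhere here, so the value below is purely an assumption for illustration, and the function names are hypothetical. It is also unclear whether the 400/s figure is per satellite or in total.

```python
def implied_concurrency(checks_per_second, mean_duration_s):
    """Little's law: average number of checks in flight = rate * mean duration."""
    return checks_per_second * mean_duration_s


def max_mean_duration(limit, checks_per_second):
    """Longest mean check duration at which `limit` slots can sustain the given rate."""
    return limit / checks_per_second


# Figures taken from this thread.
observed_rate = 400      # checks per second seen on the satellites
configured_limit = 50    # concurrent_checks in checker.conf

# ASSUMPTION: not reported in the thread, chosen only for illustration.
assumed_duration_s = 2.0

# With a working limit of 50, a rate of 400/s is only sustainable if the
# average check finishes within this many seconds.
print(max_mean_duration(configured_limit, observed_rate))      # 0.125

# If checks really took about 2 s each, 400/s would mean this many in flight.
print(implied_concurrency(observed_rate, assumed_duration_s))  # 800.0
```

In other words, unless the average check completes in roughly 125 ms, a sustained 400 checks per second would not be compatible with an enforced limit of 50.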

Content of checker.conf on all nodes:

/etc/icinga2/features-enabled/checker.conf

  library "checker"
  
  object CheckerComponent "checker" {
          concurrent_checks = 50
  }
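For readers unfamiliar with what the setting above is expected to do, the behaviour can be sketched as a counting semaphore around check execution. This is a minimal illustration of the expected semantics only, not Icinga 2's actual implementation; `ConcurrencyLimiter`, `fake_check`, and `run_burst` are hypothetical names, and the sleeping function merely stands in for a real plugin run.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyLimiter:
    """Caps how many checks run at once and records the observed peak."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0

    def run(self, check, *args):
        with self._slots:  # blocks while `limit` checks are already running
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                return check(*args)
            finally:
                with self._lock:
                    self._running -= 1


def fake_check(name):
    # Hypothetical stand-in for executing a check plugin.
    time.sleep(0.01)
    return (name, 0)


def run_burst(total_checks, limit, workers=32):
    """Submit a burst of checks (as after a reload) and return (peak, results)."""
    limiter = ConcurrencyLimiter(limit)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(limiter.run, fake_check, f"svc{i}")
            for i in range(total_checks)
        ]
        results = [f.result() for f in futures]
    return limiter.peak, results
```

Calling `run_burst(100, 5)` submits 100 checks at once, yet the recorded peak never exceeds 5. The reports in this thread suggest that checks arriving on a satellite via command_endpoint are not subject to an equivalent cap.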

@N-o-X N-o-X self-assigned this Jan 16, 2018
N-o-X added 4 commits that referenced this issue Jan 29, 2018
N-o-X added 2 commits that referenced this issue Feb 5, 2018
@N-o-X N-o-X closed this as completed in e282771 Feb 6, 2018