[dev.icinga.com #9773] Add log for missing EventCommand for command_endpoints #3196

Closed
icinga-migration opened this issue Jul 29, 2015 · 7 comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/9773

Created by emptywee on 2015-07-29 20:20:33 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2015-07-31 14:05:03 +00:00)
Target Version: 2.3.9
Last Update: 2015-08-12 08:34:00 +00:00 (in Redmine)

Icinga Version: 2.3.8
Backport?: Already backported
Include in Changelog: 1

Hello.
Created a simple eventcommand:

object EventCommand "cmd_service_restart" {
  import "plugin-event-command"

  command = "/usr/bin/test $service.state_id$ -gt 0 && /usr/bin/sudo /sbin/service $service_name$ restart"
}

Created a service:

apply Service "crond" {
  import "generic-service"

  check_command = "procs"

  if (host.vars.remote_client) {
    command_endpoint = host.vars.remote_client
  }
  vars.procs_command = "crond"
  vars.procs_critical = "1:"

  event_command = "cmd_service_restart"
  vars.service_name = host.vars.crond_name

  assign where host.vars.os == "Linux"
}

Defined a host with the following template:

template Host "generic-linux-host" {
  import "generic-host"

  vars.os = "Linux"

  vars.disks["disk"] = {
  }

  vars.disks["disk /"] = {
    disk_partitions = "/"
  }

  vars.notification["mail"] = {
    groups = [ "icingaadmins" ]
  }

  vars.crond_name = "crond"
  enable_event_handler = true
}

I brought down crond on the remote host and see this on the checker node:

[2015-07-29 19:29:54 +0000] notice/Checkable: State Change: Checkable dc1udtlhtst02.stack.qadev.corp!crond soft state change from OK to CRITICAL detected.
[2015-07-29 19:29:54 +0000] notice/Checkable: Executing event handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-29 19:29:54 +0000] notice/ApiListener: Sending message to 'dc1udtlhtst02.stack.qadev.corp'
[2015-07-29 19:29:54 +0000] notice/ApiListener: Relaying 'event::CheckResult' message

EventCommand object on the checker:

Object 'cmd_service_restart' of type 'EventCommand':
  % declared in '/var/lib/icinga2/api/zones/global-templates/events.conf', lines 1:0-1:40
  * __name = "cmd_service_restart"
  * arguments = null
  * command = "/usr/bin/test $service.state_id$ -gt 0 && /usr/bin/sudo /sbin/service $service_name$ restart"
    % = modified in '/var/lib/icinga2/api/zones/global-templates/events.conf', lines 4:3-4:106
  * env = null
  * execute
    % = modified in '/usr/share/icinga2/include/command.conf', lines 47:2-47:22
    * type = "Function"
  * name = "cmd_service_restart"
  * templates = [ "cmd_service_restart", "plugin-event-command" ]
    % = modified in '/var/lib/icinga2/api/zones/global-templates/events.conf', lines 1:0-1:40
    % = modified in '/usr/share/icinga2/include/command.conf', lines 46:1-46:44
  * timeout = 60
  * type = "EventCommand"
  * vars = null
  * zone = "global-templates"

Apparently, no command is actually executed anywhere. I even tried the example from the docs with the "by_ssh" event command, with the same result. I'm not sure how to debug this further. This is really critical, because without it there is no way to react to events.

# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: v2.3.8)

Copyright (c) 2012-2015 Icinga Development Team (https://www.icinga.org)
License GPLv2+: GNU GPL version 2 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /var/run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /var/run/icinga2/icinga2.pid
  Application type: icinga/IcingaApplication

System information:
  Operating system: Linux
  Operating system version: 2.6.32-573.1.1.el6.x86_64
  Architecture: x86_64
  Distribution: Red Hat Enterprise Linux Server release 6.7 (Santiago)

I hope I am not missing anything myself here.

Changesets

2015-07-31 14:04:03 +00:00 by (unknown) 0712a02

Add a warning if EventCommand is not found when using command_endpoint

fixes #9773

2015-08-12 08:33:44 +00:00 by (unknown) 1b3f377

Add a warning if EventCommand is not found when using command_endpoint

fixes #9773

Relations:

@icinga-migration

Updated by emptywee on 2015-07-30 14:15:24 +00:00

It seems the event handler is not fired when command_endpoint is set to a remote host. When I brought down the crond service on the checker itself, the EventCommand was executed.
When the event happens for a service with command_endpoint set to a remote client, here's what happens (I have added more debug output on the checker):

Checker debug.log:

[2015-07-30 14:05:27 +0000] notice/Checkable: State Change: Checkable dc1udtlhtst02.stack.qadev.corp!crond soft state change from OK to CRITICAL detected.
[2015-07-30 14:05:27 +0000] notice/Checkable: Executing event handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/Checkable: Firing ec->Execute. Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/ApiListener: Sending message to 'dc1udtlhicn01.stack.qadev.corp'
[2015-07-30 14:05:27 +0000] notice/Checkable: if endpoint true. Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/Checkable: Params set for Host: dc1udtlhtst02.stack.qadev.corp. Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/Checkable: Params set for Service: crond. Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'
[2015-07-30 14:05:27 +0000] notice/ApiListener: Sending message to 'dc1udtlhtst02.stack.qadev.corp'
[2015-07-30 14:05:27 +0000] notice/Checkable: Listener true, sending message (sync). Handler 'cmd_service_restart' for service 'dc1udtlhtst02.stack.qadev.corp!crond'

Remote client:

[2015-07-30 14:05:27 +0000] notice/ApiClient: Received 'event::ExecuteCommand' message from 'dc1udtlhtst01.stack.qadev.corp'
[2015-07-30 14:05:27 +0000] notice/Process: Running command '/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250': PID 28922
[2015-07-30 14:05:27 +0000] notice/Process: PID 28922 ('/usr/lib64/nagios/plugins/check_procs' '-C' 'crond' '-c' '1:' '-w' '250') terminated with exit code 2
[2015-07-30 14:05:27 +0000] notice/ApiListener: Sending message to 'dc1udtlhtst01.stack.qadev.corp'
[2015-07-30 14:05:27 +0000] notice/ApiClient: Received 'event::ExecuteCommand' message from 'dc1udtlhtst01.stack.qadev.corp'
[2015-07-30 14:05:28 +0000] notice/ApiClient: Received 'log::SetLogPosition' message from 'dc1udtlhtst01.stack.qadev.corp'
[2015-07-30 14:05:29 +0000] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 0; Checks/s: 0
[2015-07-30 14:05:29 +0000] debug/ApiListener: Not connecting to Endpoint 'dc1udtlhtst02.stack.qadev.corp' because that's us.
[2015-07-30 14:05:29 +0000] debug/ApiListener: Not connecting to Endpoint 'dc1udtlhtst01.stack.qadev.corp' because we're already connected to it.
[2015-07-30 14:05:29 +0000] notice/ApiListener: Setting log position for identity 'dc1udtlhtst01.stack.qadev.corp': 2015/07/29 13:16:12

It seems like the remote client receives the message, but ignores it for some reason. I am going to add more debug, maybe I'll find a clue.

@icinga-migration

Updated by emptywee on 2015-07-30 15:20:35 +00:00

Yeah, I think I figured it out. Remote client was looking for EventCommand 'cmd_service_restart':

[2015-07-30 15:16:52 +0000] notice/ApiEvents: *** command_type is event command
[2015-07-30 15:16:52 +0000] notice/ApiEvents: *** EventCommand::GetByname(cmd_service_restart) returned false.

So I have to register them on each remote client. Probably not a bug. Sorry, guys :)
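
For anyone hitting the same symptom, here is a minimal sketch of the fix described above: the EventCommand object has to exist in the configuration of the endpoint named in command_endpoint, not only on the checker. The object definition is copied from the original report. The file name is an assumption for illustration, not something stated in this thread.

```
// On the remote client, for example in /etc/icinga2/conf.d/events.conf
// (hypothetical file name; any file included by the client's icinga2.conf should do).
// The object name must match the event_command set on the service.
object EventCommand "cmd_service_restart" {
  import "plugin-event-command"

  command = "/usr/bin/test $service.state_id$ -gt 0 && /usr/bin/sudo /sbin/service $service_name$ restart"
}
```

After reloading Icinga 2 on the client, running `icinga2 object list --type EventCommand --name cmd_service_restart` there should list the object, in the same way as the checker output shown earlier in this report.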

@icinga-migration

Updated by emptywee on 2015-07-30 15:26:46 +00:00

Yes, that was it. Could you please add this to the debug log with a meaningful message? It would help a lot and save time for somebody like me in the future :)

lib/icinga/apievents.cpp:

        } else if (command_type == "event_command") {
                if (!EventCommand::GetByName(command)) {
                        Log(LogNotice, "ApiEvents")
                            << "EventCommand::GetByName(" << command << ") returned false. Probably this EventCommand object is not defined on this Icinga2 instance.";

                        return Empty;
                }
        } else
                return Empty;

@icinga-migration

Updated by mfriedrich on 2015-07-31 13:33:51 +00:00

  • Subject changed from EventCommand is not fired up to Add log for missing EventCommand for command_endpoints
  • Category changed from Checker to Cluster
  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Target Version set to 2.4.0
  • Estimated Hours set to 0.1

I'll add such a log message as a warning, though you'll only see it on the remote instance. The check command is sent back; maybe we'll come up with a better approach similar to #9749.

@icinga-migration

Updated by mfriedrich on 2015-07-31 13:34:01 +00:00

  • Relates set to 9749

@icinga-migration

Updated by Anonymous on 2015-07-31 14:05:03 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 0712a02.

@icinga-migration

Updated by gbeutner on 2015-08-12 08:34:00 +00:00

  • Target Version changed from 2.4.0 to 2.3.9
  • Backport? changed from TBD to Yes

@icinga-migration icinga-migration added bug Something isn't working area/distributed Distributed monitoring (master, satellites, clients) labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.3.9 milestone Jan 17, 2017