[dev.icinga.com #12611] Enriching debug.log with check output and combining error code and output in one log msg #4613

Closed
icinga-migration opened this issue Sep 1, 2016 · 2 comments
Labels
area/checks Check execution and results area/log Logging related enhancement New feature or request

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/12611

Created by saurabh_hirani on 2016-09-01 09:35:07 +00:00

Assignee: (none)
Status: New
Target Version: (none)
Last Update: 2016-09-01 09:35:07 +00:00 (in Redmine)

Backport?: Not yet backported
Include in Changelog: 1

icinga2 debug logs are immensely useful for viewing historical information via log monitoring tools.

Currently, a check execution produces the following debug logs:

[2016-08-31 06:47:57 -0700] notice/Process: Running command '/usr/lib/nagios/plugins/check_nrpe' '-H' '1.2.3.4' '-c' 'check_service' '-t' '30': PID 2847
[2016-08-31 06:47:57 -0700] debug/CheckerComponent: Check finished for object 'hostname!check_service-0'
[2016-08-31 06:47:57 -0700] notice/Process: PID 2847 ('/usr/lib/nagios/plugins/check_nrpe' '-H' '1.2.3.4' '-c' 'check_service' '-t' '30') terminated with exit code 0
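For instance, a log-monitoring script could pick failed commands out of lines like these. Below is a minimal Python sketch. `failed_commands` is a hypothetical helper, and the regex assumes exactly the notice/Process format shown above. It recovers the PID, the command line and the exit code. The Check finished line, however, carries neither the PID nor the exit code, so tying a failure back to its object name has to be inferred from the command line and timestamps.

```python
import re

# Matches the "terminated with exit code" line from the sample log above.
# Group names are illustrative; only the line format comes from the log.
TERMINATED_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] notice/Process: PID (?P<pid>\d+) "
    r"\((?P<cmd>.*)\) terminated with exit code (?P<code>\d+)$"
)


def failed_commands(lines):
    """Return (timestamp, pid, command, exit_code) for every process that exited non-zero."""
    failures = []
    for line in lines:
        match = TERMINATED_RE.match(line.rstrip("\n"))
        if match and int(match.group("code")) != 0:
            failures.append(
                (match.group("ts"), int(match.group("pid")),
                 match.group("cmd"), int(match.group("code")))
            )
    return failures


log = [
    "[2016-08-31 06:47:57 -0700] notice/Process: Running command "
    "'/usr/lib/nagios/plugins/check_nrpe' '-H' '1.2.3.4' '-c' 'check_service' '-t' '30': PID 2847",
    "[2016-08-31 06:47:57 -0700] debug/CheckerComponent: Check finished for object 'hostname!check_service-0'",
    "[2016-08-31 06:47:57 -0700] notice/Process: PID 2847 ('/usr/lib/nagios/plugins/check_nrpe' "
    "'-H' '1.2.3.4' '-c' 'check_service' '-t' '30') terminated with exit code 0",
]
print(failed_commands(log))  # [] because PID 2847 exited with code 0
```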

The exit code in the notice/Process line makes finding failed commands easy, but the logs don't explain why a check failed. Some ideas:

  1. If the check finished message were augmented with the plugin output (possibly capped at a maximum output size), it would be easy to grep out ad-hoc status.

e.g.

[2016-08-31 06:47:57 -0700] debug/CheckerComponent: Check finished for object 'hostname!check_swap-0' exit code - 2 - output - SSL handshake failure.

Such a log line would show both the failure exit code and its reason in a single message.
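To illustrate how much simpler ad-hoc analysis becomes with the combined message, here is a minimal Python sketch that splits the proposed line into its parts. It assumes exactly the hypothetical format from the example above (`exit code - N - output - TEXT`). That format is a proposal, not something Icinga 2 currently logs. `parse_check_finished` is a hypothetical helper name.

```python
import re

# Matches the *proposed* combined message from the example above; Icinga 2
# does not emit this format today, so the pattern is purely illustrative.
PROPOSED_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] debug/CheckerComponent: Check finished for object "
    r"'(?P<obj>[^']+)' exit code - (?P<code>\d+) - output - (?P<output>.*)$"
)


def parse_check_finished(line):
    """Parse one proposed 'Check finished' line into a dict, or return None if it doesn't match."""
    match = PROPOSED_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return {
        "timestamp": match.group("ts"),
        "object": match.group("obj"),
        "exit_code": int(match.group("code")),
        "output": match.group("output"),
    }


line = ("[2016-08-31 06:47:57 -0700] debug/CheckerComponent: Check finished for object "
        "'hostname!check_swap-0' exit code - 2 - output - SSL handshake failure.")
record = parse_check_finished(line)
print(record["object"], record["exit_code"], record["output"])
# hostname!check_swap-0 2 SSL handshake failure.
```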

A counter-argument is that alerts are sent on hard states, so the user would find out about failures anyway. However, some checks go in and out of soft states, and those transitions can only be seen by parsing historical logs. Logging the output also gives developers more power to build their own analysis tools without resorting to custom UIs.
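As an example of such a home-grown analysis, the Python sketch below counts how often each object's exit code changes over time. This is a rough proxy for checks bouncing in and out of soft states. It assumes the results have already been extracted from the log in chronological order as `(object, exit_code)` pairs. The log lines shown above don't record the soft/hard state type, so exit-code changes are only an approximation. `count_transitions` is a hypothetical name.

```python
from collections import Counter


def count_transitions(results):
    """Count exit-code changes per object, given chronologically ordered (object, exit_code) pairs."""
    last_code = {}
    transitions = Counter()
    for obj, code in results:
        # A change relative to the object's previous result counts as one transition.
        if obj in last_code and last_code[obj] != code:
            transitions[obj] += 1
        last_code[obj] = code
    return dict(transitions)


history = [
    ("hostname!check_swap-0", 0),
    ("hostname!check_swap-0", 2),
    ("hostname!check_service-0", 0),
    ("hostname!check_swap-0", 0),
    ("hostname!check_service-0", 0),
]
print(count_transitions(history))  # {'hostname!check_swap-0': 2}
```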

Additionally, some thought around structured logging (e.g. JSON), which is friendlier to Elasticsearch, would make historical data extraction much easier.
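A structured equivalent of the proposed message might look like the Python sketch below. It emits one JSON object per line, a shape that log shippers feeding Elasticsearch can typically ingest. The field names, the ISO 8601 timestamp and the 256-character `MAX_OUTPUT_LENGTH` cap are all assumptions for illustration, not an existing Icinga 2 log format.

```python
import json

# Hypothetical cap on the logged plugin output, echoing the "max output size" idea in item 1.
MAX_OUTPUT_LENGTH = 256


def check_result_json(timestamp, obj, exit_code, output, max_output=MAX_OUTPUT_LENGTH):
    """Serialize one check result as a single-line JSON log record (field names are illustrative)."""
    record = {
        "timestamp": timestamp,
        "severity": "debug",
        "facility": "CheckerComponent",
        "object": obj,
        "exit_code": exit_code,
        "output": output[:max_output],
        "output_truncated": len(output) > max_output,
    }
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return json.dumps(record, sort_keys=True)


print(check_result_json("2016-08-31T06:47:57-07:00", "hostname!check_swap-0", 2,
                        "SSL handshake failure."))
```

Keeping each record on a single line lets line-oriented tools (grep, tail, log shippers) handle it without a multi-line parser.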

@icinga-migration icinga-migration added enhancement New feature or request Checker labels Jan 17, 2017
@gunnarbeutner gunnarbeutner added area/checks Check execution and results and removed Checker labels Feb 7, 2017
@dnsmichi
Contributor

dnsmichi commented Feb 8, 2017

Such logging already exists for plugins that return an exit code greater than 3, which indicates a real error during execution.

I'm not sure about the impact of adding the output variable to the notice logging, but you can try patching it yourself in Process::DoEvents().
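For context, plugins follow the Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). Anything above 3 falls outside that range, which is why it signals an execution error rather than a check state. Below is a minimal Python sketch of that classification. The `EXECUTION_ERROR` label and the `classify_exit_code` name are hypothetical.

```python
# Standard Nagios plugin exit codes.
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def classify_exit_code(code):
    """Map a plugin exit code to its state name; anything outside 0-3 signals an execution error."""
    return PLUGIN_STATES.get(code, "EXECUTION_ERROR")


for code in (0, 2, 3, 127):
    print(code, classify_exit_code(code))
```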

@dnsmichi dnsmichi added the area/log Logging related label Feb 8, 2017
@dnsmichi
Contributor

That's not gonna happen.
