Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev.icinga.com #6550] add log message for invalid performance data #1723

Closed
icinga-migration opened this issue Jun 24, 2014 · 20 comments
Closed
Labels
bug Something isn't working
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/6550

Created by gvegidy on 2014-06-24 00:06:34 +00:00

Assignee: gvegidy
Status: Resolved (closed on 2014-08-05 09:40:39 +00:00)
Target Version: 2.0.2
Last Update: 2014-08-05 10:19:15 +00:00 (in Redmine)

Icinga Version: 2.0.0

I have seen that some performance data is not send to carbon-cache by the GraphiteWriter.

One example I've seen is the output by check_ups:

[2014-06-24 01:27:44 +0200] debug/CheckerComponent: Check finished for object 'localhost!UPS'
[2014-06-24 01:27:44 +0200] notice/Process: PID 3642 terminated with exit code 0
[2014-06-24 01:27:44 +0200] debug/DbEvents: add log entry history for 'localhost!UPS'
[2014-06-24 01:27:44 +0200] debug/DbEvents: add service check history for 'localhost!UPS'
[2014-06-24 01:27:44 +0200] debug/GraphiteWriter: GraphiteWriter: Add to metric list:'icinga.localhost.UPS.state 0 1403566064
'.
[2014-06-24 01:27:44 +0200] debug/GraphiteWriter: GraphiteWriter: Add to metric list:'icinga.localhost.UPS.current_attempt 1 1403566064
'.
[2014-06-24 01:27:44 +0200] debug/GraphiteWriter: GraphiteWriter: Add to metric list:'icinga.localhost.UPS.max_check_attempts 3 1403566064
'.
[2014-06-24 01:27:44 +0200] debug/GraphiteWriter: GraphiteWriter: Add to metric list:'icinga.localhost.UPS.state_type 1 1403566064
'.
[2014-06-24 01:27:44 +0200] debug/GraphiteWriter: GraphiteWriter: Add to metric list:'icinga.localhost.UPS.reachable 1 1403566064
'.
[2014-06-24 01:27:44 +0200] debug/GraphiteWriter: GraphiteWriter: Add to metric list:'icinga.localhost.UPS.latency 0 1403566064
'.
[2014-06-24 01:27:44 +0200] debug/GraphiteWriter: GraphiteWriter: Add to metric list:'icinga.localhost.UPS.execution_time 0.00823379 1403566064
'.
[2014-06-24 01:27:44 +0200] debug/DbObject: Updating object vars for 'localhost!UPS'

[...]

[2014-06-24 01:27:44 +0200] debug/IdoPgsqlConnection: Query: UPDATE icinga_servicestatus ... perfdata = 'voltage=236600mV;;;0 battery=100%;;;0;100 load=20%;;;0;100 temp=42degC;;;0',...

When looking at the performance data returned from the plugin, I don't see any obvious problems, except maybe the units aren't on the list defined at https://www.monitoring-plugins.org/doc/guidelines.html#AEN200. But since the units aren't sent to carbon-cache anyway, this should not matter.

If some performance data is ignored because it is in an illegal format, this should result in an error message by icinga.

Changesets

2014-08-04 22:57:18 +00:00 by gvegidy 21cd1d9

Add warning messages when performance data could not be parsed or not be sent to Graphite.

refs #6550

2014-08-04 22:57:29 +00:00 by gvegidy 7cfcabf

Beautify GraphiteWriter debug logging

refs #6550

2014-08-05 09:18:19 +00:00 by gvegidy b26220c

Add warning messages when performance data could not be parsed or not be sent to Graphite

refs #6550

2014-08-05 09:18:19 +00:00 by gvegidy 9a7946c

Beautify GraphiteWriter debug logging

refs #6550

2014-08-05 09:18:42 +00:00 by gbeutner 70c5106

Merge branch 'fix/relax-graphite-uom-6550'

fixes #6550

2014-08-05 09:23:36 +00:00 by gvegidy a4b7984

Add warning messages when performance data could not be parsed or not be sent to Graphite

refs #6550

2014-08-05 09:23:41 +00:00 by gvegidy de54850

Beautify GraphiteWriter debug logging

refs #6550

Relations:

@icinga-migration
Copy link
Author

Updated by mhoyer on 2014-07-27 16:13:34 +00:00

Metrics with comma-separated values seen to be ignored.

perfdata: 'Memory usage'=7598,92MB
--> no GraphiteWriter debug-log entry
--> no metric transfered to graphite

perfdata: 'Memory usage'=7598.92MB
--> GraphiteWriter debug-log: Add to metric list:'icinga.testhost.ram.Memory_usage 7968045137.9200001 1406477365
--> tcpdump output: icinga.testhost.ram.Memory_usage 7968045137.9200001 1406477365

@icinga-migration
Copy link
Author

Updated by gbeutner on 2014-07-30 09:48:15 +00:00

perfdata: 'Memory usage'=7598,92MB

isn't valid as far as the plugin spec is concerned.

@icinga-migration
Copy link
Author

Updated by gvegidy on 2014-07-30 11:16:36 +00:00

gunnarbeutner wrote:

> perfdata: 'Memory usage'=7598,92MB
> 

isn't valid as far as the plugin spec is concerned.

correct. This is probably a locale issue, the check is locale-dependent and the system is using e.g. a German locale. This would be a bug in the check, it must be locale-independent and always adhere to the plugin spec, regardless what the system uses as locale.

So the issue of mhoyer is not related to this bug.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-07-30 12:37:33 +00:00

  • Category set to Perfdata
  • Target Version set to 2.0.2
  • Estimated Hours set to 4

Required changes:

  • add a log warning if the perfdata parser fails
  • reproduce the issue
  • estimate time to fix
  • add unit tests

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-07-30 13:34:38 +00:00

  • Category changed from Perfdata to libicinga

I do see a problem with

When looking at the performance data returned from the plugin, I don't see any obvious problems, except maybe the units aren't on the list defined at https://www.monitoring-plugins.org/doc/guidelines.html#AEN200. But since the units aren't sent to carbon-cache anyway, this should not matter.

Performance data is parsed by Icinga 2, and invalid UOMs are not tolerated but result in an error.

@icinga-migration
Copy link
Author

Updated by gvegidy on 2014-07-30 22:53:39 +00:00

dnsmichi wrote:

Performance data is parsed by Icinga 2, and invalid UOMs are not tolerated but result in an error.

looks like you are right: I created a dummy plugin returning valid performance data without unit. That was given to carbon. Then I just added "degC" as UOM like in the example above - the whole performance data now was ignored by icinga.

So strictly speaking Icinga is behaving correctly by enforcing the UOMs as defined in the plugin specification. But is that really a good idea?

  • previous monitoring systems like Nagios and Icinga 1 don't enforce the UOMs. A lot of check plugins rely on this, the check_ups I posted the output above of is one of many examples
  • there are tons (literally speaking;) of units missing in the spec that make sense in an IT monitoring context: °C, Volts, Amps, Watts, VA, Hz, RPM, dBm, K|M|G|T|Bytes per Second/ms/minute, ... - you can collect a table with the most common ones, but I don't think you can get a definitive list for all cases
  • If you take the spec seriously, there is no way to specify most of these other UOMs: "a. no unit specified - assume a number (int or float) of things (eg, users, processes, load averages)". That does not apply to e.g. °C.
  • the spec is buggy: "d. B - bytes (also KB, MB, TB)" - do you see GB there? Does it make any sense to leave out GB there? I guess nobody noticed this yet because this part of the spec isn't taken seriously by anyone
  • We don't send any UOM to Graphite, it is designed to work completely without UOMs.
  • The strict UOM check violates the http://en.wikipedia.org/wiki/Robustness\_principle

So I think we should make a sanity check on the character set and length of the UOM to make sure we got the parsing of the data right and there are no security holes in backends that handle the UOM. But we should not check against a given UOM whitelist like now.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2014-07-31 11:32:00 +00:00

Icinga 2 is trying to parse the performance data values because we need to normalize the units (e.g. GB -> byte, kB -> byte). There are a number of issues here:

  • The spec is incomplete/wrong: There's no "K" prefix (should be "k" or "Ki"). "GB" is obviously missing.
  • A lot of plugins are horribly broken: Problems with different locales: 5.3 vs. 5,3; perfdata separators (5,3,2 instead of 5;3;2), etc.

I suppose we could (and should) relax the rules where possible.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-07-31 14:42:28 +00:00

Still, it's the Monitoring Plugin's project "Plugin API" definition, and we should carefully evaluate how to extend or soften the existing limitations. Even if there are plenty of plugin violating that api - it still stands to fix them providing proper performance data, even without UOM.

If we consider the current spec incomplete, we should discuss and document an extension to it, but also discuss it with the plugin developers.

@icinga-migration
Copy link
Author

Updated by gvegidy on 2014-07-31 22:19:49 +00:00

gunnarbeutner wrote:

5.3 vs. 5,3; perfdata separators (5,3,2 instead of 5;3;2), etc.

we can't accept a locale using "," as decimal separator and at the same time allow "," as perfdata separator.

I suggest we implement (and document) a list of known good UOMs. These are normalized. All other unit-strings are character sanitized and then passed through.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-08-04 13:28:24 +00:00

Changes (first step):

  • Add a warning log message when performance data cannot be parsed

@icinga-migration
Copy link
Author

Updated by gvegidy on 2014-08-04 20:14:46 +00:00

  • Status changed from New to Assigned
  • Assigned to set to gvegidy

@icinga-migration
Copy link
Author

Updated by gvegidy on 2014-08-04 23:05:16 +00:00

Does anybody see a security risk in sending the pure, unparsed perfdata to the log in the case of a parse error? Or should we somehow encode some chars? Which ones?

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-08-05 08:46:45 +00:00

  • Relates set to 6851

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2014-08-05 08:47:11 +00:00

  • Subject changed from GraphiteWriter ignores some performance data to add log message for invalid performance data

@icinga-migration
Copy link
Author

Updated by gbeutner on 2014-08-05 09:07:32 +00:00

Our plan for 2.0.2 is to just add a log message for invalid performance data (i.e. what you did in your branch) and postpone updating the parser until at least 2.1 (#6851). I'm going to review and merge your branch.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2014-08-05 09:16:45 +00:00

One minor issue is the ex.what() call. Depending on the exception type this string is often empty. I'm going to replace it with DiagnosticInformation(ex) - which isn't entirely user-friendly but is probably the best we can do atm.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2014-08-05 09:20:05 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 70c5106.

@icinga-migration
Copy link
Author

Updated by gvegidy on 2014-08-05 09:20:49 +00:00

  • Status changed from Resolved to Assigned
  • Done % changed from 100 to 0

gunnarbeutner wrote:

One minor issue is the ex.what() call. Depending on the exception type this string is often empty.

what is the problem: then we don't have any more info to show. But in the most common case this looks nice and beautiful:

Error parsing performance data 'voltage=231400mV;;;0 battery=100%;;;0;100 load=20%;;;0;100 temp=42degC;;;0': Invalid performance data unit: mv

I'm going to replace it with DiagnosticInformation(ex) - which isn't entirely user-friendly

"isn't entirely user-friendly" is an understatement in this case.

I propse to keep the ex.what() in this case.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2014-08-05 09:24:20 +00:00

Mkay. Updated again.

@icinga-migration
Copy link
Author

Updated by gvegidy on 2014-08-05 09:40:39 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

thanks

@icinga-migration icinga-migration added bug Something isn't working libicinga labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.0.2 milestone Jan 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant