[dev.icinga.com #2951] fix deleting too old check result files #1066

icinga-migration · 2012-08-05T11:20:01Z

This issue has been migrated from Redmine: https://dev.icinga.com/issues/2951

Created by mfriedrich on 2012-08-05 11:20:01 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2012-08-31 11:13:38 +00:00)
Target Version: 1.8
Last Update: 2012-08-31 11:13:37 +00:00 (in Redmine)

Icinga Version: 1.7.1
OS Version: Debian

This is a fairly common issue: the checkresult dir is not cleaned up after the core reaps the files. Stale files are left behind, which slows down overall processing.

As the original diff describes, the root of the problem is the checkresult handoff: "write the checkresult to tmp dir, then move it to checkresult queue, and put a .ok file there as well, telling the core checkresult reaper that the file is safe to read". On frequent reloads, this leaves a lot of "not yet finished" checks in the queue that never get their .ok file.
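The handoff described above can be sketched roughly as follows. This is only an illustration in Python, not the core's actual C code: the function name `submit_check_result`, the `check` file prefix and the payload string are assumptions made for the example.

```python
import os
import tempfile


def submit_check_result(tmp_dir, queue_dir, payload):
    """Write a result in tmp_dir, move it into queue_dir, then mark it ready.

    Hypothetical helper for illustration; names and file layout are assumptions.
    """
    # Write the complete result outside the queue, so the reaper never
    # picks up a partially written file.
    fd, tmp_path = tempfile.mkstemp(prefix="check", dir=tmp_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(payload)

    # Move the result into the queue. rename() is atomic only when both
    # directories live on the same filesystem.
    queued_path = os.path.join(queue_dir, os.path.basename(tmp_path))
    os.rename(tmp_path, queued_path)

    # Create the empty ".ok" marker last. A result without this marker is
    # exactly the kind of leftover this issue is about.
    with open(queued_path + ".ok", "w"):
        pass
    return queued_path
```

If the process is interrupted before the last step, the result file stays in the queue without its .ok marker and is never picked up by the reaper.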

To decide whether a file is ready, the core has to loop over all files and stat() the matching .ok file. For the old checkresult files, which will never be processed anyway, that stat() is normally a miss, so a large directory means a lot of misses. And who runs a manual cronjob to clean that up, when the core should do it?

The patch should be cherry-picked into the 1.8.x trees as well, once testing is done.

core: Fix deleting too old check result files
Even under pretty normal circumstances, the check result spool dir
can fill up with a tremendous amount of check result files, which kills
Nagios' performance completely.

The problem is reloads, where old checks may be abandoned in case
they take too long to finish. In that case, half the check result file
is stashed in the spool directory (the other half is only written as
the check returns). With a huge amount of checks and semi-frequent
restarts, the checks will start to accumulate and Nagios will spend
more and more time scanning a huge directory of files where very few of
the check result files have ".ok" files accompanying them, leading to
a ton of cache-misses when we try to stat() the ".ok" file.

This patch fixes it by using the mtime from the stat call earlier in
the chain so even check results without an ".ok" file can be deleted.

Signed-off-by: Andreas Ericsson
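
The idea in the commit message above (reuse the mtime from the stat() on the result file, so that old results can be deleted even without an ".ok" file) could look roughly like this. It is a sketch in Python under stated assumptions, not the patch itself: `reap_spool_dir` and `max_age_seconds` are hypothetical names, and the real core is written in C.

```python
import os
import stat
import time


def reap_spool_dir(spool_dir, max_age_seconds, now=None):
    """Return (ready, deleted): results to process and stale files removed.

    Hypothetical sketch; max_age_seconds stands in for whatever age limit
    the core applies.
    """
    now = time.time() if now is None else now
    ready, deleted = [], []
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(".ok"):
            continue  # markers are handled together with their result file
        path = os.path.join(spool_dir, name)
        try:
            info = os.stat(path)  # single stat(); its mtime is reused below
        except FileNotFoundError:
            continue  # file vanished between listdir() and stat()
        if not stat.S_ISREG(info.st_mode):
            continue
        ok_path = path + ".ok"
        if now - info.st_mtime > max_age_seconds:
            # Too old: delete whether or not the ".ok" marker exists.
            os.unlink(path)
            if os.path.exists(ok_path):
                os.unlink(ok_path)
            deleted.append(name)
        elif os.path.exists(ok_path):
            ready.append(name)
    return ready, deleted
```

The age check runs before the ".ok" lookup, so abandoned results are removed instead of being scanned again on every pass. Fresh results without a marker are left alone, since their check may still be running.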

Changesets

2012-08-05 11:24:55 +00:00 by mfriedrich 13b11a984d715516414dde3bb706b8e4a6535972

core: fix deleting too old check result files #2951

refs #2951

2012-08-07 13:30:33 +00:00 by mfriedrich f63541d

core: fix deleting too old check result files #2951

refs #2951

2012-08-19 17:42:11 +00:00 by mfriedrich e06dadc

core: fix deleting too old check result files #2951

refs #2951

Relations:

relates #2951

icinga-migration · 2012-08-31T11:13:38Z

Updated by mfriedrich on 2012-08-31 11:13:38 +00:00

Status changed from Assigned to Resolved
Done % changed from 0 to 100
Icinga Version set to 1
OS Version set to Debian

icinga-migration closed this as completed Aug 31, 2012

icinga-migration added bug Check Results labels Jan 17, 2017

icinga-migration added this to the 1.8 milestone Jan 17, 2017
