This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #9818] status.dat gets lost when filesystem is too small for two copies of that file #465

Open
icinga-migration opened this issue Aug 3, 2015 · 5 comments

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/9818

Created by tolimar on 2015-08-03 11:29:59 +00:00

Assignee: Wolfgang
Status: Assigned
Target Version: (none)
Last Update: 2015-10-30 08:45:13 +00:00 (in Redmine)

Icinga Version: 1.13.3

Hi!

For performance reasons we keep status.dat on a 100 MB ramdisk. This file system had 35 MB of free space, so we did not expect any problems until we noticed that the Icinga GUI had started complaining about a missing status.dat file. However, once our status.dat file grew beyond 35 MB, the file system became too small to hold a second, updated copy of that file.

When Icinga was unable to write the temporary status.dat file into that directory, the original status.dat file was removed as well.

While it certainly makes sense to create status.dat as a temporary file and copy it to the target directory in a safe way (I assume it is first copied over under a temporary name and then moved to the actual filename), I think the old status.dat should not have been removed in this situation.

Icinga already shows when it is working on an outdated status.dat file, and that is what I would have preferred in this situation.

As a temporary solution, please update the documentation at http://docs.icinga.org/latest/en/temp_data.html to reflect that need.

For reference: the error messages in icinga.log were the following:
[1438594555] Error: my_fcopy() failed to write to '/var/spool/icinga/ramdisk/status.dat': No space left on device
[1438594555] Error: Unable to rename file '/dev/shm/icinga.tmpCQ6MOG' to '/var/spool/icinga/ramdisk/status.dat': No space left on device
[1438594555] Error: Unable to update status data file '/var/spool/icinga/ramdisk/status.dat': No space left on device

We use the following configuration entries, if that matters:
/etc/icinga/icinga.cfg:
status_file=/var/spool/icinga/ramdisk/status.dat
temp_file=/dev/shm/icinga.tmp
temp_path=/dev/shm

/etc/fstab:
tmpfs /var/spool/icinga/ramdisk tmpfs size=100M 0 0
tmpfs /var/spool/icinga/checkresults tmpfs size=250M 0 0

# ls -l /var/spool/icinga/ramdisk
total 66428
-rw-r--r-- 1 icinga icinga 29168960 Aug 3 11:58 objects.cache
-rw-r--r-- 1 icinga icinga 38704068 Aug 3 13:27 status.dat
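
To make the space problem concrete, here is a rough worked calculation using only the numbers from the fstab entry and the directory listing above. It assumes that the "total" line of ls counts 1 KiB blocks (the GNU ls default) and that nothing else occupies the ramdisk. The listing was taken a few hours after the logged errors, so the figures are an illustration rather than the exact state at the time of failure.

```python
# Figures taken from the fstab entry and the ls output above.
RAMDISK_BYTES = 100 * 1024 * 1024  # tmpfs size=100M
USED_KIB = 66428                   # "total" line of ls -l, assumed to be 1 KiB blocks
STATUS_DAT_BYTES = 38704068        # current size of status.dat

# Space left on the ramdisk, and how much a second copy would exceed it.
free_bytes = RAMDISK_BYTES - USED_KIB * 1024
shortfall = STATUS_DAT_BYTES - free_bytes

print(f"free on ramdisk:   {free_bytes} bytes")
print(f"second copy needs: {STATUS_DAT_BYTES} bytes")
print(f"fits: {shortfall <= 0}, shortfall: {max(shortfall, 0)} bytes")
```

Under these assumptions roughly 35 MiB are free, while a second copy of status.dat needs about 37 MiB, so the new copy cannot be written next to the old one.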

Attachments

@icinga-migration

Updated by tolimar on 2015-08-07 13:00:07 +00:00

  • File added check_ramdisk_status_status.dat

Should anyone else stumble over this problem: we are now using the attached check to get notified should the ramdisk get too small.
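
The attached check itself is not reproduced in this thread. As a rough idea of what such a check could look like, here is a minimal sketch in Python. It is not the attached script: the function names and both thresholds are assumptions made for illustration. It compares the free space on the ramdisk with the current size of status.dat and returns the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
import os
import shutil

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios/Icinga plugin exit codes


def classify(status_size, free_bytes):
    """Map file size and free space to a plugin state (thresholds are assumptions)."""
    if free_bytes < status_size:
        return CRITICAL, "no room for a second copy of the status file"
    if free_bytes < 2 * status_size:
        return WARNING, "less than twice the status file size is free"
    return OK, "enough free space for a second copy"


def check_ramdisk(status_file):
    """Check the file system that holds status_file."""
    size = os.path.getsize(status_file)
    directory = os.path.dirname(os.path.abspath(status_file))
    free = shutil.disk_usage(directory).free
    return classify(size, free)


if __name__ == "__main__":
    # Path taken from the icinga.cfg excerpt in this issue.
    path = "/var/spool/icinga/ramdisk/status.dat"
    try:
        code, message = check_ramdisk(path)
    except OSError as exc:
        code, message = CRITICAL, f"cannot inspect {path}: {exc}"
    print(message)
    raise SystemExit(code)
```

With the numbers from the directory listing in this issue (about 35 MiB free, status.dat at about 37 MiB), this sketch would report CRITICAL.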

@icinga-migration

Updated by mfriedrich on 2015-10-26 09:15:49 +00:00

  • Project changed from Icinga 1.x to Docs
  • Category set to Core
  • Icinga Version set to 1

Imho this is nothing the core could detect itself. Maybe a documentation update helps ... @wolfgang what do you think?

@icinga-migration

Updated by Wolfgang on 2015-10-28 20:15:12 +00:00

  • Status changed from New to Assigned
  • Assigned to set to Wolfgang

I'll try to fix that soon.

@icinga-migration

Updated by tolimar on 2015-10-30 08:41:53 +00:00

dnsmichi wrote:

Imho this is nothing the core could detect itself. Maybe a documentation update helps ... @wolfgang what do you think?

Thinking about it, I see two different issues:

  1. The missing documentation stating that the ramdisk should be large enough.
  2. The old status.dat file got lost when the attempt to replace it failed. IMHO it would have been okay to leave the old status.dat in place and let the GUI show a warning that it is outdated. But somehow it got removed completely.
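
The behaviour asked for in the second point can be sketched as follows. This is not Icinga's code (the core is written in C, and how my_fcopy() and the rename are handled internally is not shown in this thread). It is a minimal Python illustration of one common pattern: write the new data to a temporary file in the same directory as the target, then rename it over the old file, and on any error delete only the temporary file. The function name `replace_status_file` is hypothetical.

```python
import os
import tempfile


def replace_status_file(path, data):
    """Replace path with data; on any write error the old file stays untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = None
    try:
        # The temp file lives on the same file system as the target,
        # so the final step is a rename rather than a cross-device copy.
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".status.")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic on POSIX: readers see either the old or the new file.
        os.replace(tmp_path, path)
        return True
    except OSError:
        # For example ENOSPC: remove only the partial temp file.
        if tmp_path is not None and os.path.exists(tmp_path):
            os.unlink(tmp_path)
        return False
```

Note that this pattern still needs room for two copies on the ramdisk while the update is in progress. What it changes is the failure mode: when the space runs out, the previous status.dat remains in place and the GUI can report it as outdated.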

@icinga-migration

Updated by tolimar on 2015-10-30 08:45:13 +00:00

tolimar wrote:
[..]

Thinking about it, I see two different issues:
# The missing documentation, that the ramdisk should be large enough.

While checking the docs to propose a change, I noticed that this is already documented and I had missed it.

Chapter 8.8. "Temporary Data" already states at the very beginning "Add the size of the status file for temporary data"... Sorry for missing that.

That still leaves the removal of the old status.dat file...
