[dev.icinga.com #10638] Regenerate the _api/active-stage, _api/active.conf and _api/include.conf files when they're deleted #3668
Comments
Updated by mfriedrich on 2016-03-18 16:14:10 +00:00
Updated by mfriedrich on 2016-04-01 11:38:29 +00:00
Updated by mfriedrich on 2016-04-01 11:39:50 +00:00
Updated by mfriedrich on 2016-04-01 11:42:44 +00:00
We should implement this for runtime-created objects, which use the API packages internally. It's not the highest priority, but it would probably help with support.
Updated by gbeutner on 2016-08-25 16:11:28 +00:00
Updated by mfriedrich on 2016-11-09 14:54:30 +00:00
Updated by mfriedrich on 2016-12-07 17:17:58 +00:00
Updated by mfriedrich on 2017-01-09 15:43:08 +00:00
Updated by mfriedrich on 2017-01-09 15:44:03 +00:00
Updated by mfriedrich on 2017-01-09 15:44:39 +00:00
FYI, I had this problem on my 2.5.4 standalone server. The _api/ folder had gotten corrupted somehow; it was missing a bunch of files such as active.conf, include.conf, etc. I was able to fix it by deleting /var/lib/icinga2/api/packages/_api and restarting Icinga 2. The missing files (active.conf, etc.) were then recreated automatically, and downtimes are working correctly again.
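Before deleting anything, it can help to see which of the three files named in the issue title are actually missing. A minimal Python sketch follows; the default path is the one from the comment above, and `missing_api_files` is a hypothetical helper, not part of Icinga 2. Keep in mind that removing the whole _api directory also removes the runtime-created objects stored in that package, so back it up first.

```python
import os

# Top-level files the issue title says must exist in the _api package.
# The list comes from the issue title; it is not read from Icinga 2 itself.
REQUIRED_FILES = ("active-stage", "active.conf", "include.conf")


def missing_api_files(package_dir="/var/lib/icinga2/api/packages/_api"):
    """Return the required top-level files that are absent from package_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(package_dir, name))]


if __name__ == "__main__":
    missing = missing_api_files()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all top-level _api files present")
```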
At some point the stageName is empty, which creates this mess. Finding out why is on my TODO list.
At some point I tested a master-satellite setup which didn't work out for me, so I reverted all configuration files to the standalone configuration. I think the problems started after that, though I can't tell for sure.
Workaround for manually re-creating these files:
You can of course try it in different ways, but this one avoids additional restarts. If you're planning to manually restore the files, their structure is described inside
Example: My stage name is
This should allow you to reconstruct the files manually; just look at where the stage name is used. If you find a conf.d/ directory at the top level of the "_api" package directory, move its contents to stagename/conf.d and verify that all include.conf files are properly initialized. If you have such a case, I'd appreciate a copy of it as a tarball (remove sensitive host details beforehand).
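The manual reconstruction can be sketched in Python, assuming you already know the stage name (the directory under _api that holds your objects). The file contents below are an assumption approximating what Icinga 2 2.5/2.6 writes, not copied from the source referenced above: the real active.conf contains extra logic (for example stage override handling) that this omits, and the exact text may differ between versions. `rebuild_package` is a hypothetical helper. Stop Icinga 2 and back up the package directory before running anything like this.

```python
import os
import shutil

# ASSUMPTION: these templates approximate the files Icinga 2 2.5/2.6 writes
# for a config package. The real active.conf has extra logic (e.g. stage
# overrides) that is omitted here. Verify against your version before use.
TOP_INCLUDE = 'include "*/include.conf"\n'

ACTIVE_CONF = (
    'if (!globals.contains("ActiveStages")) {{\n'
    '  globals.ActiveStages = {{}}\n'
    '}}\n'
    '\n'
    'ActiveStages["{package}"] = "{stage}"\n'
)

STAGE_INCLUDE = (
    'include "../active.conf"\n'
    'if (ActiveStages["{package}"] == "{stage}") {{\n'
    '  include_recursive "conf.d"\n'
    '  include_zones "{package}", "zones.d"\n'
    '}}\n'
)


def write_text(path, text):
    with open(path, "w") as fp:
        fp.write(text)


def rebuild_package(package_dir, stage, package="_api"):
    """Recreate the package-level files for an existing stage (hypothetical helper)."""
    if not stage:
        raise ValueError("stage name must not be empty")
    stage_dir = os.path.join(package_dir, stage)
    os.makedirs(os.path.join(stage_dir, "conf.d"), exist_ok=True)

    # Move a stray top-level conf.d/ into <stage>/conf.d, as suggested above.
    stray = os.path.join(package_dir, "conf.d")
    if os.path.isdir(stray):
        for entry in os.listdir(stray):
            target = os.path.join(stage_dir, "conf.d", entry)
            if os.path.exists(target):
                raise FileExistsError(target)  # never overwrite; merge by hand
            shutil.move(os.path.join(stray, entry), target)
        os.rmdir(stray)

    write_text(os.path.join(package_dir, "active-stage"), stage)
    write_text(os.path.join(package_dir, "include.conf"), TOP_INCLUDE)
    write_text(os.path.join(package_dir, "active.conf"),
               ACTIVE_CONF.format(package=package, stage=stage))
    write_text(os.path.join(stage_dir, "include.conf"),
               STAGE_INCLUDE.format(package=package, stage=stage))
```

For example, call `rebuild_package("/var/lib/icinga2/api/packages/_api", "<your stage name>")`, then validate the result with `icinga2 daemon -C` before restarting.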
Thanks. This is very useful information.
On Sun, Feb 26, 2017 at 3:20 AM, Michael Friedrich wrote:
Workaround for manually re-creating such:
---
Michael Martinez
http://www.michael--martinez.com
I was not able to reproduce this in a problematic way. All I managed to get were two stages for one node; this happens because we happily perform surgery on files in parallel, which could easily be the cause of the other problems. The only solution @gunnarbeutner and I could come up with right now is to use a mutex whenever we write, read and activate stages.
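The mutex idea can be sketched in Python with one `threading.Lock` per package that covers creating a stage, updating active-stage/active.conf, and reading the active stage. `StageStore`, its methods and the `stage-N` naming are hypothetical, and the one-line active.conf is a deliberate simplification. The point is only that no two callers can interleave their file writes, so active-stage and active.conf always name the same stage.

```python
import itertools
import os
import threading


class StageStore:
    """Hypothetical sketch: serialize all stage I/O for one package with one lock."""

    def __init__(self, package_dir, package="_api"):
        self.package_dir = package_dir
        self.package = package
        self._lock = threading.Lock()
        self._counter = itertools.count()

    def _write(self, name, text):
        with open(os.path.join(self.package_dir, name), "w") as fp:
            fp.write(text)

    def create_and_activate(self):
        # Creating the stage and pointing active-stage/active.conf at it form
        # one critical section, so concurrent callers cannot interleave.
        with self._lock:
            stage = "stage-%d" % next(self._counter)
            os.makedirs(os.path.join(self.package_dir, stage, "conf.d"))
            self._write("active-stage", stage)
            # Simplified active.conf; the real file contains more logic.
            self._write("active.conf",
                        'ActiveStages["%s"] = "%s"\n' % (self.package, stage))
            return stage

    def read_active(self):
        # Readers take the same lock so they never see a half-written pair.
        with self._lock:
            with open(os.path.join(self.package_dir, "active-stage")) as fp:
                return fp.read()
```

A `threading.Lock` only serializes threads inside one process, which matches API requests being handled inside the daemon; a second process editing the same files would need a file lock instead.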
At some point the stageDir string is empty. We should at least log and abort when this happens to protect the integrity of existing files.
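A minimal guard for the empty-stage case, sketched in Python: refuse to build a path from an empty stage name, log at critical level, and let the caller skip the write instead of throwing. `stage_path` and the logger name are hypothetical. `os.path.join` also shows one plausible consequence of not guarding: an empty stage name resolves to the package directory itself, so stage files would land next to active.conf, which would be consistent with the stray top-level conf.d/ described earlier (an assumption, not a confirmed root cause).

```python
import logging
import os

log = logging.getLogger("config-package")  # illustrative logger name


def stage_path(package_dir, stage_name):
    """Return <package_dir>/<stage_name>, or None if stage_name is empty.

    An empty name would collapse the path to the package root, so writes
    would end up beside active.conf instead of inside a stage directory.
    Log loudly and let the caller skip the operation rather than raising.
    """
    if not stage_name or not stage_name.strip():
        log.critical("Refusing to use an empty stage name in %s", package_dir)
        return None
    return os.path.join(package_dir, stage_name)
```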
Next steps:
Maybe Critical instead? Throwing an exception seems unnecessary. refs #3668
The tests worked (script below), but no issues like the ones described showed up. I also removed the log message about the missing active-stage, because in some of the places where it gets called it does not matter whether it's empty or not, and we hold the lock in the cases where race conditions may happen. About the missing files: Script I used for testing:
@Crunsher do you mean that the include.conf files modified by the user should be re-created on each request? I would strongly advise against it for performance reasons. Users must not edit the _api package, and the daemon must be able to rely on being the owner of these files. If the daemon writes garbage, that's the bug being fixed here. But I would not care if the package remains broken because of a manual user change in there.
I've created a PR from the fix branch so it doesn't get forgotten in review.
@dnsmichi Gods no! Currently we re-create it on startup if it does not exist (which covers the initial creation). So I guess the locking and atomic-write changes fix this bug then.
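Create-if-missing on startup combined with an atomic write can be sketched in Python: the content goes to a temporary file in the same directory and is then moved into place with `os.replace`, so a reader sees either the old file or the complete new one. `atomic_write` and `ensure_file` are hypothetical names, not Icinga 2 functions. The sketch does not close the gap between the existence check and the write; that still needs the package lock discussed above.

```python
import os
import tempfile


def atomic_write(path, text):
    """Replace path with text so readers never observe a partially written file."""
    directory = os.path.dirname(path) or "."
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as fp:
            fp.write(text)
            fp.flush()
            os.fsync(fp.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def ensure_file(path, text):
    """Create path with text only if it is missing (startup-time check).

    Existing files are left alone: the daemon owns the package and does not
    rewrite it on every request. Returns True if the file was created.
    """
    if os.path.exists(path):
        return False
    atomic_write(path, text)
    return True
```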
OK, thanks. Then your PR should be merged, and we'll ask anyone who can reliably reproduce the issue to test the snapshot packages.
The bug is not fixed; we still see it in v2.8.
I've run into the same issue. How can I fix it? I've tested it on v2.6 and v2.9.
This issue has been migrated from Redmine: https://dev.icinga.com/issues/10638
Created by gbeutner on 2015-11-16 06:50:00 +00:00
Assignee: mfriedrich
Status: Assigned
Target Version: Backlog
Last Update: 2017-01-09 15:44:03 +00:00 (in Redmine)
Relations: