[dev.icinga.com #10638] Regenerate the _api/active-stage, _api/active.conf and _api/include.conf files when they're deleted #3668

Closed
icinga-migration opened this issue Nov 16, 2015 · 26 comments · Fixed by #5620
Labels
area/api REST API blocker Blocks a release or needs immediate attention bug Something isn't working

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10638

Created by gbeutner on 2015-11-16 06:50:00 +00:00

Assignee: mfriedrich
Status: Assigned
Target Version: Backlog
Last Update: 2017-01-09 15:44:03 +00:00 (in Redmine)

Icinga Version: 2.4.0
Backport?: Not yet backported
Include in Changelog: 1


Relations:

@icinga-migration
Author

Updated by mfriedrich on 2016-03-18 16:14:10 +00:00

  • Category set to API
  • Priority changed from Normal to Low

@icinga-migration
Author

Updated by mfriedrich on 2016-04-01 11:38:29 +00:00

  • Relates set to 11499

@icinga-migration
Author

icinga-migration commented Apr 1, 2016

Updated by mfriedrich on 2016-04-01 11:39:50 +00:00

  • Subject changed from "Regenerate the _api/active.conf and _api/include.conf files when they're deleted" to "Regenerate the _api/active-stage, _api/active.conf and _api/include.conf files when they're deleted"
    mbmif /usr/local/icinga2/etc/icinga2/tests (master) # ls -la /usr/local/icinga2/var/lib/icinga2/api/packages/_api/
    total 24
    drwx------  6 icinga  staff  204 Sep 15  2015 .
    drwx------  4 icinga  staff  136 Dec 10 15:55 ..
    -rw-r--r--  1 icinga  staff   33 Sep 15  2015 active-stage
    -rw-r--r--  1 icinga  staff  450 Sep 15  2015 active.conf
    -rw-r--r--  1 icinga  staff   25 Sep 15  2015 include.conf
    drwx------  5 icinga  staff  170 Sep 15  2015 mbmif.int.netways.de-1442309540-1
    mbmif /usr/local/icinga2/etc/icinga2/tests (master) # ls -la /usr/local/icinga2/var/lib/icinga2/api/packages/_api/mbmif.int.netways.de-1442309540-1/
    total 8
    drwx------  5 icinga  staff  170 Sep 15  2015 .
    drwx------  6 icinga  staff  204 Sep 15  2015 ..
    drwx------  7 icinga  staff  238 Mar 22 21:22 conf.d
    -rw-r--r--  1 icinga  staff  157 Sep 15  2015 include.conf
    drwx------  2 icinga  staff   68 Sep 15  2015 zones.d

@icinga-migration
Author

Updated by mfriedrich on 2016-04-01 11:42:44 +00:00

  • Priority changed from Low to Normal
  • Target Version set to Backlog
  • Parent Id set to 11415

We should implement that for runtime-created objects, which use the API packages internally. It is not the highest priority, but it would probably help with support.

@icinga-migration
Author

Updated by gbeutner on 2016-08-25 16:11:28 +00:00

  • Relates set to 12551

@icinga-migration
Author

Updated by mfriedrich on 2016-11-09 14:54:30 +00:00

  • Parent Id deleted 11415

@icinga-migration
Author

Updated by mfriedrich on 2016-12-07 17:17:58 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich

@icinga-migration
Author

Updated by mfriedrich on 2017-01-09 15:43:08 +00:00

  • Relates set to 13725

@icinga-migration
Author

Updated by mfriedrich on 2017-01-09 15:44:03 +00:00

  • Priority changed from Normal to High

@icinga-migration
Author

Updated by mfriedrich on 2017-01-09 15:44:39 +00:00

  • Relates set to 11012

@mwtzzz-zz

FYI, I had this problem on my 2.5.4 standalone server. The _api/ folder had gotten corrupted somehow; it was missing a bunch of files such as active.conf, include.conf, etc. I was able to fix it by blowing away /var/lib/icinga2/api/packages/_api and restarting icinga. This resulted in my missing files (active.conf, etc) being recreated automatically. Downtimes are now working correctly.

@dnsmichi
Contributor

At some point the stageName is empty, thus creating such a mess. It is on my TODO list to find out why.

@gvde

gvde commented Feb 25, 2017

At some point I have tested a master-satellite setup which didn't work out for me. Thus I have reverted all configuration files back to the standalone configuration. I think problems started after that though I can't tell for sure...

@dnsmichi
Contributor

dnsmichi commented Feb 26, 2017

Workaround for manually re-creating these files:

  • Move the existing directories in ./_api/stagename/conf.d/ to a safe place
  • Remove the "_api" package directory
  • Create a dummy comment via the REST API and immediately delete it again (this restores the _api package without a restart)
  • Move the backed-up config into ./_api/stagename/conf.d/ again
  • Restart Icinga 2

You can of course try it in different ways, but this approach avoids additional restarts.
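
The workaround steps above can be sketched as a few shell helpers. This is a rough sketch, not a tested procedure: the package path, API URL, credentials, comment text and all function names are assumptions, and the add-comment / remove-comment calls follow the Icinga 2 actions API as I understand it, so check them against the documentation for your version. `rm -r` stands in for the "rmdir" step because the package directory is not empty, and the restore step assumes the re-created package records its new stage name in active-stage.

```shell
#!/bin/sh
# Assumed defaults; override them via the environment.
API_URL="${API_URL:-https://localhost:5665}"
API_AUTH="${API_AUTH:-root:icinga}"
PKG_DIR="${PKG_DIR:-/var/lib/icinga2/api/packages/_api}"

# Step 1: move the stage's conf.d/ to a safe place.
backup_stage_config() {   # usage: backup_stage_config <stage> <backup_dir>
    mkdir -p "$2" && mv "$PKG_DIR/$1/conf.d" "$2/conf.d"
}

# Step 2: remove the "_api" package directory (destructive).
remove_api_package() {
    [ -n "$PKG_DIR" ] || return 1
    rm -r -- "$PKG_DIR"
}

# Step 3a: create a dummy comment; prints the API response with the comment name.
add_dummy_comment() {     # usage: add_dummy_comment <host_name>
    curl -k -s -u "$API_AUTH" -H 'Accept: application/json' -X POST \
        "$API_URL/v1/actions/add-comment?type=Host&filter=host.name==%22$1%22" \
        -d '{ "author": "admin", "comment": "dummy comment to re-create _api" }'
}

# Step 3b: delete the dummy comment again by its name.
remove_comment() {        # usage: remove_comment <comment_name>
    curl -k -s -u "$API_AUTH" -H 'Accept: application/json' -X POST \
        "$API_URL/v1/actions/remove-comment" \
        -d "{ \"comment\": \"$1\" }"
}

# Step 4: copy the backup into the conf.d/ of the currently active stage.
restore_stage_config() {  # usage: restore_stage_config <backup_dir>
    stage=$(cat "$PKG_DIR/active-stage") || return 1
    [ -n "$stage" ] || { echo "active-stage is empty" >&2; return 1; }
    mkdir -p "$PKG_DIR/$stage/conf.d" &&
        cp -R "$1/conf.d/." "$PKG_DIR/$stage/conf.d/"
}
```

A possible sequence is `backup_stage_config <stage> /root/_api-backup`, `remove_api_package`, `add_dummy_comment <host>`, `remove_comment <name from the response>`, `restore_stage_config /root/_api-backup`, followed by a restart of Icinga 2 (for example `systemctl restart icinga2` on a systemd host).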

If you're planning to restore the files manually, their structure is defined in the following functions:

  • ConfigPackageUtility::WritePackageConfig()
  • ConfigPackageUtility::WriteStageConfig()
  • ConfigPackageUtility::ActivateStage()

Example: My stage name is mbmif.int.netways.de-1442309540-1

mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # ls -lah
total 24
drwx------  6 icinga  icinga   204B Apr  1  2016 .
drwx------  4 icinga  icinga   136B Dec 10  2015 ..
-rw-r--r--  1 icinga  icinga    33B Sep 15  2015 active-stage
-rw-r--r--  1 icinga  icinga   450B Sep 15  2015 active.conf
-rw-r--r--  1 icinga  icinga    25B Sep 15  2015 include.conf
drwx------  5 icinga  icinga   170B Nov 21 15:24 mbmif.int.netways.de-1442309540-1
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat active-stage
mbmif.int.netways.de-1442309540-1
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat active.conf
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}

if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}

if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "mbmif.int.netways.de-1442309540-1"
}
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat include.conf
include "*/include.conf"
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # ls -lah  mbmif.int.netways.de-1442309540-1/
total 8
drwx------  5 icinga  icinga   170B Nov 21 15:24 .
drwx------  6 icinga  icinga   204B Apr  1  2016 ..
drwx------  9 icinga  icinga   306B May 10  2016 conf.d
-rw-r--r--  1 icinga  icinga   157B Sep 15  2015 include.conf
drwx------  2 icinga  icinga    68B Sep 15  2015 zones.d
mbmif /usr/local/icinga2/var/lib/icinga2/api/packages/_api (master *) # cat mbmif.int.netways.de-1442309540-1/include.conf
include "../active.conf"
if (ActiveStages["_api"] == "mbmif.int.netways.de-1442309540-1") {
  include_recursive "conf.d"
  include_zones "_api", "zones.d"
}

This should allow you to reconstruct the files manually; just look at where the stage name is used.
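
For a manual reconstruction along these lines, a small shell helper could write the four files for a given stage name. The file contents are copied from the listing above; the function name and its arguments are made up for illustration, and the daemon itself writes these files through the ConfigPackageUtility functions, not through anything like this. active-stage is written without a trailing newline, which matches the 33-byte size shown in the listing.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the "_api" package control files for one stage.
# usage: write_api_package_files <package_dir> <stage_name>
write_api_package_files() {
    pkg_dir=$1
    stage=$2

    # Refuse an empty stage name, which the thread names as the source of the mess.
    if [ -z "$stage" ]; then
        echo "stage name must not be empty" >&2
        return 1
    fi

    mkdir -p "$pkg_dir/$stage/conf.d" "$pkg_dir/$stage/zones.d" || return 1

    # active-stage holds only the stage name, without a trailing newline.
    printf '%s' "$stage" > "$pkg_dir/active-stage"

    printf 'include "*/include.conf"\n' > "$pkg_dir/include.conf"

    cat > "$pkg_dir/active.conf" <<EOF
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}

if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}

if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "$stage"
}
EOF

    cat > "$pkg_dir/$stage/include.conf" <<EOF
include "../active.conf"
if (ActiveStages["_api"] == "$stage") {
  include_recursive "conf.d"
  include_zones "_api", "zones.d"
}
EOF
}
```

Calling `write_api_package_files /var/lib/icinga2/api/packages/_api "$(hostname -f)-$(date +%s)-1"` would produce a layout like the one shown, assuming stage names follow the hostname-timestamp-counter pattern seen in this thread. Fix ownership and permissions afterwards so the Icinga 2 user can read the files.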

If you're running into the problem that there's a conf.d/ directory in the top level of the "_api" package directory, safely move its content to stagename/conf.d and verify that all include.conf files are properly initialized.
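
To spot the broken states mentioned in this thread before touching anything, a read-only check might look like the following. It is only a sketch: `check_api_package` is a hypothetical name, and the checks cover just the cases described here (missing control files, an empty active-stage, a stray top-level conf.d/, a stage without include.conf).

```shell
#!/bin/sh
# Hypothetical read-only check of an "_api" package directory.
# usage: check_api_package <package_dir>; returns non-zero if a problem is found
check_api_package() {
    pkg=$1
    rc=0

    # The three control files from the issue title must exist.
    for f in active-stage active.conf include.conf; do
        if [ ! -f "$pkg/$f" ]; then
            echo "missing: $pkg/$f"
            rc=1
        fi
    done

    # conf.d/ belongs inside the stage directory, not at the top level.
    if [ -d "$pkg/conf.d" ]; then
        echo "stray top-level conf.d: $pkg/conf.d"
        rc=1
    fi

    # The active stage must be named and must have its own include.conf.
    stage=
    if [ -f "$pkg/active-stage" ]; then
        stage=$(cat "$pkg/active-stage")
    fi
    if [ -z "$stage" ]; then
        echo "active-stage is missing or empty"
        rc=1
    elif [ ! -f "$pkg/$stage/include.conf" ]; then
        echo "missing: $pkg/$stage/include.conf"
        rc=1
    fi

    return $rc
}
```

For example, `check_api_package /var/lib/icinga2/api/packages/_api` prints one line per problem and stays silent when the layout looks complete.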

If you happen to have such a case, I'd appreciate a copy of it as a tarball (remove sensitive host details beforehand).

@mwtzzz-zz

mwtzzz-zz commented Feb 26, 2017 via email

@Crunsher
Contributor

I was not able to reproduce this in a problematic way. All I managed to get were two stages for one node. This happens because we happily perform surgery on files in parallel, which could easily be the cause of the other problems as well.

The only solution @gunnarbeutner and I could come up with right now is using a mutex whenever we write, read and activate stages.
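
To illustrate what serializing stage operations achieves, here is a minimal shell sketch of a mutex built on `mkdir`, which either creates the directory or fails, atomically. It is only an analogy: Icinga 2 is written in C++ and would take an in-process lock, and `with_stage_lock` and the lock path are made-up names.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command while holding a directory-based lock.
# usage: with_stage_lock <lock_dir> <command> [args...]
with_stage_lock() {
    lock=$1
    shift

    # mkdir is atomic: exactly one caller succeeds in creating the directory.
    until mkdir "$lock" 2>/dev/null; do
        sleep 1
    done

    # Critical section: only one caller at a time gets here.
    "$@"
    rc=$?

    # Release the lock and pass on the command's exit status.
    rmdir "$lock"
    return $rc
}
```

Wrapping every write, read and activate operation in the same lock, for example `with_stage_lock /tmp/_api.lock some_stage_operation`, means two parallel requests can no longer interleave their file changes. A stale lock directory left behind by a killed process would block all later callers, which this sketch does not handle.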

@dnsmichi
Contributor

At some point the stageDir string is empty. We should at least log/break when this happens to ensure data integrity of existing files.

@Crunsher Crunsher self-assigned this Sep 20, 2017
Crunsher added a commit that referenced this issue Sep 20, 2017
@Crunsher
Contributor

Next steps:

  • Test with parallel requests
  • Add log messages for cases where names that should not be empty are empty

Crunsher added a commit that referenced this issue Sep 20, 2017
Maybe Critical instead? Throwing an exception seems unnecessary.

refs #3668
@Crunsher
Contributor

Tests worked (script below), but there were no issues like the ones described. I also removed the log message about the missing active-stage: in some of the places where it gets called it does not matter whether it is empty, and we hold the lock in the cases where race conditions may happen.

About the missing files:
Thanks to the locks they should not be overwritten anymore. If the user deletes them, they are regenerated at startup. How should we proceed with this?

Script I used for testing:

# Create 20 config packages in parallel.
for i in $(seq 1 20); do
	curl -k -s -u root:icinga -H 'Accept: application/json' -X POST "https://localhost:5665/v1/config/packages/example-cmdb${i}" &
done
# Upload one stage with a single host object to each package.
for i in $(seq 1 20); do
	echo "{\"files\": {\"conf.d/test.conf\": \"object Host \\\"cmdb-host${i}\\\" { check_command = \\\"flatter\\\" }\"}}" | \
		curl -k -s -u root:icinga -H 'Accept: application/json' -X POST \
		-d @- "https://localhost:5665/v1/config/stages/example-cmdb${i}"
done

@dnsmichi
Contributor

@Crunsher do you mean that include.conf files modified by the user should be re-created on each request? I would strongly advise against that for performance reasons. Users must not edit the _api package, and the daemon must be able to rely on being the owner of these files. If the daemon puts out garbage, that's the bug being fixed here. But I would not care if the package remains broken because of a manual user change in there.

@dnsmichi
Contributor

I've created a PR out of the fix branch, so it is not forgotten for reviews.

@Crunsher
Contributor

Crunsher commented Oct 4, 2017

@dnsmichi Gods no! Currently we re-create it on startup if it does not exist (which covers initial creation). So I guess the locks, which make these operations atomic, fix this bug then.

@dnsmichi
Contributor

dnsmichi commented Oct 6, 2017

OK, thanks. Then your PR should be merged, and we ask anyone who can reliably reproduce the issue to test the snapshot packages.

@gunnarbeutner gunnarbeutner removed this from the 2.8.0 milestone Oct 16, 2017
Crunsher added a commit that referenced this issue Nov 2, 2017
Crunsher added a commit that referenced this issue Nov 2, 2017
Maybe Critical instead? Throwing an exception seems unnecessary.

refs #3668
@Igor-Petrov

The bug is not fixed; we still see it in v2.8.
I opened a forum thread regarding this bug: https://monitoring-portal.org/t/host-is-not-visible-via-api/2142

@artem-kosenko

I've run into the same issue. How can I fix it? I've tested it on v2.6 and on v2.9.

  • add host via API
  • restart icinga service
  • remove host via API
  • add host via API
  • issue: there is no newly added host in the web interface.
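
The reproduction steps above could be scripted roughly as follows. This is a sketch under assumptions: the API URL, the credentials, the host attributes and the function names are placeholders, and the calls use the `/v1/objects/hosts` endpoint as I understand the Icinga 2 API, so verify them against your version.

```shell
#!/bin/sh
# Assumed defaults; override them via the environment.
API_URL="${API_URL:-https://localhost:5665}"
API_AUTH="${API_AUTH:-root:icinga}"

# Create a host at runtime (stored in the "_api" package).
add_host() {              # usage: add_host <host_name>
    curl -k -s -u "$API_AUTH" -H 'Accept: application/json' -X PUT \
        "$API_URL/v1/objects/hosts/$1" \
        -d '{ "attrs": { "address": "127.0.0.1", "check_command": "hostalive" } }'
}

# Delete the host together with its dependent objects.
remove_host() {           # usage: remove_host <host_name>
    curl -k -s -u "$API_AUTH" -H 'Accept: application/json' -X DELETE \
        "$API_URL/v1/objects/hosts/$1?cascade=1"
}

# Succeeds if the API answers HTTP 200 for the host object.
host_exists() {           # usage: host_exists <host_name>
    code=$(curl -k -s -o /dev/null -w '%{http_code}' -u "$API_AUTH" \
        -H 'Accept: application/json' "$API_URL/v1/objects/hosts/$1")
    [ "$code" = "200" ]
}
```

The sequence would then be `add_host test-host.example.com`, a restart of the Icinga 2 service, `remove_host test-host.example.com`, `add_host test-host.example.com`, and finally `host_exists test-host.example.com` to see whether the API still knows the host even if the web interface does not show it.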

@artem-kosenko

artem-kosenko commented Aug 30, 2018

/var/lib/icinga2/api/packages/_api/
├── active.conf
├── active-stage
├── include.conf
└── host-name.example.com-1535636549-1
    ├── conf.d
    │   ├── downtimes
    │   └── hosts
    │       └── test-host.example.com.conf
    ├── include.conf
    └── zones.d

# cat /var/lib/icinga2/api/packages/_api/active.conf
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}

if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}

if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "host-name.example.com-1535636549-1"
}

# cat /var/lib/icinga2/api/packages/_api/active-stage
host-name.example.com-1535636549-1

# cat /var/lib/icinga2/api/packages/_api/include.conf
include "*/include.conf"

# cat /var/lib/icinga2/api/packages/_api/host-name.example.com-1535636549-1/conf.d/hosts/test-host.example.com.conf 
object Host "test-host.example.com" {
	import "P2-host"

	address = "test-host.example.com"
	display_name = "test-host.example.com"
	notes = "my notes"
	notes_url = "http://test-host.example.com"
	vars["args"] = {
		services = {
			check_snmp_mem = {
				arg1 = "someone"
				arg2 = "90,0"
				arg3 = "100,30"
				name = "MEMORY"
			}
			ftp = {
				arg1 = 20.000000
				arg2 = 10.000000
				name = "FTP"
			}
		}
	}
	vars["facts"] = {
		nrpe = [ "check_disk", "check_file_exist" ]
		services = [ "ssh", "ftp" ]
		services_p3 = [ "load", "check_snmp_mem" ]
	}
	version = 1535637140.067982
	zone = "some-zone"
}

# cat /var/lib/icinga2/api/packages/_api/host-name.example.com-1535636549-1/include.conf 
include "../active.conf"
if (ActiveStages["_api"] == "host-name.example.com-1535636549-1") {
  include_recursive "conf.d"
  include_zones "_api", "zones.d"
}


8 participants