
[dev.icinga.com #13323] Cluster syncs API-deleted objects from a node which was down and was not aware of the deletion #4808

Closed
icinga-migration opened this issue Nov 24, 2016 · 4 comments
Labels
area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/13323

Created by geds on 2016-11-24 16:06:25 +00:00

Assignee: (none)
Status: New
Target Version: (none)
Last Update: 2016-11-29 08:17:33 +00:00 (in Redmine)

Icinga Version: v2.5.4-206-gb028ff2
Backport?: Not yet backported
Include in Changelog: 1

Icinga2 version: v2.5.4-206-gb028ff2 (icinga2.x86_64 2.5.4-1.snapshot201611221649.el7.centos @ICINGA-snapshot).
The cluster resyncs everything, even if some objects were deleted on one node while the other node was down.

To reproduce this issue:

  1. Spin up the icinga2x-cluster Vagrant environment:

git clone https://github.com/Icinga/icinga-vagrant.git
cd icinga-vagrant/icinga2x-cluster
vagrant up
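
Optionally, confirm that both VMs are running before continuing; a sketch, assuming the Vagrant box names match the node names 'icinga2a' and 'icinga2b':

vagrant status
vagrant ssh icinga2a -c "icinga2 --version"
vagrant ssh icinga2b -c "icinga2 --version"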

  2. Stop the Icinga 2 instances on both cluster servers, for example as sketched below
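
A sketch of this step, assuming the nodes are the Vagrant boxes from above and Icinga 2 runs as a systemd service:

vagrant ssh icinga2a -c "sudo systemctl stop icinga2"
vagrant ssh icinga2b -c "sudo systemctl stop icinga2"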

  3. Edit /etc/icinga2/zones.conf on both 'icinga2a' and 'icinga2b' to enable config replication:

object Endpoint "icinga2a" {
host = "192.168.33.10"
}

object Endpoint "icinga2b" {
host = "192.168.33.20"
}

object Zone "master" {
endpoints = [ "icinga2a", "icinga2b" ]
}

object Zone "checker" {
}

object Zone "global-templates" {
global = true
}

  4. Start Icinga 2 on both 'icinga2a' and 'icinga2b' (see the sketch below)
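
A minimal sketch, assuming the same Vagrant boxes and systemd unit as above ('icinga2 daemon -C' only validates the configuration before starting the service):

vagrant ssh icinga2a -c "sudo icinga2 daemon -C && sudo systemctl start icinga2"
vagrant ssh icinga2b -c "sudo icinga2 daemon -C && sudo systemctl start icinga2"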

  5. Create a host and a service via the API:

curl -k -s -u "aws:$PASSWORD" -H 'Accept: application/json' -X PUT "https://192.168.33.10:5665/v1/objects/hosts/host1" -d "{ \"templates\": [ \"generic-host\" ], \"attrs\": { \"address\": \"127.0.0.1\", \"check_command\": \"hostalive\" } }"
curl -k -s -u "aws:$PASSWORD" -H 'Accept: application/json' -X PUT "https://192.168.33.10:5665/v1/objects/services/host1!service1" -d "{ \"attrs\": { \"check_command\": \"passive\", \"enable_active_checks\": \"0\" } }"
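
To confirm the new objects were replicated to both nodes, they can be queried on each endpoint; a sketch, assuming the 'aws' ApiUser is also accepted by 'icinga2b':

curl -k -s -u "aws:$PASSWORD" -H 'Accept: application/json' "https://192.168.33.10:5665/v1/objects/services/host1!service1"
curl -k -s -u "aws:$PASSWORD" -H 'Accept: application/json' "https://192.168.33.20:5665/v1/objects/services/host1!service1"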

  6. Stop Icinga 2 on 'icinga2b'

  7. Delete the service from 'icinga2a' via the API:

curl -k -s -u "aws:$PASSWORD" -H 'Accept: application/json' -X DELETE "https://192.168.33.10:5665/v1/objects/services/host1!service1"

  8. Start Icinga 2 on 'icinga2b'

At this point 'service1' gets synced back to 'icinga2a' from 'icinga2b', because the object was still present on 'icinga2b' while that node was down.
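
The resurrection can be observed by querying 'icinga2a' again once the two nodes have reconnected; a sketch (the service shows up again even though the DELETE above succeeded):

curl -k -s -u "aws:$PASSWORD" -H 'Accept: application/json' "https://192.168.33.10:5665/v1/objects/services/host1!service1"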

@icinga-migration
Author

Updated by gbeutner on 2016-11-29 08:17:34 +00:00

Good catch. The problem here is that even though the node where the deletion happened does in fact log the deletion, it also gets a "new object" message from the other node when both instances reconnect to each other. This causes the object to be recreated on the instance where the object was initially created.

The proper way to fix this would be to remember deleted objects (essentially as tombstones). We'd have to keep creation timestamps for both regular objects and tombstones so we can differentiate between an object that was deleted (object timestamp less than tombstone timestamp) and an object that was re-created after it was previously deleted (tombstone timestamp less than object timestamp).
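
A minimal sketch of the proposed timestamp comparison, using hypothetical epoch timestamps (this is not the actual Icinga 2 implementation):

# assumed example values: the tombstone was written after the object was created,
# so the object should stay deleted
object_ts=1480000000
tombstone_ts=1480003600

if [ "$object_ts" -lt "$tombstone_ts" ]; then
  echo "object predates its tombstone -> treat it as deleted, do not resync it"
else
  echo "object is newer than its tombstone -> it was re-created after deletion, keep it"
fi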

@icinga-migration icinga-migration added bug Something isn't working area/distributed Distributed monitoring (master, satellites, clients) labels Jan 17, 2017
@Al2Klimov
Member

Hello @geds,

did you create the issue? (Unfortunately we can only guess.)

If yes, is this problem still present in v2.10.3?

Best,
AK

@Al2Klimov Al2Klimov self-assigned this Mar 11, 2019
@Al2Klimov Al2Klimov added the needs feedback We'll only proceed once we hear from you again label Mar 11, 2019
@dnsmichi
Contributor

Yes, that's still a problem without any solution. After all these years, though, I'm inclined to mark this as a wontfix.

@dnsmichi
Contributor

Closing in favor of #7136.
