[dev.icinga.com #11994] global_templates not working when enabling later #4301

Closed
icinga-migration opened this issue Jun 20, 2016 · 6 comments
Labels
area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/11994

Created by snoopy1978 on 2016-06-20 15:28:16 +00:00

Assignee: (none)
Status: New
Target Version: (none)
Last Update: 2016-06-21 11:30:00 +00:00 (in Redmine)

Icinga Version: 2.4.7
Backport?: Not yet backported
Include in Changelog: 1

Hi everyone,

just playing around with icinga2 and clustering and came across this issue:

I have an HA cluster of 2 icinga2 nodes up and running; different hosts and services reside in the zone. At this point the "global-templates" zone is commented out on all nodes, and everything works as expected.
Now I want to enable global templates. Therefore I un-commented the global-templates zone objects in zones.conf on both nodes. Afterwards I moved the template object config file from the master zone config dir to a newly created "global-zone" config dir (under ..../zones.d). Then I restarted the master config node, which works fine.
Afterwards I restarted the second node, which fails. In the log file, while compiling the config files, icinga2 errors out with the message that the "generic-host" template does not exist.

My assumption:
The global templates aren't initially synced when icinga2 gets to compiling the config files with the existing hosts, and therefore it fails to start. But to sync the global templates it has to start completely...
Right?

Thx in advance
Snoopy


Relations:

@icinga-migration
Author

Updated by mfriedrich on 2016-06-21 10:34:35 +00:00

Adding a zone requires a restart, e.g. on all your satellites and the master. Once they are aware of the new zone, the cluster config sync will work. In your description you are starting the satellites to receive their zone config, but the global-templates zone wasn't known at this stage. Therefore it will not be synced.

@icinga-migration
Author

Updated by mfriedrich on 2016-06-21 10:34:43 +00:00

  • Relates set to 10000

@icinga-migration
Author

Updated by snoopy1978 on 2016-06-21 10:45:53 +00:00

dnsmichi wrote:

> Adding a zone requires a restart, e.g. on all your satellites and the master.

Sure, I know, and this is exactly what I did after enabling global-templates on all endpoints. And that was exactly the point where the 2nd node didn't start because of the missing template.

> Once they are aware of the new zone, the cluster config sync will work.

That's the point... to get the new zone, the 2nd node has to be started, which fails because of the missing template.

> In your description you are starting the satellites to receive their zone config, but the global-templates zone wasn't known at this stage. Therefore it will not be synced.

Not exactly... before starting the satellite, I restarted the master, so I would expect the new zone IS known.

@icinga-migration
Author

Updated by mfriedrich on 2016-06-21 10:55:48 +00:00

snoopy1978 wrote:

> > In your description you are starting the satellites to receive their zone config, but the global-templates zone wasn't known at this stage. Therefore it will not be synced.
> Not exactly... before starting the satellite, I restarted the master, so I would expect the new zone IS known.

The master knows about it. But the satellite doesn't. The satellite node still runs with the old configuration (no "global-templates" zone). You may, for example, query the REST API at /v1/objects/zones to see the runtime configuration.
Once the config sync sends the object configs in zones.d from the master to the satellite, it will later trigger a restart.
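
To check which zones a node has actually loaded, the /v1/objects/zones query mentioned above can be scripted. The following is only a minimal sketch: the host name, the API user and password, and the helper names (`fetch_zones`, `zone_names`, `has_zone`) are hypothetical, port 5665 is assumed to be the API listener port, and certificate verification is switched off purely to keep the example short.

```python
import base64
import json
import ssl
import urllib.request


def zone_names(payload: dict) -> list[str]:
    """Return the sorted names of all objects listed in an API response body."""
    return sorted(item["name"] for item in payload.get("results", []))


def has_zone(payload: dict, zone: str) -> bool:
    """True if the node reported a zone with this name in its runtime config."""
    return zone in zone_names(payload)


def fetch_zones(host: str, user: str, password: str, port: int = 5665) -> dict:
    """Query /v1/objects/zones on one node and return the decoded JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"https://{host}:{port}/v1/objects/zones",
        headers={"Accept": "application/json", "Authorization": f"Basic {token}"},
    )
    # Assumption: verification is disabled for brevity only; in real use,
    # load the cluster CA certificate into the context instead.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


# Hypothetical usage against the second node:
# payload = fetch_zones("node2.example.org", "root", "secret")
# print(has_zone(payload, "global-templates"))
```

Running this against each node before and after its restart would show whether "global-templates" is part of that node's runtime state.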

In your scenario the following happens:

  • satellite zone is configured, so the satellite node accepts all object configs pushed via config sync
  • global-templates zone is not configured (i.e. not loaded into runtime state), therefore the satellite refuses to sync the config objects from the master in zones.d/global-templates
  • the restart is triggered
  • it fails, as the new objects reference templates which have not been synced

Therefore I told you to restart all instances on a zone (or endpoint) change before actually restarting the master node. The order is important.
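
The sequence above can be illustrated with a small toy model. This is not Icinga 2 code, just a sketch of the reasoning: the function names, the zone name "cluster" and the host name "host-a" are made up for illustration, and the only rule modelled is that a receiving node keeps config only for zones it already knows.

```python
def accepted_zone_dirs(known_zones: set[str], pushed: dict[str, set[str]]) -> dict[str, set[str]]:
    """Keep only the pushed zones.d directories whose zone the receiver knows."""
    return {zone: names for zone, names in pushed.items() if zone in known_zones}


def missing_templates(accepted: dict[str, set[str]], imported: set[str]) -> set[str]:
    """Templates that synced objects import but that were not synced themselves."""
    available = set().union(*accepted.values())
    return imported - available


# Config pushed by the master: hosts in the cluster zone, templates in the global zone.
pushed = {
    "cluster": {"host-a"},
    "global-templates": {"generic-host"},
}
imported = {"generic-host"}  # host-a imports the generic-host template

# Receiver still running the old config: the global zone is unknown to it.
old_runtime = accepted_zone_dirs({"cluster"}, pushed)
print(missing_templates(old_runtime, imported))  # the template is missing

# Receiver restarted first, so the global zone is already known.
new_runtime = accepted_zone_dirs({"cluster", "global-templates"}, pushed)
print(missing_templates(new_runtime, imported))  # nothing is missing
```

In the first case the template directory is dropped, so validation on restart has nothing to resolve "generic-host" against; in the second case both directories are accepted.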

@icinga-migration
Author

Updated by snoopy1978 on 2016-06-21 11:30:00 +00:00

dnsmichi wrote:

> snoopy1978 wrote:
> > > In your description you are starting the satellites to receive their zone config, but the global-templates zone wasn't known at this stage. Therefore it will not be synced.
> > Not exactly... before starting the satellite, I restarted the master, so I would expect the new zone IS known.
>
> The master knows about it. But the satellite doesn't. The satellite node still runs with the old configuration (no "global-templates" zone). You may, for example, query the REST API at /v1/objects/zones to see the runtime configuration.
> Once the config sync sends the object configs in zones.d from the master to the satellite, it will later trigger a restart.
>
> In your scenario the following happens:
>
>   • satellite zone is configured, so the satellite node accepts all object configs pushed via config sync
>   • global-templates zone is not configured (i.e. not loaded into runtime state), therefore the satellite refuses to sync the config objects from the master in zones.d/global-templates
>   • the restart is triggered
>   • it fails, as the new objects reference templates which have not been synced
>
> Therefore I told you to restart all instances on a zone (or endpoint) change before actually restarting the master node. The order is important.

OK, so if I get you right, the one and only correct way to enable global-templates later is this:

  1. Both nodes are running
  2. Enable global-templates in the configs on all nodes
  3. First restart the non-master config node(s) so they become aware of the new zone
  4. Finally restart the master config node so it actually pushes the config of the new zone to the satellites

Right?
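
As a rough sketch of the ordering in steps 3 and 4 (as proposed in this thread, not an official procedure), a helper could sort the endpoints so that the config master always comes last. The function name and the endpoint names "node1" and "node2" are hypothetical.

```python
def restart_order(nodes: dict[str, bool]) -> list[str]:
    """Order endpoints for a restart after a zone change.

    `nodes` maps an endpoint name to True if it is the config master.
    Non-master nodes come first, the config master(s) last.
    """
    others = sorted(name for name, is_master in nodes.items() if not is_master)
    masters = sorted(name for name, is_master in nodes.items() if is_master)
    return others + masters


# Two-node setup from this report: node1 holds the config in zones.d.
for name in restart_order({"node1": True, "node2": False}):
    print(f"restart icinga2 on {name}")
```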

Luckily in my case the scenario was only for testing, so I could roll back to an earlier snapshot (2 VMs) and, by chance, used the way you described. In theory: what could be done if this had happened in a production environment? What would have to be done to "repair" the second node that does not start?

BTW:

> The order is important.

As the order seems to be of such importance, wouldn't it be worth mentioning in the docs?

@icinga-migration icinga-migration added bug Something isn't working area/distributed Distributed monitoring (master, satellites, clients) labels Jan 17, 2017
@dnsmichi
Contributor

dnsmichi commented Feb 7, 2017

Imho this should be taken into account when looking at #4354.
