[dev.icinga.com #6871] multi-parent dependencies #1869

Closed
icinga-migration opened this issue Aug 7, 2014 · 54 comments · Fixed by #7785
Labels: area/configuration, enhancement, queue/wishlist

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/6871

Created by 9er on 2014-08-07 10:58:06 +00:00

Assignee: (none)
Status: New (closed on 2014-11-24 14:16:32 +00:00)
Target Version: Backlog
Last Update: 2016-11-28 12:09:11 +00:00 (in Redmine)


Problem:

  • hosts might have 2 parents (for redundancy), e.g., servers with uplinks to 2 routers and BGP failover
  • adding a dependency for each parent causes notifications to be disabled if just one parent is down and the host is still reachable
  • notifications should be disabled only if both parents are down and the host is unreachable

Config could look something like this:

object Dependency "server-routers" {
  parent_host_name = [ "router1", "router2" ]

  child_host_name = "server"
}

Maybe even add an option to decide whether all parents must be down ("AND") or just one ("OR"). The "OR" option could then be used to consolidate multiple single dependencies.

object Dependency "server-routers" {
  parent_host_name = [ "router1", "router2" ]
  child_host_name = "server"

  operation = "AND"
}


@icinga-migration

Updated by mfriedrich on 2014-11-11 22:37:08 +00:00

  • Status changed from New to Feedback
  • Assigned to set to 9er

Should be easier with the new apply Dependency for rules in 2.2. That way you can generate your dependencies based on custom attribute arrays/dictionaries.

https://github.com/Icinga/icinga2/blob/master/doc/4-monitoring-basics.md#using-apply-for

@icinga-migration

Updated by mfriedrich on 2014-11-24 14:16:32 +00:00

  • Status changed from Feedback to Closed
  • Assigned to deleted 9er

@icinga-migration

Updated by mfriedrich on 2015-02-01 10:02:31 +00:00

  • Duplicated set to 8304

@icinga-migration

Updated by mfriedrich on 2015-02-14 22:57:31 +00:00

  • Category set to Configuration
  • Status changed from Closed to New

@icinga-migration

Updated by deneu on 2015-02-15 08:35:33 +00:00

If dependencies are a logical 'or' combination, why are the children then unreachable if one parent goes down?

@icinga-migration

Updated by mfriedrich on 2015-02-15 10:51:07 +00:00

If one dependency fails, the child becomes unreachable. That's a logical 'or' combination. You instead want an 'and' combination where all parent states are taken into account and the reachability state is calculated from all of them.
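To make the two readings concrete, here is a minimal Python sketch of the semantics described in this comment. It is only a model of the discussion, not Icinga 2 code; the function names and the `state` dictionary are hypothetical.

```python
# Each parent maps to True when it is up (its dependency is satisfied).

def reachable_any_failure_blocks(parents_up):
    """Behavior described above: one failed parent makes the child unreachable."""
    return all(parents_up.values())


def reachable_until_all_fail(parents_up):
    """Requested behavior: the child is unreachable only once every parent failed."""
    return any(parents_up.values())


state = {"router1": False, "router2": True}  # router1 down, router2 still up
print(reachable_any_failure_blocks(state))  # False
print(reachable_until_all_fail(state))      # True
```

With one of two routers down, the first rule already suppresses notifications for the child, while the second still treats it as reachable.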

@icinga-migration

Updated by deneu on 2015-02-26 09:35:45 +00:00

dnsmichi wrote:

If one dependency fails, the child becomes unreachable. That's a logical 'or' combination. You instead want an 'and' combination where all parent states are taken into account and the reachability state is calculated from all of them.

Yes, exactly! How can I change an 'or' to an 'and'?

@icinga-migration

Updated by mfriedrich on 2015-02-26 09:41:21 +00:00

I don't have an implementation idea currently. It ought to be some sort of business process where you take several other object states into account and then determine the final state, but that's not thought through. On the other hand, there's no clean place where such a configuration item could be put. 1) It does not fit on the host/service itself: applying multiple Dependency objects of different types does not give a clear result set. 2) Putting it on the Dependency objects themselves: how would you know how many other dependencies will be generated from apply rules?

That being said, if you come up with a better proposal and implementation design, feel free to do so. I'm not entirely sure that your request targets the right solution.

@icinga-migration

Updated by deneu on 2015-02-26 15:46:43 +00:00

I have no idea how to implement it in Icinga 2, but I think this is necessary, and it was a well-working default function in Icinga/Nagios.
The example in the GitHub documentation does not work as it should, because it is an 'or' condition, as you said. Without this feature, I think large infrastructures can't be monitored in depth. So why not use a "parent"-like function?

@icinga-migration

Updated by mfriedrich on 2015-02-26 15:51:58 +00:00

What is a "parent-like function" in your design?

@icinga-migration

Updated by deneu on 2015-02-27 08:56:54 +00:00

dnsmichi wrote:

What is a "parent-like function" in your design?

I mean the same function as the "parents" attribute in icinga1x/nagios.

@icinga-migration

Updated by mfrosch on 2015-02-27 09:50:12 +00:00

  • Status changed from New to Feedback

    object Host "router1" {
      import "generic-host"
    }

    object Host "router2" {
      import "generic-host"
    }

    object Host "foobar" {
      import "generic-host"
      vars.parents = [ "router1", "router2" ]
    }

    apply Dependency "routers" for (router in host.vars.parents) to Host {
      parent_host_name = router

      assign where host.vars.parents
    }

You could also do some assignment via any other var that creates a logical dependency.

Does this fulfil your requirements?
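As a rough illustration of what the apply-for rule in this comment produces, the following Python sketch expands a `parents` list into one (parent, child) pair per entry. It models the idea only; the `hosts` dictionary and `expand_dependencies` are hypothetical names, and the naming of the generated Icinga 2 objects is deliberately left out.

```python
hosts = {
    "router1": {},
    "router2": {},
    "foobar": {"parents": ["router1", "router2"]},
}


def expand_dependencies(hosts):
    """Yield one (parent, child) pair per entry in a host's parents list."""
    for child, host_vars in hosts.items():
        for parent in host_vars.get("parents", []):
            yield (parent, child)


deps = list(expand_dependencies(hosts))
print(deps)  # [('router1', 'foobar'), ('router2', 'foobar')]
```

The result is still two separate dependencies for "foobar", so how they are combined depends on the core's evaluation rule, not on how they were generated.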

@icinga-migration

Updated by mfriedrich on 2015-02-27 09:55:32 +00:00

  • Status changed from Feedback to New

@icinga-migration

Updated by deneu on 2015-02-27 13:39:57 +00:00

lazyfrosch wrote:

[...]

You could also do some assignment via any other var that creates a logical dependency.

Does this fulfil your requirements?

Hey Markus,

Sorry, no, it does not fulfil my requirements, because it's only another way to apply the dependency, but it's still "or", right?

@icinga-migration

Updated by mfriedrich on 2015-03-02 15:26:14 +00:00

deneu wrote:

dnsmichi wrote:
> What is a "parent-like function" in your design?

I mean the same function as the "parents" attribute in icinga1x/nagios.

I'm leaving this issue in a 'new' state until you've come up with a detailed design and implementation proposal for Icinga 2.

@icinga-migration

Updated by nuts on 2015-03-26 00:30:57 +00:00

A careful inquiry: are there any new findings?

Unfortunately, we have this problem too. Is the logical "or" useful in some cases?

@icinga-migration

Updated by mfriedrich on 2015-04-08 11:24:05 +00:00

Feedback from SIG-NOC meeting:

  • multiple uplink paths requiring an AND
  • design proposals pending

@icinga-migration

Updated by mfriedrich on 2015-06-23 13:36:38 +00:00

  • Target Version set to Backlog

@icinga-migration

Updated by klon on 2015-08-02 06:43:34 +00:00

Please clarify: will you implement the "AND" feature?
Or do you need some additional feedback from users?
It would be very useful for us too.

@icinga-migration

Updated by mfriedrich on 2015-08-03 15:40:31 +00:00

There are no resources available for this task currently, therefore we've put this onto "Backlog". We will re-iterate over these issues from time to time after our feature development sprint for 2.4 is finished. On the other hand, one might chime in and sponsor/provide a patch for this feature (similar to other issues remaining on "Backlog" for the time being).

@icinga-migration

Updated by mfriedrich on 2015-08-31 13:54:24 +00:00

  • Duplicated set to 10049

@icinga-migration

Updated by mfrosch on 2015-09-01 12:51:28 +00:00

  • Relates set to 10058

@icinga-migration

Updated by mfriedrich on 2015-09-03 16:23:16 +00:00

Some reference: https://www.mail-archive.com/nagios-users@lists.sourceforge.net/msg19373.html

Still, I don't have a viable design in mind to solve the problem properly. Even some sort of Dependency grouping would still cause trouble in the way chained and inherited dependencies will work, tree-wise.

@icinga-migration

Updated by barry.quiel on 2015-09-09 16:38:29 +00:00

I like the idea of this feature, but it may have already been solved indirectly. In the release announcement for 2.3.0 ( https://www.icinga.org/2015/03/10/icinga-2-v2-3-0-released/) there is a sample of a dummy cluster object. That object could serve this purpose. The challenge that I haven't figured out is how to get the nodes added to the cluster dynamically. In the example the nodes of the cluster are added to vars.cluster_nodes of the dummy object. I would like some sort of object name lookup function that uses a regex or match to find the names of the objects.

Something like:
vars.cluster_nodes = find_host_object("cluster-node*")

It seems like I should be able to figure this out, considering that something very similar already exists in apply statements ( match("cluster-host.*", host.name) ), but I can't seem to wrap my head around the right set of functions.

@icinga-migration

Updated by dgoetz on 2015-09-14 09:30:44 +00:00

Just a brief summary to see if my assumptions are correct.

You can create multiple dependencies (either manual or by using apply). This will mean an OR and should now work with the last patches like icinga 1, meaning if only one dependency fails the status is unchanged, if all fail the host / service is unreachable. Not sure about what this means for the attributes of the dependency "disable_notifications" and "disable_checks".

If these attributes are applied only when the object is unreachable, then I see almost no use case for an AND, because AND would then be the behavior from before the last patches: the host / service needs both parents and becomes unreachable if one fails.

If this is also needed I do not like the operator attribute. I would prefer not changing the normal configuration for OR (two or more separate dependencies) and introduce an array for "parent_host_name" and "parent_service_name" for AND. I think this would be easy to configure and understand.

@icinga-migration

Updated by mfriedrich on 2015-09-14 11:35:45 +00:00

dirk wrote:

Just a brief summary to see if my assumptions are correct.

You can create multiple dependencies (either manual or by using apply). This will mean an OR and should now work with the last patches like icinga 1, meaning if only one dependency fails the status is unchanged, if all fail the host / service is unreachable. Not sure about what this means for the attributes of the dependency "disable_notifications" and "disable_checks".

No, Icinga 1.x had the additional "parents" attribute which does not exist anymore. That one causes all host parents to be combined as "AND". Host dependencies as they are work like the old Icinga 1.x ones: their logical condition is "OR".

This ticket is all about adding these parent relations again into Icinga 2, somehow related to dependencies. At least the original poster demanded it, though I'm not very happy about the recent suggestions to solve the issue. None of them would fit into the existing design (configuration language, object references, external interfaces, etc).

If this is also needed I do not like the operator attribute. I would prefer not changing the normal configuration for OR (two or more separate dependencies) and introduce an array for "parent_host_name" and "parent_service_name" for AND. I think this would be easy to configure and understand.

Keep in mind that the Dependency type is not exclusive to a host or service type. The attribute "parent_host_name" without "parent_service_name" would generally work as an array, but what happens if you define service dependencies by having "parent_service_name" set? Take the following example:

apply Dependency "foo" to Service {
  parent_host_name = [ "h1", "h2" ]
  parent_service_name = [ "ping4", "ping6" ]

  assign where true
}

Now guess which parent hosts and services will be referenced. There are multiple ways to interpret this configuration, and it makes validation and output even harder. Icinga 1.x had a similar issue with service group members being a list of host-service name tuples.
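The ambiguity can be shown with a short Python sketch that applies two plausible readings to the arrays from the example above. Neither reading is claimed to be what Icinga 2 would do; the variable names are illustrative only.

```python
from itertools import product

parent_hosts = ["h1", "h2"]
parent_services = ["ping4", "ping6"]

# Reading 1: pair entries by position (h1 with ping4, h2 with ping6).
pairwise = list(zip(parent_hosts, parent_services))

# Reading 2: every listed service on every listed host.
cross = list(product(parent_hosts, parent_services))

print(pairwise)  # [('h1', 'ping4'), ('h2', 'ping6')]
print(cross)     # [('h1', 'ping4'), ('h1', 'ping6'), ('h2', 'ping4'), ('h2', 'ping6')]
```

The same four strings yield either two or four parent services, which is the validation problem pointed out in this comment.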

Probably that approach makes sense, though the array solution is not very elegant and is error prone.

Kind regards,
Michael

@icinga-migration

Updated by lesinigo on 2016-11-28 12:09:11 +00:00

This “dependencies in AND/OR” thing really hit us in the transition from Nagios to Icinga2. We used to extensively use “parents” in Nagios for use cases that have already been mentioned here, the two most common ones for us are:

  • a Host behind multiple routers/switches: just one network path is enough to reach the host and monitor it
  • a Host which is actually a VM on a cluster of physical nodes: if one physical node fails, the VM gets migrated/restarted on another one and must still be monitored; if all of them fail, I don't want to be bothered with alarms about the VMs because I already have alarms about the cluster nodes; and if many nodes fail, I do want alarms about both the failed cluster nodes and any VM that is not actually running (e.g. for capacity reasons)

Here are my two cents on this topic...

I think a possible approach would be being able to specify a “function” for an object, it would get all dependencies as inputs and provide a reachability status as output.
It could be an actual function (i.e. let the users build strange stuff if they want) or just a parameter for the common ones (at least and and or, but maybe all the usual boolean logic functions? and or xor not…).

This would be more than enough for our use cases, which I strongly suspect are the same as >90% of people having this issue, and it does not break backward compatibility since you can just leave the default at the current logic, which we could call “or”.

Actual example:

object Host "Router-A" {
}
object Host "Router-B" {
}
object Host "BigServer" {
    // if all deps are DOWN, then this Host is UNREACHABLE
    unreachable_condition = "and"
}
object Dependency "Server-to-RouterA" {
    parent = "Router-A"
    child  = "BigServer"
}
object Dependency "Server-to-RouterB" {
    parent = "Router-B"
    child  = "BigServer"
}
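A minimal Python sketch of the "function per object" idea follows. It assumes the naming used in this comment, where "or" is the existing logic and "and" means all dependencies must be DOWN; `CONDITIONS` and `is_unreachable` are hypothetical names, not part of Icinga 2.

```python
from typing import Callable, Dict, List

# A condition receives one flag per dependency (True = that parent is DOWN)
# and returns True when the child should be treated as UNREACHABLE.
Condition = Callable[[List[bool]], bool]

CONDITIONS: Dict[str, Condition] = {
    # one failed parent is enough
    "or": lambda down: any(down),
    # every parent must have failed; an empty list never counts as failed
    "and": lambda down: bool(down) and all(down),
}


def is_unreachable(parents_down: List[bool], condition: str = "or") -> bool:
    """Apply the named condition; the default keeps the existing logic."""
    return CONDITIONS[condition](parents_down)


big_server = [True, False]  # Router-A down, Router-B up
print(is_unreachable(big_server, "or"))   # True
print(is_unreachable(big_server, "and"))  # False
```

Keeping "or" as the default mirrors the backward-compatibility argument made above, and further named conditions could be added to the table without touching the Dependency objects.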

This approach would not be enough for more complex scenarios, but AFAIK they all boil down to a single "child" object having multiple, distinct "dependency groups" and some of them actually being dependencies on “at least one of”.

If we are talking about strict reachability logic (i.e. “I want to know if I can monitor this, so I can know if it is actually DOWN or if it could be UP but I don’t see it”), I cannot right now think of any actual, real-life scenario where you would have one of those more complex scenarios.

On the other hand, if we are talking about “availability” of a service then there could be multiple dependencies, but IMHO this is wrong to do as dependencies in a monitoring system. It could be useful to have them somewhere else (think Business Processes), but IMHO it is definitely not part of the core, low-level logic, between low-level objects: that logic should just explain how to calculate reachability, to discern “ko” states (down, warning, critical) from “dunno” states (unreachable, unknown).

Side note #1: the example I described could be used as a basis towards supporting more complex scenarios, if you really want: it would probably need a new kind of object, some sort of “intermediate dependency”, so a Host/Service could have “unreachable_condition AND” dependencies on some of those intermediate objects and those intermediate objects could have “unreachable_condition OR” dependencies on their parents if needed. And I suspect that often, if you really want this kind of "dependency group" stuff, you can already hack it together using dummy hosts or services, even if it's not a clean solution.

Side note #2: if I have not missed anything, the current documentation does not explicitly talk about multiple dependencies. It should be improved to clearly state what happens when a Host or Service has multiple Dependencies and what logic is used to determine child state when one or more dependencies fail. This applies to both the current situation and to any eventual outcome of this feature request.

icinga-migration added the enhancement and area/configuration labels on Jan 17, 2017
icinga-migration added this to the Backlog milestone on Jan 17, 2017
@dgoetz

dgoetz commented May 9, 2018

  • global option
  • per object

For me a global option would be enough, as I commonly need only the logic for redundant dependencies: with two switches, the servers behind them should be unreachable if both are down; with virtualization hosts, the virtual machines should be unreachable if the hosts are down.

@efuss

efuss commented May 10, 2018

I agree that a global option is a bad idea. It's confusing and I guess nobody can live with all dependencies being interpreted as redundant (you certainly don't want the implicit dependency of a Service on its Host and other explicit dependencies being regarded as redundancy-type).

My typical scenario is that a server process needs, say, LDAP and name resolution to work and I have two LDAP servers and two resolvers. I need to be able to express that. The easiest and most straightforward solution I can think of is to apply some „needs LDAP“ dependency to all services requiring LDAP lookups and, in that Dependency Object, write something like

parents = [
	{
		host = "ldap-primary"
		service = "ldap"
	}
	{
		host = "ldap-backup"
		service = "ldap"
	}
]

where the parents array's elements are interpreted as redundant options.
Add a „need resolv“ dependency for Service Objects dependent on name resolution, have both dependencies interpreted cumulatively (as-is) and, in my opinion, you're done, no?
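The proposal above can be sketched in Python as follows: parents inside one dependency are redundant alternatives, while separate dependencies stay cumulative. This is a model of the idea only. The resolver host and service names are hypothetical, and `dependency_ok` and `reachable` are not Icinga 2 functions.

```python
def dependency_ok(parents, state):
    """A dependency holds while at least one of its redundant parents is OK."""
    return any(state[(p["host"], p["service"])] for p in parents)


def reachable(dependencies, state):
    """Separate dependencies stay cumulative: every one of them must hold."""
    return all(dependency_ok(parents, state) for parents in dependencies.values())


dependencies = {
    "needs LDAP": [
        {"host": "ldap-primary", "service": "ldap"},
        {"host": "ldap-backup", "service": "ldap"},
    ],
    "needs resolv": [
        {"host": "resolver1", "service": "dns"},  # hypothetical names
        {"host": "resolver2", "service": "dns"},
    ],
}
state = {
    ("ldap-primary", "ldap"): False,  # primary LDAP server is down
    ("ldap-backup", "ldap"): True,
    ("resolver1", "dns"): True,
    ("resolver2", "dns"): True,
}
print(reachable(dependencies, state))  # True
```

Losing one LDAP server keeps the child reachable, whereas losing both resolvers would make it unreachable even with LDAP fully available.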

Do you find that not covering people's needs, being confusing, over-engineered or difficult to implement?

@dnsmichi

It is a mix of both - confusing and likely users may not understand it, or implement it the wrong way. I know that you could live with your proposal, but you're not the one maintaining all the features and doing support and documentation for them - no offence here.

Probably the array with dictionary notation is the best proposal so far. Still, how would such an array be dealt with: is it AND or is it OR? One should be able to tell just by looking at the config snippet.

For the implementation parts, one needs to consider that such dependencies are not available in the IDO backend, might be hard to traverse that via REST API too. Not so easy, and should be taken into account when creating a new backend, FYI @lippserd

Either way, I'd still like to hear what others in this issue think about

- [ ] global option
- [ ] per object

@dgoetz

dgoetz commented May 14, 2018

Perhaps what I forgot to mention is my use of the businessprocess addon for more complicated dependency requirements, which can also solve things like requiring two out of five webservers. Having such things in the core could be helpful, but I am fine with having them somewhere else and a simple solution in the core.
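The "two out of five webservers" case mentioned here is a threshold rule rather than a plain AND or OR. A minimal Python sketch, with hypothetical names and not taken from the businessprocess addon:

```python
def enough_parents_up(parents_up, required):
    """True while at least `required` of the parents are still up."""
    return sum(parents_up) >= required


webservers = [True, False, True, False, False]  # two of five are up
print(enough_parents_up(webservers, required=2))  # True
print(enough_parents_up(webservers, required=3))  # False
```

With `required=1` this is the "at least one parent up" rule, and with `required=len(parents_up)` it is the "all parents up" rule, so both simple cases are special cases of the threshold.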

@odeshog

odeshog commented May 14, 2018

A global option would be enough for us but a per-object option would be preferable in the long run.

@dnsmichi

One implementation "detail" to consider: right now, object relations need to be tracked, e.g. for the REST API joins. Such a thing won't be possible with a parents array, nor could it be dumped to the database backend easily.

https://github.com/Icinga/icinga2/blob/master/lib/icinga/dependency.ti#L70

I'm thinking about a different method, like grouping these dependencies and evaluating them based on a specified operator. This wouldn't change anything with the current Dependency configuration objects, and introduce an optional element. It also can re-use the group assign where logic, and copy the group membership resolving.

Let's see about that. Right now I have some bugs for 2.9 to deal with prior to looking into this again.

Al2Klimov self-assigned this on Aug 30, 2019
Al2Klimov removed the needs-sponsoring label on Aug 30, 2019
Al2Klimov added a commit that referenced this issue on Aug 30, 2019
Al2Klimov added a commit that referenced this issue on Aug 30, 2019
dnsmichi added this to Design Drafts in icinga2 on Nov 15, 2019
dnsmichi added the TBD label on Nov 15, 2019
dnsmichi assigned htriem and unassigned Al2Klimov on Jan 28, 2020
dnsmichi removed the TBD label on Feb 11, 2020
@dnsmichi

@htriem joined the Icinga 2 core team a while ago and is now taking on more maintainer responsibilities.

During the issue grooming we had some weeks ago, his exercise was to determine the question and the solutions in this issue. A special exercise was also to weigh the issue in terms of a config option vs. changing the behavior, without any influence from me, as I have no clear view on the issue anymore.

This is what @htriem achieved (at the very moment he's on vacation, therefore I am writing this now):

  1. We couldn't find a reliable scenario where the current behavior with the "if one dependency fails, mark this unreachable" would apply.
  2. The topic contains 20 thumbs up and responses to change the behavior, with only developers wanting to keep the current behavior. That's a fair point to take into account.
  3. The proposed configuration options are either too complicated, or they do not fit the current DSL approach with "there's only one way to do it right" and "keep it simple, stupid".
  4. During the grooming session, an actual patch was implemented in #7785 ("Change behaviour of multiple dependencies (all failed = unreachable)") to change the behavior.

A small bug with "no dependencies -> be reachable" existed, which was unveiled with our unit tests yesterday. Already fixed.
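The rule "all failed = unreachable" has an edge case for objects without dependencies. The Python sketch below illustrates that class of problem. It is not the code from #7785, and the thread does not say whether the bug mentioned above had this cause. Both function names are hypothetical.

```python
def is_reachable_naive(parents_failed):
    # "Unreachable when all dependencies failed", written literally.
    # all([]) is True, so an object without dependencies looks unreachable.
    return not all(parents_failed)


def is_reachable(parents_failed):
    # Treat "no dependencies" as reachable before applying the rule.
    if not parents_failed:
        return True
    return not all(parents_failed)


print(is_reachable_naive([]))        # False
print(is_reachable([]))              # True
print(is_reachable([True, False]))   # True
print(is_reachable([True, True]))    # False
```

A unit test with an empty dependency list is enough to catch the difference between the two variants.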

I've added some more unit tests in #7785 rendering this change "bullet proof".

@efuss

efuss commented Sep 8, 2020

I'm unhappy with this change because it can lead to unrelated services being regarded as redundant with respect to each other. It can even make a host being regarded as redundant with respect to a service.

For example (unfortunately, I didn't think about this in the discussion and stumbled over it only after upgrading to 2.12.0), applying the explicit disable-host-service-checks dependency described in the Monitoring Basics chapter will defeat all other dependencies.

My original dictionary idea seems to be too complicated.

I then came up with the idea to introduce an essential attribute for Dependency Objects, meaning that dependency alone will make the parent unreachable. I implemented this, but after that came up with still another idea.

What about a new redundancy_group attribute for dependencies?
Specifying a redundancy_group would cause a dependency to be regarded as redundant only inside that redundancy group, e.g., "routers".
Dependencies lacking a redundancy_group attribute would be regarded as essential for the parent.

This would, with only one additional simple string attribute, allow for both cumulative and redundant dependencies and even a combination (cumulation of redundancies, like SSH depending on both LDAP and DNS to function, while operating redundant LDAP servers as well as redundant DNS resolvers).
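A minimal Python sketch of the redundancy_group semantics as described in this comment: ungrouped dependencies are essential, and each named group needs at least one working parent. It is not the C++ from #8218, and the `(group, parent_ok)` tuple layout is an assumption made for illustration.

```python
from collections import defaultdict


def is_reachable(dependencies):
    """dependencies: iterable of (redundancy_group or None, parent_ok) pairs."""
    groups = defaultdict(list)
    for group, parent_ok in dependencies:
        if group is None:
            if not parent_ok:
                return False  # an ungrouped dependency is essential
        else:
            groups[group].append(parent_ok)
    # Each redundancy group needs at least one working parent.
    return all(any(members) for members in groups.values())


ssh_deps = [
    ("ldap", False), ("ldap", True),  # one LDAP server is down
    ("dns", True), ("dns", True),
    (None, True),                     # an essential, ungrouped dependency
]
print(is_reachable(ssh_deps))  # True
```

If both "ldap" entries failed, or the ungrouped dependency failed, the result would be `False` regardless of the state of the "dns" group.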

I've implemented this in #8218 and it appears to work. I can't tell whether the additional std::unordered_map computations in Checkable::IsReachable() are tolerable for huge installations.

I also don't feel comfortable enough with the test framework to integrate unit tests for the proposal. The current tests are, of course, expected to fail with the change.

@efuss

efuss commented Sep 9, 2020

I received a mail "PR run failed: Packages - Introduce redundancy groups for Dependency Objects (3b10401)" that I don't understand.

In the reports, I see some "Job canceled" messages plus Error: Transaction test error: 1181 file /usr/include/mysql/mariadb_rpl.h conflicts between attempted installs of mariadb-devel-3:10.3.22-1.fc31.x86_64 and mariadb-connector-c-devel-3.1.9-5.fc31.x86_64

I don't think my patch broke that.
