[dev.icinga.com #10058] Wrong calculation for host compat state "UNREACHABLE" in DB IDO #3352

Closed
icinga-migration opened this issue Sep 1, 2015 · 12 comments
Labels
blocker Blocks a release or needs immediate attention bug Something isn't working

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10058

Created by tgelf on 2015-09-01 08:02:39 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2015-09-04 11:30:04 +00:00)
Target Version: 2.3.10
Last Update: 2015-09-04 11:48:22 +00:00 (in Redmine)

Icinga Version: 2.4.0
Backport?: Already backported
Include in Changelog: 1

Bug reports about erroneous states when multiple "parents" are involved pop up from time to time; they used to be stalled or rejected. Background:

  • Icinga2 dropped the UNREACHABLE state in favour of a reachability property
  • There is no more "parent" property, it got replaced by generic Dependency objects
  • So people use Dependency objects to model (network) reachability
  • Everything is fine so far, problems start when shipping state to different legacy backends
  • Classic UI has no support for such a "reachable" flag
  • IDO could be extended, but adding even more fields for such elementary information would hurt performance. We can make use of such a field, but not for showing state in lists and summaries, only to extend detailed object information

IMO we have to accept that, to support legacy backends, we must keep exporting one single state field as we used to. This means showing no reachability information for services, but I guess most people will not miss that feature. My conclusion is that all we need to do to fix this is to slightly adjust the state calculation. What all bug reporters stumbled upon are erroneous UNREACHABLE states when the object was in fact OK/UP.

So to make them happy we do not need to completely change the current behaviour. Logic for legacy state output writers should be as simple as:

if (object.reachable || object.state in [ok, up, warn]) {
    return state
} else {
    return UNREACHABLE
}
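A minimal C++ sketch of the rule above, restricted to hosts because legacy backends only know UNREACHABLE as a host state. The enums and `LegacyHostState()` are hypothetical names for illustration, not the Icinga 2 API; the compat values follow the classic 0 = UP, 1 = DOWN, 2 = UNREACHABLE convention.

```cpp
// Hypothetical names for illustration; not the Icinga 2 API.
enum class HostState { Up = 0, Down = 1 };                         // what the core knows
enum class CompatHostState { Up = 0, Down = 1, Unreachable = 2 };  // what legacy backends expect

// Proposed rule: trust the check result while the host is UP; only a host that
// is DOWN *and* unreachable is exported as UNREACHABLE.
CompatHostState LegacyHostState(HostState state, bool reachable)
{
    if (reachable || state == HostState::Up)
        return static_cast<CompatHostState>(state); // same numeric value, 0 or 1

    return CompatHostState::Unreachable;
}
```

For example, `LegacyHostState(HostState::Up, false)` yields `CompatHostState::Up`, while the same call with `HostState::Down` yields `CompatHostState::Unreachable`.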

A node depending on multiple parents is to be considered UP as long as its check plugin says so. If it is DOWN, then we need to find out whether it is reachable or not. If any of its parents is failing, we consider this a network outage and set the (now virtual) state to UNREACHABLE. That would restore the former behaviour and make everybody happy, I guess. At least, I hope so ;)
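The "any failing parent" rule can be sketched as follows. It assumes, as in the test configuration in this issue (`states = [ Up ]`), that each dependency lists the parent states under which the child stays reachable, and that only HARD parent states count, since dependencies ignore SOFT states by default. `ParentLink`, `IsFailing()` and `IsReachable()` are hypothetical names, not Icinga 2's implementation.

```cpp
#include <algorithm>
#include <set>
#include <vector>

enum class HostState { Up = 0, Down = 1 };
enum class StateType { Soft, Hard };

// Hypothetical model of one Dependency: the parent's current state plus the
// parent states under which the child is still considered reachable.
struct ParentLink {
    HostState parentState;
    StateType parentStateType;
    std::set<HostState> allowedStates; // e.g. { HostState::Up } for states = [ Up ]
};

// A parent is failing if it is in a HARD state the dependency does not allow.
// SOFT states are ignored, mirroring the Dependency default for SOFT states.
bool IsFailing(const ParentLink& link)
{
    return link.parentStateType == StateType::Hard &&
           link.allowedStates.count(link.parentState) == 0;
}

// "Any failing parent" rule: the child is unreachable as soon as one parent fails.
bool IsReachable(const std::vector<ParentLink>& parents)
{
    return std::none_of(parents.begin(), parents.end(), IsFailing);
}
```

Because `std::none_of` returns `true` for an empty range, a host without any dependencies is always reachable in this sketch.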

Cheers,
Thomas

Attachments

Changesets

2015-09-04 11:24:41 +00:00 by mfriedrich 50cd694

Fix wrong calculation for host compat state UNREACHABLE

fixes #10058

2015-09-04 11:25:18 +00:00 by mfriedrich 0a43e81

Fix wrong calculation for host compat state UNREACHABLE

fixes #10058

Relations:

@icinga-migration
Author

Updated by tgelf on 2015-09-01 08:04:09 +00:00

Seems that I'm missing permissions to link issues, so here are the related ones I found:

@icinga-migration
Author

Updated by mfrosch on 2015-09-01 12:51:27 +00:00

  • Relates set to 6871

@icinga-migration
Author

Updated by mfrosch on 2015-09-01 12:51:41 +00:00

  • Relates set to 8304

@icinga-migration
Author

Updated by mfrosch on 2015-09-01 12:51:49 +00:00

  • Relates set to 10049

@icinga-migration
Author

Updated by mfrosch on 2015-09-01 12:52:03 +00:00

  • Target Version set to Backlog

@icinga-migration
Author

Updated by mfriedrich on 2015-09-03 16:09:45 +00:00

  • Category set to libicinga
  • Status changed from New to Feedback
  • Assigned to set to tgelf
  • Target Version deleted Backlog

I'm removing this from the backlog for now, as I want to discuss it further and keep it open for suggestions and possible fixes.

As far as I understand this issue, Icinga 2 actually sets a host being UP to UNREACHABLE, if one of its parent objects is DOWN. Am I right about that?

The other relevant issues - multi-parent dependencies - target a different problem. They want to have DOWN hosts marked as UNREACHABLE only if all parent objects are DOWN. That's not the issue here as far as I am concerned.
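For comparison, the semantics requested in the related multi-parent issues (a DOWN child is marked UNREACHABLE only if all of its parents are DOWN) differ from the "any failing parent" rule in the issue description by a single quantifier. A minimal sketch, assuming a hypothetical list of booleans where `true` means that parent is currently failing:

```cpp
#include <algorithm>
#include <vector>

// Each entry says whether one parent is currently failing (hypothetical input).
using ParentFailures = std::vector<bool>;

// Rule from the issue description: one failing parent makes the child unreachable.
bool UnreachableIfAnyParentFails(const ParentFailures& failing)
{
    return std::any_of(failing.begin(), failing.end(), [](bool f) { return f; });
}

// Rule requested in the related multi-parent issues: the child is unreachable
// only when it has at least one parent and every parent fails.
bool UnreachableIfAllParentsFail(const ParentFailures& failing)
{
    return !failing.empty() &&
           std::all_of(failing.begin(), failing.end(), [](bool f) { return f; });
}
```

With one of two parents failing, the first function returns `true` and the second `false`; that is exactly the case where the two interpretations disagree.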

@icinga-migration
Author

Updated by tgelf on 2015-09-03 16:14:30 +00:00

dnsmichi wrote:

> As far as I understand this issue, Icinga 2 actually sets a host being UP to UNREACHABLE, if one of its parent objects is DOWN. Am I right about that?

Correct, at least that's what I read from #10049.

> The other relevant issues - multi-parent dependencies - target a different problem. They want to have DOWN hosts marked as UNREACHABLE only if all parent objects are DOWN. That's not the issue here as far as I am concerned.

Well... that one might be subject to further discussion; AFAIR that's not how 1.x used to work, or is it? Nonetheless you're right, that's not what this specific issue was all about.

Thanks,
Thomas

@icinga-migration
Author

Updated by mfriedrich on 2015-09-03 16:28:16 +00:00

  • Status changed from Feedback to Assigned
  • Assigned to changed from tgelf to mfriedrich

tgelf wrote:

> dnsmichi wrote:
>> As far as I understand this issue, Icinga 2 actually sets a host being UP to UNREACHABLE, if one of its parent objects is DOWN. Am I right about that?
>
> Correct, at least that's what I read from #10049.

I'll try to dig up a test case for that; it still makes no sense that this actually happens.

>> The other relevant issues - multi-parent dependencies - target a different problem. They want to have DOWN hosts marked as UNREACHABLE only if all parent objects are DOWN. That's not the issue here as far as I am concerned.
>
> Well... that one might be subject to further discussion; AFAIR that's not how 1.x used to work, or is it? Nonetheless you're right, that's not what this specific issue was all about.

As you've remarked earlier, there's a difference between dependencies and host parents in 1.x. AFAIK, host parents behave as a logical AND; see my comment in https://dev.icinga.org/issues/6871#note-23

Nonetheless, I'll try to look into this alongside the 2.4 issues.

@icinga-migration
Author

Updated by mfriedrich on 2015-09-04 11:15:05 +00:00

  • File added 10058.conf
  • File added Auswahl_212.png

Tests

object Host "10058-parent-01" {
  address = "127.0.0.1"
  check_command = "hostalive"
  check_interval = 30m
  retry_interval = 30m
}
object Host "10058-parent-02" {
  address = "127.0.0.2"
  check_command = "hostalive"
  check_interval = 30m
  retry_interval = 30m
}
object Host "10058-child-01" {
  address = "127.0.1.1"
  check_command = "hostalive"
  check_interval = 30m
  retry_interval = 30m
}

object Dependency "parent-child-01" {
  parent_host_name = "10058-parent-01"
  child_host_name = "10058-child-01"
  states = [ Up ]
}
object Dependency "parent-child-02" {
  parent_host_name = "10058-parent-02"
  child_host_name = "10058-child-01"
  states = [ Up ]
}

Steps

  • Send check results with state "DOWN" to host 10058-parent-01 until it reaches its first HARD state (by default, dependencies do not take SOFT states into account)
  • Force a check on host 10058-child-01
  • Verify that its state changes from "UP" to "UNREACHABLE"
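The outcome of these steps can be traced with plain integers. After the steps, 10058-child-01 is still UP according to its own hostalive check, but it is unreachable because 10058-parent-01 is HARD DOWN. The sketch below evaluates the compat-state expression quoted from hostdbobject.cpp and statusdatawriter.cpp in this issue for that situation; `OldCompatState()` and the constant names are hypothetical.

```cpp
// Compat state values as exported to DB IDO / status.dat: 0 = UP, 1 = DOWN, 2 = UNREACHABLE.
constexpr int kUp = 0;
constexpr int kDown = 1;
constexpr int kUnreachable = 2;

// Mirrors `host->IsReachable() ? host->GetState() : 2`: any unreachable host is
// exported as UNREACHABLE, whatever its own check returned.
int OldCompatState(int state, bool reachable)
{
    return reachable ? state : kUnreachable;
}

// Situation after the steps above (hypothetical constants):
constexpr int kChildState = kUp;        // hostalive on 10058-child-01 returns UP
constexpr bool kChildReachable = false; // 10058-parent-01 is in a HARD DOWN state
```

`OldCompatState(kChildState, kChildReachable)` evaluates to `kUnreachable`, which is the UP-to-UNREACHABLE change the last step checks for.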

Auswahl_212.png

Problem

This only affects external interfaces. The Icinga 2 core does not know about an "UNREACHABLE" state. We added that state to DB IDO etc. for convenience, but probably should not have done so.

Icinga Web 2 already provides the "reachable" column, which should be used to show the user that this host is UP but has become unreachable because of the dependency chain. That is a different topic, though, and not part of this issue.
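A frontend can keep the real state and still tell users about the dependency chain by combining the two columns. A minimal sketch, assuming a row that carries the host's own state (0 = UP, 1 = DOWN) and the `reachable` flag; `HostRow` and `DisplayLabel()` are hypothetical and not Icinga Web 2 code.

```cpp
#include <string>

// Hypothetical row: the host's own state plus the reachable flag.
struct HostRow {
    int currentState; // 0 = UP, 1 = DOWN
    bool reachable;
};

// Show the real state and append a hint when the dependency chain makes the host unreachable.
std::string DisplayLabel(const HostRow& row)
{
    std::string label = row.currentState == 0 ? "UP" : "DOWN";
    if (!row.reachable)
        label += " (unreachable)";
    return label;
}
```

An UP host behind a failed parent would then be listed as "UP (unreachable)" instead of being turned into a third state.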

Code

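hostdbobject.cpp

    // An unreachable host is always exported as 2 (UNREACHABLE), even if its own state is UP
    fields->Set("current_state", host->IsReachable() ? host->GetState() : 2);

statusdatawriter.cpp

    // Same calculation for status.dat
    fp << "\t" << "current_state=" << (host->IsReachable() ? host->GetState() : 2) << "\n"

There are certainly more occurrences.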

Proposed Fix

Eliminate all occurrences of "UNREACHABLE" and the hardcoded state 2, and replace them with a central CompatUtility class method that changes the way this compat state is calculated.
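A sketch of that refactoring: one helper computes the compat host state, and every legacy writer calls it instead of hardcoding `2`. `CompatUtility::GetHostCurrentState()` is a hypothetical method name for illustration; `Host`, the fields map and the output stream are stand-ins for the real host object and the DB IDO and status.dat writers; the calculation follows the rule proposed in the issue description.

```cpp
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a host object; only what the helper needs.
struct Host {
    int state;      // 0 = UP, 1 = DOWN (the core knows no UNREACHABLE state)
    bool reachable;
};

class CompatUtility {
public:
    static constexpr int HostUnreachable = 2; // named instead of a bare 2

    // Single place where the legacy "current_state" value is calculated:
    // only a DOWN host that is also unreachable becomes UNREACHABLE.
    static int GetHostCurrentState(const Host& host)
    {
        if (host.reachable || host.state == 0)
            return host.state;
        return HostUnreachable;
    }
};

// DB IDO-style writer (the map stands in for the real fields dictionary).
void FillIdoFields(const Host& host, std::map<std::string, int>& fields)
{
    fields["current_state"] = CompatUtility::GetHostCurrentState(host);
}

// status.dat-style writer.
void WriteStatusDat(const Host& host, std::ostream& fp)
{
    fp << "\t" << "current_state=" << CompatUtility::GetHostCurrentState(host) << "\n";
}
```

Keeping the calculation in one method means a later change to the compat rule only has to be made once, and both outputs stay consistent.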

@icinga-migration
Author

Updated by mfriedrich on 2015-09-04 11:24:20 +00:00

  • File added Auswahl_215.png
  • Subject changed from Erraneous state/reachability calculation with multiple parents - proposal to Wrong calculation for host compat state "UNREACHABLE" in DB IDO
  • Target Version set to 2.4.0

Applied Fix

Auswahl_215.png

@icinga-migration
Author

Updated by mfriedrich on 2015-09-04 11:30:04 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 50cd694.

@icinga-migration
Author

Updated by mfriedrich on 2015-09-04 11:48:23 +00:00

  • Target Version changed from 2.4.0 to 2.3.10
  • Backport? changed from TBD to Yes

@icinga-migration icinga-migration added blocker Blocks a release or needs immediate attention bug Something isn't working libicinga labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.3.10 milestone Jan 17, 2017