[dev.icinga.com #12985] Downtimes disappearing shortly after icinga reload #4747

Closed
icinga-migration opened this issue Oct 25, 2016 · 9 comments
Labels
area/db-ido Database output bug Something isn't working

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/12985

Created by rdesanno on 2016-10-25 20:42:56 +00:00

Assignee: rdesanno
Status: Closed (closed on 2016-12-07 21:27:30 +00:00)
Target Version: (none)
Last Update: 2016-12-07 21:27:30 +00:00 (in Redmine)

Icinga Version: 2.5.4-1
Backport?: Not yet backported
Include in Changelog: 1

The problem we are seeing in our shop is that whenever a host or service has a downtime or acknowledgement set and icinga is reloaded, those downtimes and/or comments are erased from the database and disappear from the web UI. It is consistent in our environment, and we can replicate it easily by doing the following:

set a downtime, acknowledgement, persistent acknowledgement with expiration
reload icinga2
wait
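
The three steps above can be sketched as a small script that records which downtimes exist before the reload and which are gone afterwards. This is only a rough sketch under assumptions not stated in the report: the `mysql` client can log in without prompting (for example via `~/.my.cnf`), the IDO database is called `icinga`, the service is reloaded with `systemctl reload icinga2`, and the function names and the 60-second wait are made up for illustration.

```python
import subprocess
import time

# Query against the IDO table named in this report.
QUERY = "SELECT name FROM icinga_scheduleddowntime"


def snapshot_downtimes(database="icinga"):
    """Return the set of downtime names currently stored in the IDO table.

    Assumption: the `mysql` client can authenticate non-interactively.
    """
    result = subprocess.run(
        ["mysql", "-N", "-B", "-e", QUERY, database],
        check=True,
        capture_output=True,
        text=True,
    )
    return {line for line in result.stdout.splitlines() if line}


def lost_downtimes(before, after):
    """Return the sorted names present before the reload but missing after."""
    return sorted(set(before) - set(after))


def reload_and_compare(wait_seconds=60):
    """Snapshot, reload icinga2, wait, snapshot again, and report the losses.

    Assumption: `systemctl reload icinga2` is how the reload is triggered.
    """
    before = snapshot_downtimes()
    subprocess.run(["systemctl", "reload", "icinga2"], check=True)
    time.sleep(wait_seconds)
    after = snapshot_downtimes()
    return lost_downtimes(before, after)
```

Calling `reload_and_compare()` as a user allowed to reload the service would return the names of any downtimes that vanished. An empty list would mean nothing was lost during that run.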

I have gone so far as to erase all comments and downtimes under our _api directory and recreate the database, but this did not help. I can literally watch the rows disappear from the icinga_comments and icinga_scheduleddowntime tables, and I have no idea how to fix it.

The only thing I can tell is that comments et al. might persist if they were entered on a different day than the one on which the reload takes place, but I'm not sure that this is 100% true.

Below are the versions we are running if it helps any:

CentOS7 3.10.0-327.36.1.el7.x86_64

  • icinga2-2.5.4-1.el7.centos.x86_64
  • icinga2-common-2.5.4-1.el7.centos.x86_64
  • icinga2-classicui-config-2.5.4-1.el7.centos.x86_64
  • icinga2-libs-2.5.4-1.el7.centos.x86_64
  • icinga2-bin-2.5.4-1.el7.centos.x86_64
  • icinga2-ido-mysql-2.5.4-1.el7.centos.x86_64
  • php-mysql-5.4.16-36.3.el7_2.x86_64
  • icinga2-ido-mysql-2.5.4-1.el7.centos.x86_64
  • MariaDB-server-10.1.18-1.el7.centos.x86_64
  • MariaDB-shared-10.1.18-1.el7.centos.x86_64
  • MariaDB-common-10.1.18-1.el7.centos.x86_64
  • MariaDB-client-10.1.18-1.el7.centos.x86_64
@icinga-migration

Updated by elippmann on 2016-10-27 11:13:42 +00:00

  • Status changed from New to Feedback
  • Assigned to set to rdesanno

Hi,

Could you please post the output of:

ls -lah /var/lib/icinga2/api/packages/_api/

and the contents of the files from the directory above:

  • active-stage
  • active.conf
  • include.conf

Are you running a single instance setup?

Best regards,
Eric

@icinga-migration

Updated by rdesanno on 2016-10-27 14:15:38 +00:00

Here you go. Also, we are running a single instance for what it's worth.

[root@pf82 conf.d]# ls -lah /var/lib/icinga2/api/packages/_api/
total 12K
drwx------ 3 icinga icinga  96 Oct  3 12:08 .
drwx------ 3 icinga icinga  17 Mar 31  2016 ..
-rw-r--r-- 1 icinga icinga 444 Mar 31  2016 active.conf
-rw-r--r-- 1 icinga icinga  27 Mar 31  2016 active-stage
-rw-r--r-- 1 icinga icinga  25 Mar 31  2016 include.conf
drwx------ 3 icinga nagios  19 Jun 25 12:53 pf82.eq.pl.pvt-1459445242-1

[root@pf82 conf.d]# cat /var/lib/icinga2/api/packages/_api/active.conf
if (!globals.contains("ActiveStages")) {
  globals.ActiveStages = {}
}

if (globals.contains("ActiveStageOverride")) {
  var arr = ActiveStageOverride.split(":")
  if (arr[0] == "_api") {
    if (arr.len() < 2) {
      log(LogCritical, "Config", "Invalid value for ActiveStageOverride")
    } else {
      ActiveStages["_api"] = arr[1]
    }
  }
}

if (!ActiveStages.contains("_api")) {
  ActiveStages["_api"] = "pf82.eq.pl.pvt-1459445242-1"
}

[root@pf82 conf.d]# cat /var/lib/icinga2/api/packages/_api/active-stage
pf82.eq.pl.pvt-1459445242-1[root@pf82 conf.d]#

[root@pf82 conf.d]# cat /var/lib/icinga2/api/packages/_api/include.conf
include "*/include.conf"

@icinga-migration

Updated by rdesanno on 2016-10-27 14:20:53 +00:00

I also noticed that the keys for the icinga_scheduleddowntime table don't seem to match the schema that I found online. I'm not sure how this could have happened over the course of the few upgrades we did during the year, so I'm going to do a fresh install on another box to compare. I'm not sure whether it's related, but I thought I would point it out because this doesn't look right either.

MariaDB [icinga]> describe icinga_scheduleddowntime;
+------------------------+---------------------+------+-----+---------------------+----------------+
| Field                  | Type                | Null | Key | Default             | Extra          |
+------------------------+---------------------+------+-----+---------------------+----------------+
| scheduleddowntime_id   | bigint(20) unsigned | NO   | PRI | NULL                | auto_increment |
| instance_id            | bigint(20) unsigned | YES  | MUL | 0                   |                |
| downtime_type          | smallint(6)         | YES  |     | 0                   |                |
| object_id              | bigint(20) unsigned | YES  | MUL | 0                   |                |
| entry_time             | timestamp           | NO   |     | 0000-00-00 00:00:00 |                |
| author_name            | varchar(64)         | YES  |     |                     |                |
| comment_data           | text                | YES  |     | NULL                |                |
| internal_downtime_id   | bigint(20) unsigned | YES  |     | 0                   |                |
| triggered_by_id        | bigint(20) unsigned | YES  |     | 0                   |                |
| is_fixed               | smallint(6)         | YES  |     | 0                   |                |
| duration               | bigint(20)          | YES  |     | 0                   |                |
| scheduled_start_time   | timestamp           | NO   |     | 0000-00-00 00:00:00 |                |
| scheduled_end_time     | timestamp           | NO   |     | 0000-00-00 00:00:00 |                |
| was_started            | smallint(6)         | YES  |     | 0                   |                |
| actual_start_time      | timestamp           | NO   |     | 0000-00-00 00:00:00 |                |
| actual_start_time_usec | int(11)             | YES  |     | 0                   |                |
| is_in_effect           | smallint(6)         | YES  |     | 0                   |                |
| trigger_time           | timestamp           | NO   |     | 0000-00-00 00:00:00 |                |
| name                   | text                | YES  |     | NULL                |                |
| session_token          | int(11)             | YES  |     | NULL                |                |
| endpoint_object_id     | bigint(20)          | YES  |     | NULL                |                |
+------------------------+---------------------+------+-----+---------------------+----------------+
21 rows in set (0.00 sec)

MariaDB [icinga]>
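
One way to do the comparison against a fresh install is to parse the `DESCRIBE` output from both hosts and diff the column definitions. The following is a minimal sketch, not an official tool. The function names `parse_describe` and `diff_schema` are hypothetical, and it assumes the six-column table layout printed by the `mysql` client above.

```python
def parse_describe(text):
    """Parse the ASCII table printed by `DESCRIBE <table>`.

    Returns {field: (type, null, key, default, extra)}.
    """
    columns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts, and the row-count footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 6 or cells[0] == "Field":
            continue  # skip the header row and anything malformed
        columns[cells[0]] = tuple(cells[1:])
    return columns


def diff_schema(left, right):
    """Return {field: (left_definition, right_definition)} for differing fields.

    A definition of None means the field is missing on that side.
    """
    return {
        field: (left.get(field), right.get(field))
        for field in sorted(set(left) | set(right))
        if left.get(field) != right.get(field)
    }
```

Saving the `DESCRIBE` output from each host to a text file and passing both texts through `parse_describe` gives two dictionaries. `diff_schema` then lists only the columns whose type, nullability, key, default, or extra attributes differ.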

@icinga-migration

Updated by rdesanno on 2016-10-27 15:54:33 +00:00

For example, this is what I see upon reload. I had 1028 rows of downtime set, and after the reload the table was an empty set. Immediately after the reload, no downtimes were listed under /icingaweb2/monitoring/list/downtimes in the GUI. However, when I looked at one of the hosts that had a downtime set before the reload, it still showed a "plug" icon next to it for a few seconds before that disappeared too.

1028 rows in set (0.10 sec)

MariaDB [icinga]> select name,duration,comment_data from icinga_scheduleddowntime;
Empty set (0.01 sec)

Here is an example of how newly entered downtime records are being recorded if it helps. I'm curious what you think about the duration = 0.

MariaDB [icinga]> select entry_time,duration,scheduled_start_time,scheduled_end_time,is_in_effect,trigger_time,session_token from icinga_scheduleddowntime orderby limit 1;
+---------------------+----------+----------------------+---------------------+--------------+---------------------+---------------+
| entry_time          | duration | scheduled_start_time | scheduled_end_time  | is_in_effect | trigger_time        | session_token |
+---------------------+----------+----------------------+---------------------+--------------+---------------------+---------------+
| 2016-10-27 11:46:16 |        0 | 2016-10-27 11:45:57  | 2016-10-27 13:45:57 |            1 | 2016-10-27 11:50:16 |    1477582827 |
+---------------------+----------+----------------------+---------------------+--------------+---------------------+---------------+
1 row in set (0.00 sec)

@icinga-migration

Updated by nlm on 2016-11-10 10:39:58 +00:00

The duration column must be the duration attribute of flexible downtimes.

From the documentation:

duration    Optional. How long the downtime lasts. Only has an effect for flexible (non-fixed) downtimes.
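
To illustrate why `duration = 0` is not suspicious for a fixed downtime, here is a small sketch of how the length of a downtime follows from its attributes. It assumes the row posted earlier belongs to a fixed downtime, since `is_fixed` was not part of that query. The function name `effective_length` is hypothetical.

```python
from datetime import datetime, timedelta


def effective_length(is_fixed, scheduled_start, scheduled_end, duration_seconds):
    """Return how long a downtime lasts once it is in effect.

    Fixed downtimes run from the scheduled start to the scheduled end and
    ignore `duration`. Flexible downtimes last `duration` seconds once they
    are triggered inside the scheduled window.
    """
    if is_fixed:
        return scheduled_end - scheduled_start
    return timedelta(seconds=duration_seconds)


# Values taken from the row shown earlier in this thread.
start = datetime(2016, 10, 27, 11, 45, 57)
end = datetime(2016, 10, 27, 13, 45, 57)

# Assumption: the downtime is fixed, so duration = 0 has no effect.
print(effective_length(True, start, end, 0))  # 2:00:00
```

With those values the downtime covers the full two-hour window even though the `duration` column holds 0.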

@icinga-migration

Updated by elippmann on 2016-11-10 11:12:32 +00:00

Thanks for the SQL output. Could you please resend it with select *? Are you using Web 2 for transmitting external commands?

@icinga-migration

Updated by rdesanno on 2016-11-10 18:59:26 +00:00

Thanks for the input, but after dealing with this bug for so long, I decided to do a bare-metal wipe and reinstall, and I can no longer recreate the bug. I'm convinced that something in the upgrade path broke this for us, and my testing showed that a new install is stable.

FYI to anyone else having this issue.

@icinga-migration

Updated by rdesanno on 2016-11-10 19:00:03 +00:00

Feel free to close this ticket.

@icinga-migration

Updated by mfriedrich on 2016-12-07 21:27:30 +00:00

  • Status changed from Feedback to Closed

I have the feeling that this is related to the fact that /include.conf sometimes goes missing, see #13251. Still, I'll close here as requested ;)
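
For anyone who wants to rule out a missing include.conf, the package layout posted earlier in this thread can be checked with a short script. This is a sketch under assumptions: the function name `check_api_package` is made up, and the expectation that the active stage directory carries its own include.conf is inferred from the `include "*/include.conf"` line in the package's include.conf rather than confirmed in this thread.

```python
from pathlib import Path


def check_api_package(package_dir):
    """Return a list of problems found in an `_api` config package directory."""
    package = Path(package_dir)
    problems = []

    # Files seen in the directory listing posted in this thread.
    for name in ("active.conf", "active-stage", "include.conf"):
        if not (package / name).is_file():
            problems.append(f"missing {name}")

    stage_file = package / "active-stage"
    if stage_file.is_file():
        stage = stage_file.read_text().strip()
        if not stage:
            problems.append("active-stage is empty")
        elif not (package / stage).is_dir():
            problems.append(f"stage directory {stage} does not exist")
        elif not (package / stage / "include.conf").is_file():
            # Assumption: each stage directory has its own include.conf.
            problems.append(f"missing {stage}/include.conf")

    return problems
```

Running `check_api_package("/var/lib/icinga2/api/packages/_api")` as a user that can read the directory returns an empty list when nothing is missing.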

@icinga-migration icinga-migration added bug Something isn't working area/db-ido Database output labels Jan 17, 2017