Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev.icinga.com #9055] Wrong timestamps w/ historical data replay in DB IDO #2894

Closed
icinga-migration opened this issue Apr 14, 2015 · 16 comments
Labels
area/db-ido Database output bug Something isn't working
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/9055

Created by tredel on 2015-04-14 08:41:44 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2015-04-27 13:21:04 +00:00)
Target Version: 2.3.5
Last Update: 2015-04-27 13:21:04 +00:00 (in Redmine)

Icinga Version: icinga2-2.3.3-1
Backport?: Already backported
Include in Changelog: 1

If there's a split between a master and slave zone, the cluster replay uses wrong timestamps on the master/parent node.

I'm using CentOS release 6.6 with icinga2-2.3.3-1.el6.x86_64

My zones.conf:

object Endpoint "graphite" {
    host = "graphite"
}
object Endpoint "graphite-2" {
    host = "graphite-2"
}
object Zone "master" {
    endpoints = [ "graphite" ]
}
object Zone "slave" {
    endpoints = [ "graphite-2" ]
    parent = "master"
}

Detailed workflow:

  • 08:02:10 graphite: Icinga shutdown
  • 08:03:58 graphite-2: critical, soft 2 for ssh on test-1
  • 08:04:30 graphite-2: critical, soft 3 for ssh on test-1
  • 08:04:36 graphite-2: critical, hard 1 for ssh on test-1
  • 08:06:30 graphite-2: ok, hard 1 for ssh on test-1
  • 08:06:35 graphite: Icinga startup

Result:

  • Master graphite has wrong timestamps in alert history (all 08:06:38)

  • The IDO also has wrong timestamps (all 08:06:38)

    mysql> select a.host_object_id, a.display_name, b.host_object_id, b.service_object_id, b.display_name from icinga_hosts as a, icinga_services as b where a.host_object_id = b.host_object_id and a.display_name='test-1' and b.display_name='ssh';
    +----------------+--------------+----------------+-------------------+--------------+
    | host_object_id | display_name | host_object_id | service_object_id | display_name |
    +----------------+--------------+----------------+-------------------+--------------+
    | 55 | test-1 | 55 | 74 | ssh |
    +----------------+--------------+----------------+-------------------+--------------+
    1 row in set (0.00 sec)

    mysql> select * from icinga_statehistory;
    +-----------------+-------------+---------------------+-----------------+-----------+--------------+-------+------------+-----------------------+--------------------+------------+-----------------+--------------------------------------+-------------+--------------+--------------------+
    | statehistory_id | instance_id | state_time | state_time_usec | object_id | state_change | state | state_type | current_check_attempt | max_check_attempts | last_state | last_hard_state | output | long_output | check_source | endpoint_object_id |
    +-----------------+-------------+---------------------+-----------------+-----------+--------------+-------+------------+-----------------------+--------------------+------------+-----------------+--------------------------------------+-------------+--------------+--------------------+
    | 1 | 1 | 2015-04-10 08:06:38 | 457462 | 74 | 1 | 2 | 0 | 2 | 3 | 0 | 0 | Connection refused | | graphite-2 | 1 |
    | 2 | 1 | 2015-04-10 08:06:38 | 469619 | 74 | 1 | 2 | 1 | 1 | 3 | 2 | 2 | Connection refused | | graphite-2 | 1 |
    | 3 | 1 | 2015-04-10 08:06:38 | 490376 | 74 | 1 | 0 | 1 | 1 | 3 | 2 | 0 | SSH OK - OpenSSH_5.3 (protocol 2.0) | | graphite-2 | 1 |
    +-----------------+-------------+---------------------+-----------------+-----------+--------------+-------+------------+-----------------------+--------------------+------------+-----------------+--------------------------------------+-------------+--------------+--------------------+

Attachments

Changesets

2015-04-24 16:14:57 +00:00 by (unknown) 0f3d46c

Add timestamp option for checkresult test script

refs #9055

2015-04-24 16:14:57 +00:00 by (unknown) 2254ac4

DB IDO: Fix wrong check result timestamp for historical tables

refs #9055

2015-04-27 13:06:14 +00:00 by (unknown) b921439

Fix DbEvents::AddCheckableCheckHistory argument

refs #9055

2015-04-27 13:12:42 +00:00 by (unknown) dcf9e7b

Add timestamp option for checkresult test script

refs #9055

2015-04-27 13:12:51 +00:00 by (unknown) aaba240

DB IDO: Fix wrong check result timestamp for historical tables

refs #9055

2015-04-27 13:13:00 +00:00 by (unknown) bb59065

Fix DbEvents::AddCheckableCheckHistory argument

refs #9055

2015-06-24 12:12:18 +00:00 by (unknown) 45a0f09

Fix unused variable in dbevents

refs #9055

2015-06-24 12:14:50 +00:00 by (unknown) 656286f

Fix unused variable in dbevents

refs #9055

Relations:

@icinga-migration
Copy link
Author

Updated by tredel on 2015-04-14 08:44:03 +00:00

  • File added ido_sql.txt

SQL stuff is not readable; added as attachment

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-15 13:00:48 +00:00

  • Description updated

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-24 10:02:12 +00:00

  • Status changed from New to Feedback
  • Assigned to set to tredel

Probably this is just an DB IDO backend problem, and only loosely related to the cluster history replay. Is there anything else affected other than the 'state_time' column?

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-24 10:06:22 +00:00

  • Relates set to 8374

@icinga-migration
Copy link
Author

Updated by tredel on 2015-04-24 10:13:12 +00:00

Seems like the data within status.dat are also wrong (see screenshots).

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-24 10:19:07 +00:00

Well, probably the same issue which might need a separate issue though. Using DB IDO there are most likely other historical tables affected.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-24 15:43:01 +00:00

  • Subject changed from Wrong timestamps after cluster replay to Wrong timestamps w/ historical data replay in DB IDO
  • Category changed from Cluster to DB IDO
  • Status changed from Feedback to Assigned
  • Assigned to changed from tredel to mfriedrich
  • Target Version set to 2.4.0
  • Estimated Hours set to 16

Compat log issues

In terms of Classic UI using static compat log files - this cannot be fixed easily, as it requires to rewrite the entire log files and archive based on the order. Which is something we cannot (and probably never will) implement just for fixing this issue. One thing we must avoid is to mix older time stamps into newer log files and destroy the order - the Classic UI log parser cannot determine these values correctly and this would cause confusion and irritation looking at these log files.

We should probably discuss that one separately, and abandon Classic UI for tests using a different historical backend than the database.

DB IDO timestamp

For the DB IDO bits, this is reproducible by throwing in an external command for processing a check result using a different timestamp. Apparently we pass that one correctly from the external command processor to the generated check result object itself.

Wrong

michi@nbmif ~/coding/icinga/icinga2 (fix/replay-timestamps-9055) $ sudo ./tools/scripts/process_check_result -H nbmif.int.netways.de -S passive -r 0 -T 1429880000
Sending '/bin/echo [1429880000] PROCESS_SERVICE_CHECK_RESULT;nbmif.int.netways.de;passive;0;icinga2 extcmdfile test @ 1429884382 >> /var/run/icinga2/cmd/icinga2.cmd'

MariaDB [icinga]> select * from icinga_statehistory sh join icinga_objects sho on sh.object_id=sho.object_id where sho.name1='nbmif.int.netways.de' and sho.name2='passive' order by state_time desc limit 1;
+-----------------+-------------+---------------------+-----------------+-----------+--------------+-------+------------+-----------------------+--------------------+------------+-----------------+--------------------------------------+-------------+----------------------+--------------------+-----------+-------------+---------------+----------------------+---------+-----------+
| statehistory_id | instance_id | state_time          | state_time_usec | object_id | state_change | state | state_type | current_check_attempt | max_check_attempts | last_state | last_hard_state | output                               | long_output | check_source         | endpoint_object_id | object_id | instance_id | objecttype_id | name1                | name2   | is_active |
+-----------------+-------------+---------------------+-----------------+-----------+--------------+-------+------------+-----------------------+--------------------+------------+-----------------+--------------------------------------+-------------+----------------------+--------------------+-----------+-------------+---------------+----------------------+---------+-----------+
|         1196421 |           1 | 2015-04-24 16:06:22 |          200017 |     60389 |            1 |     0 |          1 |                     1 |                  3 |          1 |               0 | icinga2 extcmdfile test @ 1429884382 |             | nbmif.int.netways.de |              58653 |     60389 |           1 |             2 | nbmif.int.netways.de | passive |         1 |
+-----------------+-------------+---------------------+-----------------+-----------+--------------+-------+------------+-----------------------+--------------------+------------+-----------------+--------------------------------------+-------------+----------------------+--------------------+-----------+-------------+---------------+----------------------+---------+-----------+
1 row in set (0,01 sec)

Fix Attempt

michi@nbmif ~/coding/icinga/icinga2 (fix/replay-timestamps-9055) $ sudo ./tools/scripts/process_check_result -H nbmif.int.netways.de -S passive -r 2 -T 1429880000
Sending '/bin/echo [1429880000] PROCESS_SERVICE_CHECK_RESULT;nbmif.int.netways.de;passive;2;icinga2 extcmdfile test @ 1429884584 >> /var/run/icinga2/cmd/icinga2.cmd'

(different states sent)

MariaDB [icinga]> select * from icinga_statehistory sh join icinga_objects sho on sh.object_id=sho.object_id where sho.name1='nbmif.int.netways.de' and sho.name2='passive' order by state_time desc;
+-----------------+-------------+---------------------+-----------------+-----------+--------------+-------+------------+-----------------------+--------------------+------------+-----------------+--------------------------------------+-------------+----------------------+--------------------+-----------+-------------+---------------+----------------------+---------+-----------+
| statehistory_id | instance_id | state_time          | state_time_usec | object_id | state_change | state | state_type | current_check_attempt | max_check_attempts | last_state | last_hard_state | output                               | long_output | check_source         | endpoint_object_id | object_id | instance_id | objecttype_id | name1                | name2   | is_active |
+-----------------+-------------+---------------------+-----------------+-----------+--------------+-------+------------+-----------------------+--------------------+------------+-----------------+--------------------------------------+-------------+----------------------+--------------------+-----------+-------------+---------------+----------------------+---------+-----------+
|         1204130 |           1 | 2015-04-24 14:53:20 |               0 |     60389 |            1 |     0 |          1 |                     1 |                  3 |          2 |               0 | icinga2 extcmdfile test @ 1429886226 |             | nbmif.int.netways.de |              58653 |     60389 |           1 |             2 | nbmif.int.netways.de | passive |         1 |
|         1194599 |           1 | 2015-04-24 14:53:20 |               0 |     60389 |            1 |     1 |          1 |                     1 |                  3 |          3 |               1 | icinga2 extcmdfile test @ 1429883944 |             | nbmif.int.netways.de |              58653 |     60389 |           1 |             2 | nbmif.int.netways.de | passive |         1 |
|         1196233 |           1 | 2015-04-24 14:53:20 |               0 |     60389 |            1 |     2 |          1 |                     1 |                  3 |          1 |               2 | icinga2 extcmdfile test @ 1429884331 |             | nbmif.int.netways.de |              58653 |     60389 |           1 |             2 | nbmif.int.netways.de | passive |         1 |
|         1197227 |           1 | 2015-04-24 14:53:20 |               0 |     60389 |            1 |     2 |          1 |                     1 |                  3 |          1 |               2 | icinga2 extcmdfile test @ 1429884584 |             | nbmif.int.netways.de |              58653 |     60389 |           1 |             2 | nbmif.int.netways.de | passive |         1 |
+-----------------+-------------+---------------------+-----------------+-----------+--------------+-------+------------+-----------------------+--------------------+------------+-----------------+--------------------------------------+-------------+----------------------+--------------------+-----------+-------------+---------------+----------------------+---------+-----------+
4 rows in set (0,00 sec)

Other interfaces

Gelf

Sets the current timestamp for each message. Similar changes required as already implemented with GraphiteWriter.

Perfdata

-> TIMET default in *.ti file must be changed?

API

We should keep that in mind when implementing the API streams.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-24 16:15:48 +00:00

  • Status changed from Assigned to Feedback
  • Assigned to changed from mfriedrich to tredel

I've pushed a fix to the git master. Please test with the latest snapshot package builds.

@icinga-migration
Copy link
Author

Updated by tredel on 2015-04-26 10:13:47 +00:00

I made a quick check with latest snapshot package. Seems to work :)

I will make further detailed tests during next week. Many many thanks!!

@icinga-migration
Copy link
Author

Updated by mfrosch on 2015-04-27 08:20:35 +00:00

  • Backport? changed from TBD to Yes

@icinga-migration
Copy link
Author

Updated by mfrosch on 2015-04-27 08:21:07 +00:00

  • Backport? changed from Yes to TBD

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-27 09:21:34 +00:00

  • Relates set to 9183

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-27 09:22:37 +00:00

  • Relates set to 9184

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-27 09:25:31 +00:00

  • Relates set to 9185

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-27 09:26:07 +00:00

  • Status changed from Feedback to Assigned
  • Assigned to changed from tredel to mfriedrich
  • Done % changed from 0 to 50

TODO

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-04-27 13:21:04 +00:00

  • Status changed from Assigned to Resolved
  • Target Version changed from 2.4.0 to 2.3.5
  • Done % changed from 50 to 100
  • Backport? changed from TBD to Yes

Gelf/OpenTsdbWriter should be handled by their maintainers. PerfdataWriter has been modified accordingly. CompatLog is a general wont-fix.

I assume this works from my previous tests. If not, feel free to comment here and I might re-open the issue.

@icinga-migration icinga-migration added bug Something isn't working area/db-ido Database output labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.3.5 milestone Jan 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/db-ido Database output bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant