[dev.icinga.com #10994] Replay log handling proposal #3860

Closed
icinga-migration opened this issue Jan 20, 2016 · 9 comments
Labels
area/distributed Distributed monitoring (master, satellites, clients) bug Something isn't working

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10994

Created by tgelf on 2016-01-20 13:30:10 +00:00

Assignee: (none)
Status: Closed (closed on 2016-09-07 12:44:08 +00:00)
Target Version: (none)
Last Update: 2016-11-09 14:52:20 +00:00 (in Redmine)

Icinga Version: 2.4.1
Backport?: Not yet backported
Include in Changelog: 1

Our current replay log implementation puts an unnecessary burden on our nodes, mostly caused by locking and file parsing. First of all, we should add a BIG FAT hint to the documentation stating that log_duration for agent endpoints should be set to 0 on their master(s). That's what one should already do today, so nothing new here.

A lightweight and more robust logic could then work as follows:

  • change the logposition information from "timestamp" to "filename:offset", with the filename still carrying its creation timestamp for sorting
  • file names should carry zone names (or sit in zone-specific directories)
  • we can now ship whole files without parsing them when a replay is necessary
  • each peer endpoint is shipped only the files for zones it is allowed to see
  • the "current" file should not be transmitted this way
  • glob/ship all the other files
  • repeat the last step until only "current" remains
  • optionally force one more rotation and repeat the above, just to keep the next step as short as possible
  • lock and ship "current", similar to how it happens today
  • rotate "current", replay is all done

Regards,
Thomas
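The proposed flow could be sketched roughly as follows. This is a hypothetical model, not Icinga 2 code. It assumes one directory per zone, rotated files named by a monotonically increasing integer (standing in for the creation timestamp), an active file called `current`, one lock shared with the writer, and a `send` callback standing in for the cluster connection. Rotated files are assumed to be immutable, which is what allows them to be shipped unparsed and without the lock.

```python
import os
import threading
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple

CURRENT = "current"  # hypothetical name of the active, still-growing log file


@dataclass(frozen=True)
class LogPosition:
    """Replay position stored as "filename:offset" instead of a timestamp."""
    filename: str  # rotated file name; its integer value is the sort key
    offset: int    # bytes of that file the peer has already received

    def __str__(self) -> str:
        return f"{self.filename}:{self.offset}"

    @classmethod
    def parse(cls, text: str) -> "LogPosition":
        name, _, offset = text.rpartition(":")
        return cls(name, int(offset))


def append(zone_dir: str, lock: threading.Lock, record: bytes) -> None:
    """Writer side: records are only ever appended to "current", under the lock."""
    with lock:
        with open(os.path.join(zone_dir, CURRENT), "ab") as f:
            f.write(record)


def rotate(zone_dir: str, names: Iterator[str]) -> str:
    """Rename "current" to the next name and start an empty one (caller holds lock)."""
    new_name = next(names)
    os.rename(os.path.join(zone_dir, CURRENT), os.path.join(zone_dir, new_name))
    open(os.path.join(zone_dir, CURRENT), "wb").close()
    return new_name


def pending_chunks(zone_dir: str, pos: LogPosition) -> List[Tuple[str, int]]:
    """Rotated files, oldest first, that still hold data the peer has not received."""
    rotated = sorted((n for n in os.listdir(zone_dir) if n != CURRENT), key=int)
    chunks = []
    for name in rotated:
        if int(name) < int(pos.filename):
            continue  # fully acknowledged earlier
        start = pos.offset if name == pos.filename else 0
        if start < os.path.getsize(os.path.join(zone_dir, name)):
            chunks.append((name, start))
    return chunks


def ship(path: str, start: int, send: Callable[[bytes], None]) -> int:
    """Send a file's bytes from `start` without parsing them; return the new offset."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read()
    if data:
        send(data)
    return start + len(data)


def replay(zone_dir: str, pos: LogPosition, send: Callable[[bytes], None],
           lock: threading.Lock, names: Iterator[str],
           force_rotation: bool = True) -> LogPosition:
    def drain(pos: LogPosition) -> LogPosition:
        # Ship rotated files lock-free; repeat until only "current" is left.
        while True:
            chunks = pending_chunks(zone_dir, pos)
            if not chunks:
                return pos
            for name, start in chunks:
                pos = LogPosition(name, ship(os.path.join(zone_dir, name), start, send))

    pos = drain(pos)
    if force_rotation:
        # Optional extra rotation so the locked step below has little to send.
        with lock:
            rotate(zone_dir, names)
        pos = drain(pos)
    # Lock, ship "current" from its start, then rotate it; replay is done.
    with lock:
        offset = ship(os.path.join(zone_dir, CURRENT), 0, send)
        return LogPosition(rotate(zone_dir, names), offset)
```

In this sketch, zone filtering would happen one level up: `replay` is called once per zone directory the peer is allowed to see, with one `LogPosition` kept per peer and zone. The lock is only held for the optional forced rotation and for the final, hopefully short, pass over `current`.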


Relations:

@icinga-migration

Updated by mfriedrich on 2016-01-22 15:13:43 +00:00

  • Target Version set to Backlog

@icinga-migration

Updated by mfriedrich on 2016-01-25 09:59:57 +00:00

  • Target Version changed from Backlog to 2.5.0

@icinga-migration

Updated by mfriedrich on 2016-01-25 10:32:37 +00:00

  • Target Version changed from 2.5.0 to 2.4.2

@icinga-migration

Updated by mfriedrich on 2016-02-04 12:04:02 +00:00

  • Target Version changed from 2.4.2 to Backlog

@icinga-migration

Updated by mfriedrich on 2016-02-24 23:54:02 +00:00

  • Relates set to 9730

@icinga-migration

Updated by mfriedrich on 2016-03-18 14:33:53 +00:00

  • Parent Id set to 11313

@icinga-migration

Updated by mfriedrich on 2016-09-07 12:44:08 +00:00

  • Status changed from New to Closed
  • Target Version deleted Backlog

The underlying problem should already have been fixed. We've also added a chapter to the documentation that explains how to use log_duration for client setups.

@icinga-migration

Updated by tgelf on 2016-09-07 13:02:22 +00:00

dnsmichi wrote:

The underlying problem should already have been fixed.

Which one? What happens for example with agents when their local time drifts away?
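A toy model of why this question matters (plain Python lists instead of files; not Icinga 2's actual replay code): if resuming relies on comparing timestamps stamped by the sending node, a backward clock step on that node makes new entries look old. With a "filename:offset" position, resuming inside a file does not depend on the clock at all; only the ordering of rotated files would still rely on their timestamp names, which would then presumably need to stay monotonic.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (writer's local clock in seconds, message)


def replay_by_timestamp(log: List[Entry], last_ts: int) -> List[str]:
    """Send entries the writer stamped later than the peer's last seen timestamp."""
    return [msg for ts, msg in log if ts > last_ts]


def replay_by_offset(log: List[Entry], offset: int) -> List[str]:
    """Send every entry after the last one the peer acknowledged, by position."""
    return [msg for _, msg in log[offset:]]


# The peer has acknowledged "a" and "b" (last timestamp 1001, offset 2).
# The writer's clock is then stepped back by 60 seconds before "c" and "d".
log: List[Entry] = [(1000, "a"), (1001, "b"), (941, "c"), (942, "d")]

print(replay_by_timestamp(log, 1001))  # [] : "c" and "d" are silently skipped
print(replay_by_offset(log, 2))        # ['c', 'd']
```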

@icinga-migration

Updated by mfriedrich on 2016-11-09 14:52:20 +00:00

  • Parent Id deleted 11313

@icinga-migration icinga-migration added bug Something isn't working area/distributed Distributed monitoring (master, satellites, clients) labels Jan 17, 2017