[dev.icinga.com #12718] Crash in ClusterEvents::SendNotificationsAPIHandler #4665

Closed
icinga-migration opened this issue Sep 13, 2016 · 10 comments
@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/12718

Created by pellucid on 2016-09-13 13:00:21 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2016-09-13 20:15:04 +00:00)
Target Version: 2.6.0
Last Update: 2016-12-06 16:07:32 +00:00 (in Redmine)

Icinga Version: 2.5.4
Backport?: Not yet backported
Include in Changelog: 1

Our Icinga checker (part of a larger cluster) regularly exits with a segfault. Running it under gdb produced this backtrace:

#0  std::_Rb_tree, std::_Select1st >, std::less, std::allocator > >::find (this=0x18, __k=...) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/stl_tree.h:1803
#1  0x00007ffff7139e15 in find (this=0x0, key=...) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/stl_map.h:837
#2  icinga::Dictionary::Get (this=0x0, key=...) at ../base/dictionary.cpp:41
#3  0x00007ffff0fa8e57 in icinga::ClusterEvents::SendNotificationsAPIHandler (origin=(boost::intrusive_ptr<icinga::MessageOrigin>) 0x7fffe41f0140, params=(boost::intrusive_ptr<icinga::Dictionary>) 0x7fffe4a70ae0) at ../icinga/clusterevents.cpp:832
#4  0x00007ffff680f41f in boost::detail::function::function_invoker2 const&, boost::intrusive_ptr const&), icinga::Value, boost::intrusive_ptr const&, boost::intrusive_ptr const&>::invoke (function_ptr=Unhandled dwarf expression opcode 0xf3) at /usr/include/boost153/boost/function/function_template.hpp:95
#5  0x00007ffff67bbb7d in operator() (this=Unhandled dwarf expression opcode 0xf3) at /usr/include/boost153/boost/function/function_template.hpp:767
#6  icinga::ApiFunction::Invoke (this=Unhandled dwarf expression opcode 0xf3) at ../remote/apifunction.cpp:31
#7  0x00007ffff6809937 in icinga::JsonRpcConnection::MessageHandler (this=0x7fffe518dce0, jsonString=Unhandled dwarf expression opcode 0xf3) at ../remote/jsonrpcconnection.cpp:202
#8  0x00007ffff680b00b in icinga::JsonRpcConnection::MessageHandlerWrapper (this=0x7fffe518dce0, jsonString=...) at ../remote/jsonrpcconnection.cpp:148
#9  0x00007ffff715a542 in operator() (this=0x7fffb8005aa8) at /usr/include/boost153/boost/function/function_template.hpp:767
#10 icinga::WorkQueue::WorkerThreadProc (this=0x7fffb8005aa8) at ../base/workqueue.cpp:234
#11 0x00007ffff7bd25c3 in ?? () from /usr/lib64/libboost_thread.so.1.53.0
#12 0x00007ffff4896aa1 in start_thread (arg=0x7fff9b8a2700) at pthread_create.c:301
#13 0x00007ffff45e393d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

icinga version:
icinga2 - The Icinga 2 network monitoring daemon (version: v2.5.4)

Copyright (c) 2012-2016 Icinga Development Team (https://www.icinga.org/)
License GPLv2+: GNU GPL version 2 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /var/run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /var/run/icinga2/icinga2.pid

System information:
  Platform: CentOS
  Platform version: 6.7 (Final)
  Kernel: Linux
  Kernel version: 2.6.32-573.22.1.el6.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.2
  Build host: 10525b1d9830

Icinga2 features enabled:

Disabled features: command compatlog debuglog gelf graphite influxdb livestatus notification opentsdb perfdata statusdata syslog
Enabled features: api checker mainlog

validation:

icinga2 daemon --validate
information/cli: Icinga application loader (version: v2.5.4)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: infra-icinga2-checker1
warning/ApplyRule: Apply rule 'satellite-host' (in /var/lib/icinga2/api/zones/global-templates/_etc/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /var/lib/icinga2/api/zones/global-templates/_etc/notifications.conf: 11:1-11:45) for type 'Notification' does not match anywhere!
warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /var/lib/icinga2/api/zones/global-templates/_etc/notifications.conf: 20:1-20:48) for type 'Notification' does not match anywhere!
warning/ApplyRule: Apply rule 'backup-downtime' (in /var/lib/icinga2/api/zones/global-templates/_etc/downtimes.conf: 5:1-5:52) for type 'ScheduledDowntime' does not match anywhere!
warning/ApplyRule: Apply rule 'cluster-zone-iaas' (in /var/lib/icinga2/api/zones/global-templates/_etc/icinga_monitoring.conf: 5:1-5:33) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'cluster-zone-am1' (in /var/lib/icinga2/api/zones/global-templates/_etc/icinga_monitoring.conf: 14:1-14:32) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'cluster-zone-am3' (in /var/lib/icinga2/api/zones/global-templates/_etc/icinga_monitoring.conf: 23:1-23:32) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'cluster-zone-eun' (in /var/lib/icinga2/api/zones/global-templates/_etc/icinga_monitoring.conf: 32:1-32:32) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'cluster' (in /var/lib/icinga2/api/zones/global-templates/_etc/icinga_monitoring.conf: 42:1-42:23) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'icinga' (in /var/lib/icinga2/api/zones/global-templates/_etc/icinga_monitoring.conf: 52:1-52:22) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'disk' (in /var/lib/icinga2/api/zones/global-templates/_etc/icinga_monitoring.conf: 61:1-61:20) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'load' (in /var/lib/icinga2/api/zones/global-templates/_etc/icinga_monitoring.conf: 70:1-70:20) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'procs' (in /var/lib/icinga2/api/zones/global-templates/_etc/icinga_monitoring.conf: 79:1-79:21) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check_ssh' (in /var/lib/icinga2/api/zones/global-templates/_etc/infra_services.conf: 1:0-1:24) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check_ftp' (in /var/lib/icinga2/api/zones/global-templates/_etc/infra_services.conf: 7:1-7:25) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check_root' (in /var/lib/icinga2/api/zones/global-templates/_etc/infra_services.conf: 14:1-14:26) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check_3wareraid' (in /var/lib/icinga2/api/zones/global-templates/_etc/infra_services.conf: 22:1-22:31) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check_megaraid_raid' (in /var/lib/icinga2/api/zones/global-templates/_etc/infra_services.conf: 31:1-31:35) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check_backup_mounts' (in /var/lib/icinga2/api/zones/global-templates/_etc/infra_services.conf: 48:1-48:35) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check-rabbitmq-overview' (in /var/lib/icinga2/api/zones/global-templates/_etc/openstack_functional.conf: 20:1-20:39) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check-mysql-slave' (in /var/lib/icinga2/api/zones/global-templates/_etc/openstack_services.conf: 351:1-351:33) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check-rabbitmq-proc' (in /var/lib/icinga2/api/zones/global-templates/_etc/openstack_services.conf: 928:1-928:35) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check-rabbitmq' (in /var/lib/icinga2/api/zones/global-templates/_etc/openstack_services.conf: 1029:1-1029:30) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check_cpu' (in /var/lib/icinga2/api/zones/global-templates/_etc/windows_services.conf: 1:0-1:24) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check_disk' (in /var/lib/icinga2/api/zones/global-templates/_etc/windows_services.conf: 12:1-12:26) for type 'Service' does not match anywhere!
warning/ApplyRule: Apply rule 'check_memory' (in /var/lib/icinga2/api/zones/global-templates/_etc/windows_services.conf: 23:1-23:28) for type 'Service' does not match anywhere!
information/ConfigItem: Instantiated 1 ApiUser.
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 7 Zones.
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 7 Endpoints.
information/ConfigItem: Instantiated 3 NotificationCommands.
information/ConfigItem: Instantiated 3412 Notifications.
information/ConfigItem: Instantiated 63 CheckCommands.
information/ConfigItem: Instantiated 4 Downtimes.
information/ConfigItem: Instantiated 300 Hosts.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 68 HostGroups.
information/ConfigItem: Instantiated 3112 Dependencies.
information/ConfigItem: Instantiated 1 UserGroup.
information/ConfigItem: Instantiated 2 Users.
information/ConfigItem: Instantiated 3112 Services.
information/ConfigItem: Instantiated 4 TimePeriods.
information/ConfigItem: Instantiated 89 ServiceGroups.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
information/cli: Finished validating the configuration file(s).

The icinga2.conf file:

/**
 * Icinga 2 configuration file
 * - this is where you define settings for the Icinga application including
 * which hosts/services to check.
 *
 * For an overview of all available configuration options please refer
 * to the documentation that is distributed as part of Icinga 2.
 */

/**
 * The constants.conf defines global constants.
 */
include "constants.conf"

/**
 * The zones.conf defines zones for a cluster setup.
 * Not required for single instance setups.
 */
include "zones.conf"

/**
 * The Icinga Template Library (ITL) provides a number of useful templates
 * and command definitions.
 * Common monitoring plugin command definitions are included separately.
 */
include 
include 
include 
// include 

/**
 * The features-available directory contains a number of configuration
 * files for features which can be enabled and disabled using the
 * icinga2 feature enable / icinga2 feature disable CLI commands.
 * These commands work by creating and removing symbolic links in
 * the features-enabled directory.
 */
include "features-enabled/*.conf"

/**
 * The repository.d directory contains all configuration objects
 * managed by the 'icinga2 repository' CLI commands.
 */
include_recursive "repository.d"

/**
 * Although in theory you could define all your objects in this file
 * the preferred way is to create separate directories and files in the conf.d
 * directory. Each of these files must have the file extension ".conf".
 */
#include_recursive "conf.d"

The zones.conf:

/*
 * Endpoint and Zone configuration for a cluster setup
 * This local example requires `NodeName` defined in
 * constants.conf.
 */

object Endpoint "infra-icinga.cloudvps.com" {
  host = "14.6.20.2"
  log_duration = 1h
}


object Endpoint "infra-icinga2-checker1" {
  host = "10.255.7.236"
  log_duration = 1h
}
object Endpoint "infra-icinga2-checker2" {
  host = "10.255.3.249"
  log_duration = 1h
}
object Endpoint "infra-icinga2-checker3" {
  host = "10.255.15.54"
  log_duration = 1h
}
object Endpoint "infra-icinga2-checker4" {
  host = "10.255.6.202"
  log_duration = 1h
}
object Endpoint "infra-icinga2-checker5" {
  host = "10.255.3.243"
  log_duration = 1h
}
object Endpoint "infra-icinga2-checker6" {
  host = "10.255.6.222"
  log_duration = 1h
}

object Zone "central" {
  endpoints = [ "infra-icinga.cloudvps.com" ]
}

object Zone "ams1" {
  parent = "centraal"
  endpoints = [  "infra-icinga2-checker1",  "infra-icinga2-checker6",  ]
}
object Zone "eun" {
  parent = "centraal"
  endpoints = [  "infra-icinga2-checker2",  "infra-icinga2-checker5",  ]
}
object Zone "ams3" {
  parent = "centraal"
  endpoints = [  "infra-icinga2-checker3",  ]
}
object Zone "iaas" {
  parent = "centraal"
  endpoints = [  "infra-icinga2-checker4",  ]
}

/*
 * Defines a global zone containing templates,
 * etc. synced to all nodes, if they accept
 * configuration. All remote nodes need
 * this zone configured too.
 */

object Zone "global-templates" {
  global = true
}
object Zone "cloudvps-templates" {
  global = true
}

Some data I saw in dmesg:
icinga2[18519]: segfault at 28 ip 00002aac6920fe42 sp 00002aaca01fec20 error 4 in libbase.so[2aac690c2000+1df000]
icinga2[9418]: segfault at 28 ip 00002b475c37ee42 sp 00002b4785021c20 error 4 in libbase.so[2b475c231000+1df000]
icinga2[30469]: segfault at 28 ip 00007f26f5a12e42 sp 00007f26eefecc20 error 4 in libbase.so[7f26f58c5000+1df000]
icinga2[21812]: segfault at 28 ip 00007f765d51ee42 sp 00007f7656b3bc20 error 4 in libbase.so[7f765d3d1000+1df000]

Can you help me figure out what I need to do?

Attachments

Changesets

2016-09-13 20:14:11 +00:00 by gbeutner 8fd454f

Fix crash in ClusterEvents::SendNotificationsAPIHandler

fixes #12718

Relations:

@icinga-migration

Updated by pellucid on 2016-09-13 13:29:46 +00:00

BTW, the other members in the zone don't fail as often as the checker1 instance.

@icinga-migration

Updated by gbeutner on 2016-09-13 20:14:36 +00:00

  • Status changed from New to Assigned
  • Assigned to set to gbeutner
  • Target Version set to 2.6.0

@icinga-migration

Updated by gbeutner on 2016-09-13 20:15:04 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 8fd454f.

@icinga-migration

Updated by mfriedrich on 2016-09-14 09:16:32 +00:00

  • Relates set to 12677

@icinga-migration

Updated by mfriedrich on 2016-11-10 18:59:08 +00:00

  • Relates set to 13151

@icinga-migration

Updated by mfriedrich on 2016-11-16 15:36:05 +00:00

  • Relates set to 12861

@icinga-migration

Updated by mfriedrich on 2016-12-05 09:56:02 +00:00

  • Relates set to 13371

@icinga-migration

Updated by mfriedrich on 2016-12-06 16:07:32 +00:00

  • Subject changed from Checker exiting with segfault regularly to Crash in ClusterEvents::SendNotificationsAPIHandler

@icinga-migration

Updated by mfriedrich on 2016-12-06 16:08:10 +00:00

  • Relates set to 12650

@icinga-migration

Updated by mfriedrich on 2016-12-06 16:10:20 +00:00

  • Relates set to 12765

@icinga-migration icinga-migration added bug Something isn't working libbase labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.6.0 milestone Jan 17, 2017