Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev.icinga.com #10410] OpenBSD: hang during ConfigItem::ActivateItems() in daemon startup #3517

Closed
icinga-migration opened this issue Oct 20, 2015 · 23 comments
Labels
bug Something isn't working
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10410

Created by sthen on 2015-10-20 14:07:15 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2015-11-10 12:42:56 +00:00)
Target Version: 2.4.0
Last Update: 2015-11-10 12:42:56 +00:00 (in Redmine)

Icinga Version: Introduced in 2.2.2, remains in 2.3.11
Backport?: No
Include in Changelog: 1

icinga2 startup has been broken on OpenBSD for a while. It gets to ConfigItem::ActivateItems() but hangs after the DynamicObject::RestoreObjects() call. I've eventually tracked down the commit that introduced it; reverting the fix for https://dev.icinga.org/issues/7769 allows startup to take place correctly. I don't know how to track it down further but please let me know if there's any particular information that would be useful or if you have anything you'd like me to try.

Changesets

2015-10-20 20:55:16 +00:00 by (unknown) 9002f80

Make sure the timer thread is gone when calling fork()

refs #10410

2015-10-20 21:02:11 +00:00 by (unknown) 3c6f0e3

Make sure the timer thread is gone when calling fork()

refs #10410

2015-10-21 05:02:49 +00:00 by (unknown) cb230a0

Ensure that the main thread pool gets re-initialized properly after fork()

refs #10410

2015-10-21 07:18:52 +00:00 by (unknown) e93dd3c

Ensure that the main thread pool gets re-initialized properly after fork()

refs #10410

2015-11-09 19:39:26 +00:00 by (unknown) 9ea51aa

Use non-blocking open() for the command pipe

fixes #10410

2015-11-10 10:41:21 +00:00 by (unknown) 0a6505c

Use poll() for the command pipe fd

refs #10410

Relations:

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-10-20 15:56:01 +00:00

  • Relates set to 7769

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-20 19:33:56 +00:00

  • Category set to Packages
  • Status changed from New to Feedback
  • Assigned to set to sthen

Can you attach gdb to the icinga2 process and show me a stacktrace (thread apply all bt full)?

FWIW I've got the support/2.3 branch working on OpenBSD and it doesn't seem to be hanging (it's executing checks just fine):

# uname -a
OpenBSD openbsd.localdomain 5.8 GENERIC.MP#1236 amd64
# /opt/icinga2/sbin/icinga2 daemon
[2015-10-20 21:29:25 +0200] information/cli: Icinga application loader (version: v2.3.11; debug)
[2015-10-20 21:29:25 +0200] information/cli: Loading application type: icinga/IcingaApplication
[2015-10-20 21:29:25 +0200] information/Utility: Loading library 'libicinga.so'
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/icinga2.conf
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/constants.conf
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/zones.conf
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/share/icinga2/include/itl
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/share/icinga2/include/command.conf
[2015-10-20 21:29:25 +0200] information/Utility: Loading library 'libmethods.so'
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/share/icinga2/include/command-icinga.conf
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/share/icinga2/include/timeperiod.conf
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/share/icinga2/include/plugins
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/share/icinga2/include/command-plugins.conf
[2015-10-20 21:29:25 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/features-enabled/checker.conf
[2015-10-20 21:29:25 +0200] information/Utility: Loading library 'libchecker.so'
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/features-enabled/mainlog.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/features-enabled/notification.conf
[2015-10-20 21:29:26 +0200] information/Utility: Loading library 'libnotification.so'
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/commands.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/downtimes.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/groups.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/hosts.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/notifications.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/satellite.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/services.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/templates.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/timeperiods.conf
[2015-10-20 21:29:26 +0200] information/ConfigCompiler: Compiling config file: /opt/icinga2/etc/icinga2/conf.d/users.conf
[2015-10-20 21:29:26 +0200] information/ConfigItem: Committing config items
[2015-10-20 21:29:26 +0200] warning/ApplyRule: Apply rule 'satellite-host' (in /opt/icinga2/etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 Endpoint(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 Zone(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 52 CheckCommand(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 2 NotificationCommand(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 3 TimePeriod(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 CheckerComponent(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 FileLogger(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 NotificationComponent(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 2 HostGroup(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 3 ServiceGroup(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 Host(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 11 Service(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 User(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 12 Notification(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 UserGroup(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 IcingaApplication(s).
[2015-10-20 21:29:26 +0200] information/ConfigItem: Checked 1 ScheduledDowntime(s).
[2015-10-20 21:29:26 +0200] information/ScriptGlobal: Dumping variables to file '/opt/icinga2/var/cache/icinga2/icinga2.vars'
[2015-10-20 21:29:26 +0200] information/DynamicObject: Restoring program state from file '/opt/icinga2/var/lib/icinga2/icinga2.state'
[2015-10-20 21:29:26 +0200] information/DynamicObject: Restored 95 objects. Loaded 0 new objects without state.
[2015-10-20 21:29:26 +0200] information/ConfigItem: Triggering Start signal for config items
[2015-10-20 21:29:26 +0200] information/ConfigItem: Activated all objects.
^C[2015-10-20 21:29:27 +0200] information/Application: Received request to shut down.
[2015-10-20 21:29:28 +0200] information/Application: Shutting down...
[2015-10-20 21:29:28 +0200] information/CheckerComponent: Checker stopped.
[2015-10-20 21:29:28 +0200] information/DynamicObject: Dumping program state to file '/opt/icinga2/var/lib/icinga2/icinga2.state'
[2015-10-20 21:29:28 +0200] information/IcingaApplication: Icinga has shut down.
#

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-20 19:49:24 +00:00

Aha, it does seem to hang when run as 'icinga2 daemon -d'. Interesting.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-20 20:57:00 +00:00

  • Status changed from Feedback to Assigned
  • Assigned to changed from sthen to gbeutner

Alright, I can definitely see how 7d93788 would cause this. The problem is we're holding a lock on a mutex while calling fork(). Now that I think about it I'm fairly certain I've seen a similar issue on AIX.

This doesn't seem to quite fix everything yet though, the child process dies after a few seconds (SIGBUS). Still looking into that one. :)

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-20 21:03:20 +00:00

  • Category changed from Packages to libbase
  • Status changed from Assigned to Feedback
  • Assigned to changed from gbeutner to sthen

Ok, the second version of the patch should be working now (3c6f0e3). Can you try whether this fixes your problem? :)

@icinga-migration
Copy link
Author

Updated by sthen on 2015-10-20 21:32:49 +00:00

This fixes the hang, but something's not quite right I think. Nothing is written to main-log until I shutdown i2, and then the following is logged (everything appears in the file at the same time):

[2015-10-20 22:25:45 +0100] information/ConfigItem: Activated all objects.
[2015-10-20 22:26:56 +0100] information/Application: Received request to shut down.
[2015-10-20 22:26:58 +0100] information/Application: Shutting down...
[2015-10-20 22:26:58 +0100] information/CheckerComponent: Checker stopped.
[2015-10-20 22:26:58 +0100] information/DbConnection: Pausing IDO connection: ido-mysql

Compare to the case with 7d93788 backed out completely, I get this written to log immediately at startup

[2015-10-20 22:28:24 +0100] information/ConfigItem: Activated all objects.
[2015-10-20 22:28:24 +0100] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.13.0')

and then at shutdown

[2015-10-20 22:31:07 +0100] information/Application: Received request to shut down.
[2015-10-20 22:31:07 +0100] information/Application: Shutting down...
[2015-10-20 22:31:07 +0100] information/CheckerComponent: Checker stopped.
[2015-10-20 22:31:07 +0100] information/DbConnection: Pausing IDO connection: ido-mysql

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-21 07:41:46 +00:00

Ok, can you try both 3c6f0e3 and e93dd3c?

@icinga-migration
Copy link
Author

Updated by sthen on 2015-10-21 08:37:58 +00:00

That did the trick, thanks Gunnar!

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-21 09:03:34 +00:00

  • Status changed from Feedback to Resolved
  • Assigned to changed from sthen to gbeutner
  • Target Version set to 2.4.0

Btw, on an unrelated note: You can use -DFLEX_EXECUTABLE=/usr/local/bin/gflex, i.e. you don't need to create symlinks. :P

@icinga-migration
Copy link
Author

Updated by sthen on 2015-10-21 20:01:43 +00:00

Possibly related to this, so I've kept it in this ticket for now but I can move it to a separate ticket if preferred. I'm preparing a port of icingaweb2 so I've enabled the command pipe to test this (and work out how to handle our default chroot jail for PHP). When restarting the daemon after enabling this, it results in a similar problem, i.e. 'icinga2 daemon -d' fails to finish daemonizing.

$ sudo icinga2 feature enable command
Enabling feature command. Make sure to restart Icinga 2 for these changes to take effect.
$ ls -la /var/run/icinga2/cmd/
total 8
drwxr-xr-x  2 _icinga  wheel  512 Oct 21 20:47 ./
drwxr-xr-x  3 _icinga  wheel  512 Oct 21 20:38 ../
$ sudo icinga2 daemon -d
[2015-10-21 20:47:38 +0100] information/cli: Icinga application loader (version: r2.3.11-1)
[2015-10-21 20:47:38 +0100] information/cli: Loading application type: icinga/IcingaApplication
[2015-10-21 20:47:38 +0100] information/Utility: Loading library 'libicinga.so'
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/icinga2.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/constants.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/zones.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /usr/local/share/icinga2/include/itl
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /usr/local/share/icinga2/include/command.conf
[2015-10-21 20:47:38 +0100] information/Utility: Loading library 'libmethods.so'
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /usr/local/share/icinga2/include/command-icinga.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /usr/local/share/icinga2/include/timeperiod.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /usr/local/share/icinga2/include/plugins
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /usr/local/share/icinga2/include/command-plugins.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/checker.conf
[2015-10-21 20:47:38 +0100] information/Utility: Loading library 'libchecker.so'
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/command.conf
[2015-10-21 20:47:38 +0100] information/Utility: Loading library 'libcompat.so'
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/compatlog.conf
[2015-10-21 20:47:38 +0100] information/Utility: Loading library 'libcompat.so'
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/ido-mysql.conf
[2015-10-21 20:47:38 +0100] information/Utility: Loading library 'libdb_ido_mysql.so'
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/mainlog.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/notification.conf
[2015-10-21 20:47:38 +0100] information/Utility: Loading library 'libnotification.so'
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/statusdata.conf
[2015-10-21 20:47:38 +0100] information/Utility: Loading library 'libcompat.so'
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/features-enabled/syslog.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/commands.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/downtimes.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/groups.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/hosts.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/notifications.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/satellite.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/services.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/templates.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/timeperiods.conf
[2015-10-21 20:47:38 +0100] information/ConfigCompiler: Compiling config file: /etc/icinga2/conf.d/users.conf
[2015-10-21 20:47:38 +0100] information/ConfigItem: Committing config items
[2015-10-21 20:47:38 +0100] warning/ApplyRule: Apply rule 'satellite-host' (in /etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 Endpoint(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 Zone(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 52 CheckCommand(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 2 NotificationCommand(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 3 TimePeriod(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 CheckerComponent(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 ExternalCommandListener(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 CompatLogger(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 IdoMysqlConnection(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 FileLogger(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 NotificationComponent(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 StatusDataWriter(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 SyslogLogger(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 3 HostGroup(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 3 ServiceGroup(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 Host(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 11 Service(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 3 User(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 12 Notification(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 2 UserGroup(s).
[2015-10-21 20:47:38 +0100] information/ConfigItem: Checked 1 IcingaApplication(s).
[2015-10-21 20:47:38 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2015-10-21 20:47:38 +0100] information/DynamicObject: Restoring program state from file '/var/lib/icinga2/icinga2.state'
[2015-10-21 20:47:38 +0100] information/DynamicObject: Restored 102 objects. Loaded 1 new objects without state.
[2015-10-21 20:47:38 +0100] information/ConfigItem: Triggering Start signal for config items
[2015-10-21 20:47:38 +0100] information/DbConnection: Resuming IDO connection: ido-mysql
[2015-10-21 20:47:38 +0100] information/ConfigItem: Activated all objects.
[2015-10-21 20:47:43 +0100] information/Notification: Sending notification 'sym.spacehopper.org!disk!mail-icingaadmin' for user 'icingaadmin'
[2015-10-21 20:47:43 +0100] information/Notification: Sending notification 'sym.spacehopper.org!disk!mail-icingaadmin' for user 'sthen'

It has created the pipe at this point, and leaves me with two processes running:

$ ls -l /var/run/icinga2/cmd                              
total 0
prw-rw----  1 _icinga  wheel  0 Oct 21 20:47 icinga2.cmd|

$ ps ax|grep [i]cinga2
 1931 pi  S+      0:00.51 icinga2 daemon -d
25934 pi  D+      0:00.43 icinga2 daemon -d

pid 1931 is here:

Thread 1 (thread 1001931):

#0  0x000019700595fd1a in nanosleep () at :2
#1  0x0000196ffaffcf64 in nanosleep (rqtp=0x7f7ffffeb1d0, rmtp=0x0) at /usr/src/lib/librthread/rthread_cancel.c:295
#2  0x0000197005954497 in usleep (useconds=) at /usr/src/lib/libc/gen/usleep.c:46
#3  0x0000196fec1dbd3a in Daemonize () at /usr/obj/ports/icinga2-2.3.11/icinga2-2.3.11/lib/cli/daemoncommand.cpp:88
#4  icinga::DaemonCommand::Run (this=, vm=..., ap=...) at /usr/obj/ports/icinga2-2.3.11/icinga2-2.3.11/lib/cli/daemoncommand.cpp:279
#5  0x0000196d3f6137ea in Main () at /usr/obj/ports/icinga2-2.3.11/icinga2-2.3.11/icinga-app/icinga.cpp:469
#6  0x0000196d3f615dca in main (argc=3, argv=0x7f7ffffee858) at /usr/obj/ports/icinga2-2.3.11/icinga2-2.3.11/icinga-app/icinga.cpp:735

(gdb) quit

I haven't been able to get a backtrace from the second process; gdb hangs when I try to attach to it (looking in top the process wait chan is 'fdlock'). I'll have a poke and see if I can come up with some more information but wonder if you have any ideas?

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-10-22 13:53:39 +00:00

  • Backport? changed from TBD to No

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-22 15:49:27 +00:00

  • Status changed from Resolved to Assigned

Reopened the ticket. I'll have to test this when i have a bit of time (probably this weekend or next monday).

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-26 12:29:59 +00:00

  • Assigned to changed from gbeutner to mfriedrich

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-11-09 11:49:09 +00:00

  • Assigned to changed from mfriedrich to gbeutner

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-11-09 14:57:40 +00:00

  • Status changed from Assigned to Feedback
  • Assigned to changed from gbeutner to sthen
  • Target Version changed from 2.4.0 to Backlog

The ExternalCommandListener feature spawns a new thread which tries to open a FIFO file. Ordinarily this open() call should only block that thread until another process opens the FIFO file for writing. However, apparently other unrelated open() calls in other threads (e.g. main-log as file logger feature) are also blocked.

Example:

#include 
#include 
#include 
#include 
#include 

static void *thread_proc(void *arg)
{
        unlink("/tmp/test-fifo");
        mkfifo("/tmp/test-fifo", 0700);

        open("/tmp/test-fifo", O_RDONLY);
}

int main(void)
{
        pthread_t id;
        pthread_create(&id, NULL, &thread_proc, NULL);

        usleep(2000000);

        printf("Trying to open file\n");
        open("/tmp/test-log", O_WRONLY | O_CREAT);
        printf("Opened file\n");
}

As far as I understand this would appear to be a bug in the open() syscall on OpenBSD.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-11-09 19:40:33 +00:00

  • Status changed from Feedback to Assigned
  • Assigned to changed from sthen to gbeutner
  • Target Version changed from Backlog to 2.4.0

@icinga-migration
Copy link
Author

Updated by Anonymous on 2015-11-09 19:45:04 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 9ea51aa.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-11-10 09:52:04 +00:00

  • Status changed from Resolved to Assigned

Meh, the command pipe thread is using 100% CPU time on Linux after this change.

@icinga-migration
Copy link
Author

Updated by sthen on 2015-11-10 11:23:37 +00:00

I asked around about this. Turns out the syscall bug in OpenBSD is known. There was an attempt at fixing it in the past (kern/kern_descrip.c 1.105, kern/vfs_syscalls.c 1.203, sys/filedesc.h 1.26), but there were problems with the attempted fix (commit log for the revert was "causes all new processes to get stuck after a while").

An alternative workaround, changing O_RDONLY to O_RDWR, was suggested - so though it's dirty, you could pull out the commit and do something like

--- lib/compat/externalcommandlistener.cpp.orig Tue Nov 10 10:55:06 2015
+++ lib/compat/externalcommandlistener.cpp  Tue Nov 10 11:00:23 2015
@@ -96,7 +96,12 @@ void ExternalCommandListener::CommandPipeThread(const 
        int fd;

        do {
+#ifdef __OpenBSD__
+/* Avoid OpenBSD bug: http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/regress/lib/libpthread/blocked_fifo/ */
+           fd = open(commandPath.CStr(), O_RDWR);
+#else
            fd = open(commandPath.CStr(), O_RDONLY);
+#endif
        } while (fd < 0 && errno == EINTR);

        if (fd < 0) {

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-11-10 11:26:51 +00:00

I think I may have already found a portable fix for the "100% cpu" problem on Linux:

  1. open the FIFO file with O_NONBLOCK
  2. Wait until the FD is readable (via poll() )
  3. Read from the FD.

I'll need to test this a bit more on OS X/Linux - but I think this is the cleanest solution for this problem.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-11-10 12:42:56 +00:00

  • Status changed from Assigned to Resolved

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-12-15 09:42:22 +00:00

  • Relates set to 10841

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-12-15 09:42:30 +00:00

  • Relates set to 10700

@icinga-migration icinga-migration added bug Something isn't working libbase labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.4.0 milestone Jan 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant