This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #2617] status.cgi time out when displaying hostgroups in large environments #976

Closed
icinga-migration opened this issue May 16, 2012 · 32 comments

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/2617

Created by mzac on 2012-05-16 16:14:03 +00:00

Assignee: ricardo
Status: Resolved (closed on 2012-06-12 16:12:03 +00:00)
Target Version: 1.7.1
Last Update: 2014-12-08 09:42:45 +00:00 (in Redmine)

Icinga Version: 1.10.0
OS Version: any

We just upgraded our Icinga installation to 1.7.0, and status.cgi is now timing out when users are using Nagstamon.

Nagstamon fetches this url:

http://servercgi-bin/status.cgi?hostgroup=all&style=hostdetail&hoststatustypes=12&hostprops=0

Getting this in my http logs:

[Wed May 16 12:07:32 2012] [error] [client x.x.x.95] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.126] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.95] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.119] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.107] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.122] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.117] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.95] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.119] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.122] Premature end of script headers: status.cgi
[Wed May 16 12:07:32 2012] [error] [client x.x.x.107] Premature end of script headers: status.cgi, referer: https://icinga.ncs.mcgill.ca/icinga/cgi-bin/status.cgi?host=all&type=detail&servicestatustypes=16&hoststatustypes=3&serviceprops=2097162&nostatusheader
[Wed May 16 12:07:32 2012] [error] [client x.x.x.95] Premature end of script headers: status.cgi
[Wed May 16 12:08:52 2012] [warn] [client x.x.x.126] Timeout waiting for output from CGI script /usr/lib64/icinga/cgi/status.cgi
[Wed May 16 12:08:52 2012] [error] [client x.x.x.126] Script timed out before returning headers: status.cgi
[Wed May 16 12:08:52 2012] [warn] [client x.x.x.119] Timeout waiting for output from CGI script /usr/lib64/icinga/cgi/status.cgi
[Wed May 16 12:08:52 2012] [error] [client x.x.x.119] Script timed out before returning headers: status.cgi
[Wed May 16 12:08:52 2012] [warn] [client x.x.x.107] Timeout waiting for output from CGI script /usr/lib64/icinga/cgi/status.cgi
[Wed May 16 12:08:52 2012] [error] [client x.x.x.107] Script timed out before returning headers: status.cgi
[Wed May 16 12:08:52 2012] [warn] [client x.x.x.95] Timeout waiting for output from CGI script /usr/lib64/icinga/cgi/status.cgi
[Wed May 16 12:08:52 2012] [error] [client x.x.x.95] Script timed out before returning headers: status.cgi
[Wed May 16 12:08:52 2012] [warn] [client x.x.x.122] Timeout waiting for output from CGI script /usr/lib64/icinga/cgi/status.cgi
[Wed May 16 12:08:52 2012] [error] [client x.x.x.122] Script timed out before returning headers: status.cgi
[Wed May 16 12:08:52 2012] [warn] [client x.x.x.95] Timeout waiting for output from CGI script /usr/lib64/icinga/cgi/status.cgi
[Wed May 16 12:08:52 2012] [error] [client x.x.x.95] Script timed out before returning headers: status.cgi
[Wed May 16 12:08:52 2012] [warn] [client x.x.x.117] Timeout waiting for output from CGI script /usr/lib64/icinga/cgi/status.cgi
[Wed May 16 12:08:52 2012] [error] [client x.x.x.117] Script timed out before returning headers: status.cgi

It seems that in previous versions this URL returned only the problem hosts; I'm not sure what it is doing now.
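
The `hoststatustypes=12` value in the Nagstamon URL is a bitmask, which is why this view is expected to list only problem hosts. A minimal sketch of decoding it, assuming the host state bit values used by the Nagios/Icinga 1.x classic CGIs (PENDING=1, UP=2, DOWN=4, UNREACHABLE=8); the function name `decode_host_status_types` is made up for illustration:

```python
from urllib.parse import parse_qs

# Host state bits, assumed from the Nagios/Icinga 1.x CGI headers.
HOST_STATUS_BITS = {1: "PENDING", 2: "UP", 4: "DOWN", 8: "UNREACHABLE"}


def decode_host_status_types(mask):
    """Return the names of all host states selected by the bitmask."""
    return [name for bit, name in sorted(HOST_STATUS_BITS.items()) if mask & bit]


# Query string taken from the URL that Nagstamon fetches.
query = "hostgroup=all&style=hostdetail&hoststatustypes=12&hostprops=0"
params = parse_qs(query)
mask = int(params["hoststatustypes"][0])
print(decode_host_status_types(mask))  # ['DOWN', 'UNREACHABLE']
```

Under that assumption the request asks for hosts that are DOWN or UNREACHABLE in any hostgroup, so the result set itself should be small even on a large installation.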

Note we are monitoring 7000 hosts and 16100 services so we have a high load.

Attachments

Changesets

2012-05-18 19:44:19 +00:00 by ricardo 0f0722e

classic-ui: Fixed status.cgi time out when displaying hostgroups in large environments #2617

refs: #2617

sorry for this one.

This patch is dedicated to Donna Summer. rest in peace
@icinga-migration

Updated by mzac on 2012-05-16 16:14:34 +00:00

We just downgraded to 1.6.1 and the problem has gone away, so this looks like a bug.

@icinga-migration

Updated by mfriedrich on 2012-05-16 16:27:51 +00:00

  • Status changed from New to Feedback
  • Priority changed from Urgent to Normal

i can't reproduce that on my rhel 5.8 rpm build testbox.

status.cgi?hostgroup=all&style=hostdetail&hoststatustypes=12&hostprops=0
does not produce any results for me

status.cgi?host=all&type=detail&servicestatustypes=16&hoststatustypes=3&serviceprops=2097162&nostatusheader
gives me what i wanna see

but the logs are clean.

can you please debug the cgis on the shell, as described here?
https://wiki.icinga.org/display/testing/Icinga+Classic+UI+Testing

@icinga-migration

Updated by mzac on 2012-05-16 16:46:07 +00:00

[root@localhost cgi]# export REMOTE_USER="admin"; export REQUEST_METHOD=GET ; export QUERY_STRING="hostgroup=all&style=hostdetail&hoststatustypes=12&hostprops=0"; gdb ./status.cgi

(gdb) run
Starting program: /tmp/cgi/status.cgi
Cache-Control: no-store
Pragma: no-cache
Last-Modified: Wed, 16 May 2012 16:44:13 GMT
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-type: text/html; charset="utf-8"






Current Network Status



var refresh_rate=30;
var do_refresh=true;
var counter_seconds=refresh_rate;













jQuery.noConflict();
jQuery(document).ready(function() {
  jQuery('a.tips').cluetip({ajaxCache: false, dropShadow: false,showTitle: false });
});




<!-- SkinnyTip (c) Elliott Brueggeman -->

Gets stuck there and then after about a minute spits out the rest of the page correctly

Program exited normally.
(gdb) bt full
No stack.

@icinga-migration

Updated by mfriedrich on 2012-05-16 16:55:29 +00:00

cgi.cfg settings?

@icinga-migration

Updated by mfriedrich on 2012-05-16 16:56:03 +00:00

  • Assigned to set to ricardo

@icinga-migration

Updated by mzac on 2012-05-16 17:00:46 +00:00

Here is my cgi.cfg, when I run status.cgi from 1.6.1 it runs right away, but 1.7.0 hangs. any gdb commands you can give me to see what's going on when it hangs?

main_config_file=/etc/icinga/icinga.cfg
physical_html_path=/usr/share/icinga
url_html_path=/
url_stylesheets_path=/icinga/stylesheets
http_charset=utf-8
show_context_help=0
highlight_table_rows=1
use_pending_states=1
use_logging=1
cgi_log_file=/var/log/icinga/gui/icinga-cgi.log
cgi_log_rotation_method=d
cgi_log_archive_path=/var/log/icinga/gui/archives
enforce_comments_on_actions=1
first_day_of_week=0
use_authentication=1
use_ssl_authentication=0
authorized_for_system_information=*
authorized_contactgroup_for_configuration_information=cgi_fullaccess
authorized_contactgroup_for_full_command_resolution=cgi_fullaccess
authorized_contactgroup_for_system_commands=cgi_fullaccess
authorized_contactgroup_for_all_services=cgi_fullaccess
authorized_contactgroup_for_all_hosts=cgi_fullaccess
authorized_contactgroup_for_all_service_commands=cgi_fullaccess
authorized_contactgroup_for_all_host_commands=cgi_fullaccess
show_all_services_host_is_authorized_for=1
show_partial_hostgroups=0
default_statusmap_layout=2
default_statuswrl_layout=4
ping_syntax=/bin/ping -n -U -c 5 $HOSTADDRESS$
refresh_rate=30
escape_html_tags=1
persistent_ack_comments=0
action_url_target=_blank
notes_url_target=_blank
lock_author_names=1
default_downtime_duration=7200
default_expiring_acknowledgement_duration=86400
status_show_long_plugin_output=0
display_status_totals=1
tac_show_only_hard_state=0
extinfo_show_child_hosts=1
suppress_maintenance_downtime=0
show_tac_header=1
show_tac_header_pending=1
tab_friendly_titles=1

@icinga-migration

Updated by mfriedrich on 2012-05-16 17:07:50 +00:00

i was hoping to see a sigsegv, which could be a cause of the premature end of script headers. but timeouts cannot be traced with gdb. strace might work better, or valgrind: maybe the cgi leaks memory in a loop, which causes everything else to suffer.
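
A sketch of how the suggested strace run could be scripted, assuming strace is installed and the CGI is started from the directory that contains it. The flags are standard strace options (`-C` appends a per-syscall summary to the normal trace, `-T` records the time spent inside each call, `-ttt` prefixes each line with an epoch timestamp, `-o` writes the trace to a file); the helper name and the output path are illustrative only:

```python
def strace_argv(cgi_path, trace_file):
    """Build an strace command line that traces one CGI run into trace_file."""
    return ["strace", "-C", "-T", "-ttt", "-o", trace_file, cgi_path]


# Hypothetical paths; adjust to the local build.
print(" ".join(strace_argv("./status.cgi", "/tmp/status.cgi.strace")))
```

The CGI still needs REMOTE_USER, REQUEST_METHOD and QUERY_STRING exported beforehand, exactly as in the gdb run earlier in this thread.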

@icinga-migration

Updated by mfriedrich on 2012-05-16 17:09:40 +00:00

what os/distribution is that and which package source for icinga?

@icinga-migration

Updated by mzac on 2012-05-16 17:10:46 +00:00

Here is the strace output from the point where it starts to hang until it resumes. It hangs during all those stat calls on /etc/localtime:

write(1, "
) = 86
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
write(1, "
) = 34
write(1, "\t$(document).ready(function(){\n", 31        $(document).ready(function(){
) = 31
write(1, "\t\t$(\"table.status tr\").hover(fun"..., 45          $("table.status tr").hover(function( e ) {
) = 45
write(1, "\t\t\t$(this).find(\"td\").each(funct"..., 39                 $(this).find("td").each(function(){
) = 39
write(1, "\t\t\t\tif($(this).attr(\"class\")) {\n", 32                          if($(this).attr("class")) {
) = 32
write(1, "\t\t\t\t\t$(this).addClass(\"highlight"..., 39                                        $(this).addClass("highlightRow");
) = 39

@icinga-migration

Updated by mzac on 2012-05-16 17:20:07 +00:00

Compared to strace on 1.6.1:

write(1, "

@icinga-migration

Updated by mfriedrich on 2012-05-16 17:24:54 +00:00

hm, a shot in the dark: the javascript comparison that decides when to actually trigger the page refresh might be causing the many calls to (possibly unsafe) localtime.

could you try to change the refresh method back to http headers with the new cgi.cfg setting?

# REFRESH TYPE
# This option determines what type of refresh should be used.
# You can choose between http header and javascript. By
# default javascript (1) is activated. If you have trouble
# using javascript then try refresh via http header (0).

refresh_type=1

so refresh_type=0 with 1.7 then. maybe this is the root cause?

@icinga-migration

Updated by mzac on 2012-05-16 17:28:37 +00:00

No difference when setting refresh_type=0, same thing :(

dnsmichi wrote:

hm, a shot in the dark: the javascript comparison that decides when to actually trigger the page refresh might be causing the many calls to (possibly unsafe) localtime.

could you try to change the refresh method back to http headers with the new cgi.cfg setting?

[...]

so refresh_type=0 with 1.7 then. maybe this is the root cause?

@icinga-migration

Updated by mfriedrich on 2012-05-16 17:29:20 +00:00

  • Project changed from Icinga 1.x to 19

@icinga-migration

Updated by mzac on 2012-05-16 17:34:59 +00:00

Sorry didn't see this one:

Red Hat Enterprise Linux Server release 6.2 (Santiago)
Linux localhost 2.6.32-220.4.2.el6.x86_64 #1 SMP Mon Feb 6 16:39:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost cgi]# yum info icinga
Name        : icinga
Arch        : x86_64
Version     : 1.7.0
Release     : 1ncs.el6
Size        : 366 k
Repo        : ncspkgs-arch

We built the RPM from source

dnsmichi wrote:

what os/distribution is that and which package source for icinga?

@icinga-migration

Updated by mfriedrich on 2012-05-16 17:37:13 +00:00

pfuh. it would be interesting if we could get a static status.dat and objects.cache copy of yours and test that "real" against our cgis. if applicable, mailto michael.friedrich (at) univie.ac.at, gzipped. i might forward to ricardo as well, confidential as usual.

tomorrow is a holiday here, but one might have time to debug further.

@icinga-migration

Updated by mzac on 2012-05-16 17:45:36 +00:00

I'll check with my manager and if it's ok I'll send it off to you. I might have to zap out some lines in them of course.

dnsmichi wrote:

pfuh. it would be interesting if we could get a static status.dat and objects.cache copy of yours and test that "real" against our cgis. if applicable, mailto michael.friedrich (at) univie.ac.at, gzipped. i might forward to ricardo as well, confidential as usual.

tomorrow is a holiday here, but one might have time to debug further.

@icinga-migration

Updated by mfriedrich on 2012-05-16 20:13:16 +00:00

steps to reproduce:

$ cd path/to/icinga/icinga-core
$ mkdir cgi/mcgill

put the received status.dat and objects.cache in there, then add two files:

  • icinga.cfg containing the locations to logfile, status.dat, objects.cache - edit that accordingly.
log_file=/var/log/icinga/icinga.log
log_archive_path=/var/log/icinga/archives
object_cache_file=/home/dnsmichi/coding/icinga/icinga-core/cgi/mcgill/objects.cache
status_file=/home/dnsmichi/coding/icinga/icinga-core/cgi/mcgill/status.dat
  • cgi.cfg from default, but with a different icinga.cfg path
main_config_file=/home/dnsmichi/coding/icinga/icinga-core/cgi/mcgill/icinga.cfg

physical_html_path=/usr/share/icinga
url_html_path=/icinga
url_stylesheets_path=/icinga/stylesheets
http_charset=utf-8
show_context_help=0
highlight_table_rows=1
use_pending_states=1
use_logging=0
cgi_log_file=/var/log/icinga/gui/icinga-cgi.log
cgi_log_rotation_method=d
cgi_log_archive_path=/var/log/icinga/gui
enforce_comments_on_actions=0
first_day_of_week=0
use_authentication=1
use_ssl_authentication=0
authorized_for_system_information=icingaadmin
authorized_for_configuration_information=icingaadmin
authorized_for_full_command_resolution=icingaadmin
authorized_for_system_commands=icingaadmin
authorized_for_all_services=icingaadmin
authorized_for_all_hosts=icingaadmin
authorized_for_all_service_commands=icingaadmin
authorized_for_all_host_commands=icingaadmin
show_all_services_host_is_authorized_for=1
show_partial_hostgroups=0
default_statusmap_layout=5
default_statuswrl_layout=4
ping_syntax=/bin/ping -n -U -c 5 $HOSTADDRESS$
refresh_rate=90
refresh_type=1
escape_html_tags=1
persistent_ack_comments=0
action_url_target=main
notes_url_target=main
lock_author_names=1
default_downtime_duration=7200
default_expiring_acknowledgement_duration=86400
status_show_long_plugin_output=0
display_status_totals=0
tac_show_only_hard_state=0
extinfo_show_child_hosts=0
suppress_maintenance_downtime=0
show_tac_header=1
show_tac_header_pending=1
tab_friendly_titles=1

now point the cgi at the changed cgi.cfg location, set the other env vars, and run it.

$ export ICINGA_CGI_CONFIG=mcgill/cgi.cfg ; export REMOTE_USER="icingaadmin"; export REQUEST_METHOD=GET ; export QUERY_STRING="hostgroup=all&style=hostdetail&hoststatustypes=12&hostprops=0"; ./status.cgi
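
The same reproduction can be wrapped in a small script so that runs are timed consistently. This is only a sketch: the environment variable names are the ones from the command above, while the helper names `cgi_env` and `time_cgi` are invented here.

```python
import os
import subprocess
import time


def cgi_env(query_string, remote_user, cgi_config=None):
    """Build the environment a web server would normally hand to the CGI."""
    env = dict(os.environ)
    env["REQUEST_METHOD"] = "GET"
    env["REMOTE_USER"] = remote_user
    env["QUERY_STRING"] = query_string
    if cgi_config is not None:
        env["ICINGA_CGI_CONFIG"] = cgi_config
    return env


def time_cgi(argv, env, cwd=None):
    """Run the CGI, discard its HTML output, return (exit code, seconds)."""
    start = time.monotonic()
    proc = subprocess.run(argv, env=env, cwd=cwd, stdout=subprocess.DEVNULL)
    return proc.returncode, time.monotonic() - start


# Example call, assuming the script is started from the cgi/ build directory:
#   env = cgi_env("hostgroup=all&style=hostdetail&hoststatustypes=12&hostprops=0",
#                 "icingaadmin", cgi_config="mcgill/cgi.cfg")
#   code, seconds = time_cgi(["./status.cgi"], env)
```

Discarding stdout keeps the terminal out of the measurement, so the elapsed time mostly reflects the work done inside the CGI.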

@icinga-migration

Updated by mfriedrich on 2012-05-16 20:16:57 +00:00

so indeed, it hangs when run under strace:

read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\t\0\0\0\0"..., 4096) = 1467
close(3)                                = 0
munmap(0x7fe626347000, 4096)            = 0
write(1, "Last-Modified: Wed, 16 May 2012 "..., 46Last-Modified: Wed, 16 May 2012 20:14:15 GMT
) = 46
write(1, "Expires: Thu, 01 Jan 1970 00:00:"..., 40Expires: Thu, 01 Jan 1970 00:00:00 GMT
) = 40
write(1, "Content-type: text/html; charset"..., 44Content-type: text/html; charset="utf-8"

) = 44
write(1, "\n", 7
)                 = 7
write(1, "\n", 7
)                 = 7
write(1, "
) = 78
write(1, "
) = 46
write(1, "
) = 68
write(1, "Current Network StatusCurrent Network Status
) = 38
write(1, "
) = 78
write(1, "
) = 78
write(1, "\n", 32<script type="text/javascript">
) = 32
write(1, "var refresh_rate=90;\n", 21var refresh_rate=90;
)  = 21
write(1, "var do_refresh=true;\n", 21var do_refresh=true;
)  = 21
write(1, "var counter_seconds=refresh_rate"..., 34var counter_seconds=refresh_rate;
) = 34
write(1, "\n", 10
)             = 10
write(1, "
) = 74
write(1, "
) = 78
write(1, "
) = 71
write(1, "
) = 75
write(1, "
) = 80
write(1, "\n", 8
)                = 8
write(1, "\n", 22
) = 22
write(1, "\n", 1
)                       = 1
write(1, "
) = 161
stat("/usr/share/icinga/ssi/common-header.ssi", 0x7fff31260900) = -1 ENOENT (No such file or directory)
stat("/usr/share/icinga/ssi/status-header.ssi", 0x7fff31260900) = -1 ENOENT (No such file or directory)
write(1, "\n", 1
)                       = 1
write(1, "
<!-- SkinnyTip (c) Elliott Brueggeman -->

) = 114
write(1, "
) = 86

then, after a while, it keeps calling stat on /etc/localtime at intervals of 5 to 30 seconds.

stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0

after ~2min, it's done with the privately provided status.dat and objects.cache.

@icinga-migration

Updated by mfriedrich on 2012-05-16 20:26:17 +00:00

some strace stats with -CT

)                = 8 <0.000156>
write(1, "\n", 8
)                = 8 <0.000165>
brk(0x244c000)                          = 0x244c000 <0.001765>
exit_group(0)                           = ?
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 62.14    0.005436           8       715           write
 22.82    0.001996           6       358           brk
 10.20    0.000892          81        11           munmap
  3.25    0.000284          28        10           close
  1.60    0.000140           7        19           mmap
  0.00    0.000000           0         3           read
  0.00    0.000000           0        10           open
  0.00    0.000000           0        81         4 stat
  0.00    0.000000           0        12           fstat
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         3         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.008748                  1228         7 total

plus timing -CTttt

) = 75 <0.000168>
1337199809.244644 write(1, "
) = 80 <0.000157>
1337199809.245039 write(1, "\n", 8
) = 8 <0.000159>
1337199809.245466 write(1, "\n", 22
) = 22 <0.000171>
1337199809.245906 write(1, "\n", 1
)     = 1 <0.000130>
1337199809.246194 write(1, "
) = 161 <0.000160>
1337199809.246516 stat("/usr/share/icinga/ssi/common-header.ssi", 0x7fff062e0840) = -1 ENOENT (No such file or directory) <0.000044>
1337199809.246738 stat("/usr/share/icinga/ssi/status-header.ssi", 0x7fff062e0840) = -1 ENOENT (No such file or directory) <0.000121>
1337199809.247024 write(1, "\n", 1
)     = 1 <0.000059>
1337199809.247159 write(1, "
<!-- SkinnyTip (c) Elliott Brueggeman -->

) = 114 <0.000125>
1337199809.247464 write(1, "
) = 86 <0.000144>

....
1337199822.040933 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000073>
1337199822.041252 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000061>
1337199822.501231 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000149>
1337199822.501713 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000046>
1337199823.490602 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000074>
1337199823.490871 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000035>
1337199823.501815 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000073>
1337199823.502128 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000066>
1337199824.422490 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000074>
1337199824.422758 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000033>
1337199826.432763 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000073>
1337199826.433030 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000033>
1337199828.503915 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000078>
1337199828.504236 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000058>
1337199833.943258 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000111>
1337199833.943694 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000059>
1337199838.604888 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000128>
1337199838.605255 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 <0.000053>

...

Copyright (c) 2009-2012 Icinga Development Team -->
) = 161 <0.000019>
1337199908.543839 write(1, "\n", 8
) = 8 <0.000014>
1337199908.543886 write(1, "\n", 8
) = 8 <0.000013>
1337199908.594440 brk(0x2ee5000)        = 0x2ee5000 <0.002740>
1337199908.604821 exit_group(0)         = ?
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 37.12    0.002463           3       715           write
 30.71    0.002038           6       358           brk
 29.66    0.001968         179        11           munmap
  2.52    0.000167           2        81         4 stat
  0.00    0.000000           0         3           read
  0.00    0.000000           0        10           open
  0.00    0.000000           0        10           close
  0.00    0.000000           0        12           fstat
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0        19           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         3         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.006636                  1228         7 total

performance-wise, status.cgi fully occupies one of my four cores.

start: 1337199809
end: 1337199908

99 seconds (!)
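
The timestamps from the `-ttt` trace can also be used to see where the 99 seconds go. The sketch below measures the gaps between consecutive traced calls; the sample lines are copied from the trace above with the stat buffer abbreviated to `{...}`, and the helper name `gaps` is made up for illustration.

```python
import re
from decimal import Decimal

# Matches the epoch timestamp that `strace -ttt` puts at the start of a line.
TIMESTAMP = re.compile(r"^(\d+\.\d+)\s")


def gaps(lines):
    """Return (gap_in_seconds, line) pairs for consecutive timestamped lines."""
    result = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip continuation lines of multi-line write() output
        stamp = Decimal(match.group(1))
        if previous is not None:
            result.append((stamp - previous, line))
        previous = stamp
    return result


sample = [
    '1337199828.503915 stat("/etc/localtime", {...}) = 0 <0.000078>',
    '1337199828.504236 stat("/etc/localtime", {...}) = 0 <0.000058>',
    '1337199833.943258 stat("/etc/localtime", {...}) = 0 <0.000111>',
    '1337199833.943694 stat("/etc/localtime", {...}) = 0 <0.000059>',
    '1337199838.604888 stat("/etc/localtime", {...}) = 0 <0.000128>',
]
longest = max(gap for gap, _ in gaps(sample))
print(longest)                  # 5.439022
print(1337199908 - 1337199809)  # 99
```

Each stat call itself takes well under a millisecond and the summary reports only 0.006636 seconds in syscalls overall, which suggests that almost all of the wall time is spent computing in user space between the calls rather than waiting on /etc/localtime.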

@icinga-migration

Updated by ricardo on 2012-05-16 21:57:53 +00:00

Hi,

can you switch "show_partial_hostgroups" to 1 and try again?

Thank you.

@icinga-migration

Updated by ricardo on 2012-05-16 23:09:11 +00:00

hi,

just to let you know. found the problem.

mzac it's your insane amount of hosts in hostgroups. Just delete some hosts and you are fine!

Just kidding!

The problem is in checking if a certain host belongs to a hostgroup to display. Have to change the processing as I already did for servicegroups.

Will provide a patch tomorrow night.

This occurs only in large environments.

And I will keep in mind to bug you to do some more tests after I changed/added something to Classic UI ;-)

Cheers Ricardo
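
To illustrate why a per-host membership check can blow up in large environments, here is a hedged sketch in Python. It is not the Icinga code (the CGIs are written in C) and not necessarily what the fix does; the function names and sample data are invented. It only contrasts scanning every hostgroup for every host with building a lookup set once.

```python
def grouped_hosts_naive(hosts, hostgroups):
    """Scan every hostgroup's member list for every host.

    With H hosts, G hostgroups and M members per group this is roughly
    O(H * G * M), which becomes painful with thousands of hosts.
    """
    shown = []
    for host in hosts:
        for members in hostgroups.values():
            if host in members:  # linear scan of a list
                shown.append(host)
                break
    return shown


def grouped_hosts_indexed(hosts, hostgroups):
    """Collect all group members into a set once, then test each host in O(1)."""
    grouped = set()
    for members in hostgroups.values():
        grouped.update(members)
    return [host for host in hosts if host in grouped]


# Hypothetical sample data.
hosts = ["sw-a", "sw-b", "sw-c"]
hostgroups = {"building-1": ["sw-a"], "closet-7": ["sw-a", "sw-c"]}
print(grouped_hosts_naive(hosts, hostgroups))    # ['sw-a', 'sw-c']
print(grouped_hosts_indexed(hosts, hostgroups))  # ['sw-a', 'sw-c']
```

Both variants return the same hosts; only the amount of repeated scanning differs, and that difference grows with the number of hosts and hostgroups.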

@icinga-migration

Updated by mzac on 2012-05-17 00:49:34 +00:00

Believe it or not we have about half the number of hostgroups as we did before, so I was sure that wasn't the problem! :) We have set it up to be quite complex but it does help us drill down to a specific building and closet since we have a lot of buildings and closets spread around the McGill campus.

In terms of testing I'd be able to help for sure, the server we're currently running our Icinga instance on is a:

IBM x3650 M3
8 Xeon cores @ 3.07 GHz running HT (so looks like 16 cores)
24 GB RAM

When I did the upgrade today I had about 20 people hitting the server with a combination of the web GUI and Nagstamon, and all 16 cores went to 100% and the load average went up to 20!

I'm very happy you were able to find the problem. As for the patch, will you just release it as a patch or do you think it warrants a 1.7.1 release?

Thanks again!

ricardo wrote:

hi,

just to let you know. found the problem.

mzac it's your insane amount of hosts in hostgroups. Just delete some hosts and you are fine!

Just kidding!

The problem is in checking if a certain host belongs to a hostgroup to display. Have to change the processing as I already did for servicegroups.

Will provide a patch tomorrow night.

This occurs only in large environments.

And I will keep in mind to bug you to do some more tests after I changed/added something to Classic UI ;-)

Cheers Ricardo

@icinga-migration

Updated by mfriedrich on 2012-05-17 10:31:12 +00:00

at least one week of tests on git and nightly builds. plus we need to collect more bugs that may get reported and need to be resolved (already got one). either way, if you can share resources for testing, your input is very welcome. please apply as an icinga padawan for testing at info@icinga.org so things get organized a bit better.

@icinga-migration

Updated by mfriedrich on 2012-05-17 11:12:40 +00:00

  • Category set to 52
  • Status changed from Feedback to Assigned
  • Target Version set to 1.7.1

@icinga-migration

Updated by ricardo on 2012-05-17 21:06:04 +00:00

  • File added status.cgi_hostgroup_fix.diff

Hi,

can you try this patch? It works for me, but I wanted to check if it works with your environment too.

Thanks a lot.

Cheers Ricardo

@icinga-migration

Updated by ricardo on 2012-05-18 09:08:20 +00:00

  • Subject changed from status.cgi timing out on 1.7.0 - classic_ui to status.cgi time out when displaying hostgroups in large environments

@icinga-migration

Updated by mfriedrich on 2012-05-18 11:56:31 +00:00

$ git apply status.cgi_hostgroup_fix.diff
$ make cgis
$ cd cgi/
$ export ICINGA_CGI_CONFIG=mcgill/cgi.cfg ; export REMOTE_USER="icingaadmin"; export REQUEST_METHOD=GET ; export QUERY_STRING="hostgroup=all&style=hostdetail&hoststatustypes=12&hostprops=0";

$ time ./status.cgi
...
real    0m2.032s
user    0m1.956s
sys     0m0.049s

better :-D

i've prepared a merge base for you in dev/cgis with the latest stuff from the 1.7 release (next).

after the merge, please put that patch into your tree, on top of the recent ones. and make sure to collect the 1.7.1 patches now, and then focus on the 1.8 tree.

@icinga-migration

Updated by mfriedrich on 2012-05-18 12:01:31 +00:00

  • Done % changed from 0 to 80

@icinga-migration

Updated by mzac on 2012-05-18 18:39:16 +00:00

Hi, I just tested it and it works perfectly with 'show_partial_hostgroups=0'. I saw a mention that it might be better to set this to 1, so I will change it in my config since I am limiting users to specific host groups.

I'm just wondering, though, why the default for this option is 0. Would it not make more sense to default to 1? Or is it 0 so that we see everything?

Thanks, Zachary

ricardo wrote:

Hi,

can you try this patch? It works for me, but I wanted to check if it works with your environment too.

Thanks a lot.

Cheers Ricardo

@icinga-migration

Updated by ricardo on 2012-05-18 19:31:14 +00:00

Hi,

I'm glad it works. I'm really sorry that this slipped through. This is what you get if you aren't consistent all the way.

But keep in mind, that "status.cgi?hostgroup=all" will only show hosts which are in a hostgroup. If a host is without any hostgroup it won't show up in this view. This was broken in Nagios for a looong time. Already thinking about adding a compatibility cgi option.

Hope you can now enjoy the features of 1.7 and this fix will hit 1.7.1.

When I switched "show_partial_hostgroups" to "1" it worked better for me. That's why I asked you to test it.

show_partial_hostgroups=0: user has to be authorized for the whole hostgroup to see any host. If the user is authorized for one or more hosts in the group, but not for the group itself, the user won't see any hosts in this group.

show_partial_hostgroups=1: if user is authorized for one host in the group, the user will see just this one and no other host in the group.
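
The two settings can be summarized with a short sketch. It is an illustration only, not the CGI's implementation: the function name and sample data are invented, and it assumes that being authorized for a hostgroup means being authorized for every member host.

```python
def visible_hosts(group_members, authorized_hosts, show_partial_hostgroups):
    """Return the hosts of one hostgroup that a user gets to see."""
    authorized = [host for host in group_members if host in authorized_hosts]
    if show_partial_hostgroups:
        # 1: show whichever member hosts the user is authorized for.
        return authorized
    # 0: all or nothing. Assumption: being authorized for the group means
    # being authorized for every member host.
    if len(authorized) == len(group_members):
        return authorized
    return []


# Hypothetical sample data: the user may see only one of three members.
members = ["sw-a", "sw-b", "sw-c"]
allowed = {"sw-a"}
print(visible_hosts(members, allowed, show_partial_hostgroups=0))  # []
print(visible_hosts(members, allowed, show_partial_hostgroups=1))  # ['sw-a']
```

With the value 0 a user restricted to a subset of a group sees nothing from that group, which is why the setting matters when users are limited to specific hosts.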

If you think we should set it to "1" by default, then we should open an Issue to discuss this.

Cheers Ricardo

@icinga-migration

Updated by mfriedrich on 2012-06-12 16:12:03 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 80 to 100

@icinga-migration

Updated by mfriedrich on 2014-12-08 09:42:45 +00:00

  • Project changed from 19 to Core, Classic UI, IDOUtils
  • Category changed from 52 to Classic UI
  • Icinga Version set to 1
  • OS Version set to any
