Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dev.icinga.com #9322] sending multiple Livestatus commands rejects all except the first #3038

Closed
icinga-migration opened this issue May 27, 2015 · 27 comments
Labels
area/livestatus Legacy interface bug Something isn't working
Milestone

Comments

@icinga-migration
Copy link

This issue has been migrated from Redmine: https://dev.icinga.com/issues/9322

Created by sni on 2015-05-27 08:33:08 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2015-10-16 13:14:51 +00:00)
Target Version: 2.3.11
Last Update: 2015-12-11 12:35:09 +00:00 (in Redmine)

Icinga Version: 2.3.3
Backport?: Already backported
Include in Changelog: 1

Livestatus normally accepts multiple commands at once and executes them ond after another. However, it seems like Icinga2 only reads the first command from the socket and discards all other commands. It would be nice to support multiple commands with a single connection.
See sni/Thruk#468 for some more details.

Changesets

2015-10-16 13:12:05 +00:00 by (unknown) a9f14f1

Fix query processing for Livestatus queries which use KeepAlive

fixes #9322

2015-10-16 13:16:48 +00:00 by (unknown) 6b9013c

Fix query processing for Livestatus queries which use KeepAlive

fixes #9322
@icinga-migration
Copy link
Author

Updated by iosif69 on 2015-05-28 07:44:03 +00:00

sni wrote:

Livestatus normally accepts multiple commands at once and executes them ond after another. However, it seems like Icinga2 only reads the first command from the socket and discards all other commands. It would be nice to support multiple commands with a single connection.
See sni/Thruk#468 for some more details.

the same behavior, Icinga2 (v2.3.4), thruk 1.88-4

@icinga-migration
Copy link
Author

Updated by mfrosch on 2015-05-28 08:27:36 +00:00

Could you give us a pointer to the code that formats and submits/writes the commands? Might be helpful!

@icinga-migration
Copy link
Author

Updated by sni on 2015-05-28 09:01:19 +00:00

It's mostly here: https://github.com/sni/Thruk/blob/master/lib/Thruk/Controller/cmd.pm#L680

So it's a string of commands, separated by two newlines. Something like this should work:

cmd:

COMMAND [1432803628] SCHEDULE_FORCED_SVC_CHECK;App 2;App 2;1432803628

COMMAND [1432803628] SCHEDULE_FORCED_SVC_CHECK;App 3;App 3;1432803628

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-05-28 09:24:51 +00:00

A single new-line acts as query termination indicator, which would explain the error message of closing the socket while your command still tries to write to the already closed socket.

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-06-18 09:14:29 +00:00

  • Status changed from New to Assigned
  • Assigned to set to gbeutner

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-06-23 13:26:54 +00:00

  • Target Version set to Backlog

@icinga-migration
Copy link
Author

Updated by tomwork on 2015-09-25 04:32:17 +00:00

Hi,

+1

I have the same problem (icinga 2.3.10 with Thruk 2.00-2).
Thruk is really a nice interface to have on top of icinga2. The icinga front-ends are not bad but we like Thruk better.

@icinga-migration
Copy link
Author

Updated by tomwork on 2015-10-06 06:51:51 +00:00

Hi,

Do you have any hints/pointers on how we can fix that?
It is a rather annoying bug.

Cheers

@icinga-migration
Copy link
Author

Updated by mittma on 2015-10-06 07:45:00 +00:00

Hi there!

Same problem here!
Thruk 1.88-4 and 2.00-2 with Icinga 1.13.3/livestatus 1.2.6p1 work great.
I like the new configuration language of Icinga 2, but Thruk is almost unusable due to this bug - at least you loose a lot of its advantages.
I don't want to drop Thruk. I'd rather migrate to Naemon or OMD instead.

Best regards.

PS: My programming experience in C** is completely rusted, but it seems to me that LivestatusListener::ClientHandler in livestatuslistener.cpp is the guilty one. There in the second for-Loop (reading each line) you have the following code:

            if (line.GetLength() > 0)
                lines.push_back(line);
            else
                break;

So, if the line has no content, reading stops. Stupid question: is it possible to use "continue" instead of "break"? It may work, since there is an additional check for Eof I think:

            if (srs == StatusEof)
                break;

Please don't be too harsh with me if that's complete nonsense, but I clutch at every straw, because I want to use Icinga 2 AND Thruk together!

@icinga-migration
Copy link
Author

Updated by tomwork on 2015-10-07 00:48:16 +00:00

Hi Mittma,

Well, I came with the same conclusion yesterday, like you I read the same code and I have the same problem about C** skills. If I find the time I will try to hack things around to try to make it work. Trial and error ;-)

Desperate, I also tried to tell thruk to only send one \n instead of two (so no empty lines) but that does not work either. Thruk being coded in Perl it was easier to try. Recompiling icinga2 is a different exercise, hopefully, there is a makefile to just rebuild the livestatus library instead of the whole shebang.

@icinga-migration
Copy link
Author

Updated by tomwork on 2015-10-07 00:51:17 +00:00

Oh, I forgot to say, I personally believe that the following is the culprit:

if (lines.empty())
            break;

Ref:

@icinga-migration
Copy link
Author

Updated by krausm on 2015-10-07 06:32:52 +00:00

Hi,

We are currently working on integrating Icinga2 within "OMD Labs Edition" (see https://labs.consol.de/de/omd/ for details), using Thruk as its main user interface.
You can try it out using the Testing-Repository: https://labs.consol.de/repo/testing/

Thereby sni found this issue, and we already discussed it with dnsmichi.

According to him, the culprit is here: https://github.com/Icinga/icinga2/blob/master/lib/livestatus/livestatuslistener.cpp#L195
Here the parser recognizes the newline and terminates reading from stream.

He outlined as a possible solution, to change the break and store the request into a map (request1, lines, request2, lines), that can afterwwards be processed within a loop.

Limiting factors currently are either C** skills or time ...

@icinga-migration
Copy link
Author

Updated by mittma on 2015-10-07 07:57:13 +00:00

Hi krausm,

thanks a lot for the info! The OMD Labs Edition surely looks promising and I will have a closer look on it. I didn't know that mod_gearman is possible with Icinga2...

Limiting factors currently are either C++ skills or time ...

Yes, both can be found rather sparsely...

@icinga-migration
Copy link
Author

Updated by tomwork on 2015-10-07 23:27:56 +00:00

All right, but... is this bug fixed in your OMD labs edition? It does not sound like it based on your comment.
So we are still at square 1, am I right?

@icinga-migration
Copy link
Author

Updated by krausm on 2015-10-08 08:03:21 +00:00

Unfortunately it is not fixed in OMD Labs Edition. If it were, we would have sent it upstream.
I am just struggling to prepare a fix, but it is not ready yet.

@icinga-migration
Copy link
Author

Updated by tomwork on 2015-10-09 01:09:56 +00:00

Maybe we can help?
Can you share your progress?

@icinga-migration
Copy link
Author

Updated by tomwork on 2015-10-09 01:19:37 +00:00

Below is the livestatus LQL language description. As we all already know, what Thruk does is correct. However icinga2 stops processing once it find an empty line which actually goes against the LQL "protocol". We can also imagine a quick and non performing hack on the Thruk side where each command is sent separately instead of a series of commands but that's an ugly workaround.

"Just as with GET, a query is terminated either by closing the connection or by sending a newline. COMMAND automatically implies keep alive and behave like GET when KeepAlive is set to on. That way you can mix GET and COMMAND quries in one connection. "

Ref: https://mathias-kettner.de/checkmk\_livestatus.html#H1:Sending%20commands%20via%20Livestatus

@icinga-migration
Copy link
Author

Updated by krausm on 2015-10-09 13:28:26 +00:00

TL;DR: my current progress is "It would seem to work, BUT..."

Long version:
I tried to change the position, where the livestatus query is done. Basically i pulled it into the loop. See the diff at my "playground"-fork here:
https://github.com/m-kraus/icinga2/compare/v2.3.10...m-kraus:bug9322

Thruk now seems to behave as expected in short tests, i.e. multiple commands work.

BUT, there are exceptions occurring in the debug log:

2015-10-09 13:09:44 +0000] critical/Socket: recv() failed with error code 9, "Bad file descriptor"
[2015-10-09 13:09:44 +0000] critical/ThreadPool: Exception thrown in event handler:
Error: std::exception
    (0) libbase.so: void boost::throw_exception(icinga::socket_error const&) (+0xbe) [0x7f2b9b5d9a4e]
    (1) libbase.so: void boost::exception_detail::throw_exception_(icinga::socket_error const&, char const*, char const*, int) (+0x40) [0x7f2b9b5d9b10]
    (2) libbase.so: icinga::Socket::Read(void*, unsigned long) (+0x18d) [0x7f2b9b5944fd]
    (3) libbase.so: icinga::NetworkStream::Read(void*, unsigned long, bool) (+0x1e) [0x7f2b9b59459e]
    (4) libbase.so: icinga::StreamReadContext::FillFromStream(boost::intrusive_ptr const&, bool) (+0x55) [0x7f2b9b5846c5]
    (5) libbase.so: icinga::Stream::ReadLine(icinga::String*, icinga::StreamReadContext&, bool) (+0x74) [0x7f2b9b59ee34]
    (6) liblivestatus.so: icinga::LivestatusListener::ClientHandler(boost::intrusive_ptr const&) (+0x142) [0x7f2b948d65b2]
    (7) libbase.so: icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) (+0x2da) [0x7f2b9b5baa7a]
    (8) libboost_thread-mt.so.1.53.0:  (+0xd24a) [0x7f2b9bfde24a]
    (9) libpthread.so.0:  (+0x7df5) [0x7f2b98c27df5]
    (10) libc.so.6: clone (+0x6d) [0x7f2b9913a1ad]

The livestatus query seems to terminate the stream, while someone still tries to read or write to/from it.

Does anyone have an idea, how to solve the stream handling? My knowledge about c** and boost is way too little, to be helpful here...

Greetings, Michael

@icinga-migration
Copy link
Author

Updated by tomwork on 2015-10-16 07:07:30 +00:00

Hi,

I tried your logic and yes it works but there is this 'bad file descriptor' error. I believe that's because it hits EOF and Stream::Readline still tries to read from an empty socket or something like that I guess. I believe that without this exception it may not exit the for loop anyway. The 'else break' is the for loop exit door. You hijack'd it, it works but I cannot understand why.

I tried different ways of my own, sadly, I couldn't make it work nicer. Also having these magical object does not help because there is a fair bit of logic in the stream readline as well so that could be where the problem comes from. Another way avoiding C** code changes would be to build a unixsock proxy to intercept thruk's '_bulk_send' commands and split them in different command so icinga2 sees them as separate COMMAND.

Also, I wanted to come back about something I said earlier, the LQL says "a query is terminated either by closing the connection or by sending a newline." That's what icinga2 does. So the bulk_send done by Thruk is incorrect or is meant to be managed via different connections on the receiving end then but I am no expert, just a livestatus newbie trying to have it a go.

To debug livestatus I came with the following to see what's going on.

\rm /tmp/tom ; socat -v  UNIX-LISTEN:/tmp/tom,fork,mode=777  UNIX-CLIENT:/var/run/icinga2/cmd/livestatus 2>&1 |grep -C1 COMMAND

I think I will stop there for now, I burnt a day trying different (random;) things. I may try another day or build a 'proxy'.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-16 13:14:35 +00:00

Wow, you're going to hate the patch for this one - considering how much effort you've already put into this.

Icinga uses ReadLine() to read data from the socket. However, because we don't know exactly where the line ends we just read a 'few' bytes - and put whatever data we can't process right now (i.e. after the first line) into a data structure called StreamReadContext. The problem in this case is that the StreamReadContext variable is declared inside the for loop and destroyed after the first query, i.e. we're losing whatever data we've read from the stream. Oops.

The fix is to move the StreamReadContext variable outside of the loop.

@icinga-migration
Copy link
Author

Updated by Anonymous on 2015-10-16 13:14:51 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset a9f14f1.

@icinga-migration
Copy link
Author

Updated by gbeutner on 2015-10-16 13:17:09 +00:00

  • Target Version changed from Backlog to 2.3.11
  • Backport? changed from TBD to Yes

@icinga-migration
Copy link
Author

Updated by krausm on 2015-10-16 13:58:58 +00:00

Gunnar, i can confirm this fix works!
Thank you for your effort.

@icinga-migration
Copy link
Author

Updated by tomwork on 2015-10-19 00:55:45 +00:00

Thank you Gunmar for finding the time to look into the problem and fixing it. Much appreciated.
I would have never found it on my own ;-)
Cheers

@icinga-migration
Copy link
Author

Updated by mfriedrich on 2015-10-19 08:48:37 +00:00

  • Subject changed from sending multiple commands rejects all except the first to sending multiple Livestatus commands rejects all except the first

@icinga-migration
Copy link
Author

Updated by mittma on 2015-10-19 09:01:45 +00:00

Great work and thanks a lot!
Now Icinga 2 is back in the race ;-)

Best regards,
Martin.

@icinga-migration
Copy link
Author

Updated by cmh on 2015-12-11 12:35:09 +00:00

I'm still seeing issues with this, unsure if it should be re-opened. If I select many services and apply a comment, most of them will get the comment, but there is usually one, sometimes more - at the end which doesn't get the comment. I've started copying the comment, selecting all that I want, and applying the comment, then seeing which ones didn't get it, selecting those, applying the comment again. Sometimes if I choose enough (>20?) at once I'll have to do this process 3-4 times.

@icinga-migration icinga-migration added bug Something isn't working area/livestatus Legacy interface labels Jan 17, 2017
@icinga-migration icinga-migration added this to the 2.3.11 milestone Jan 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/livestatus Legacy interface bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant