This repository has been archived by the owner on Jan 15, 2019. It is now read-only.

[dev.icinga.com #1363] Oracle:cleanup_table_by_instance_time "no data" errors in syslog #599

Closed
icinga-migration opened this issue Mar 28, 2011 · 10 comments

Comments

@icinga-migration

This issue has been migrated from Redmine: https://dev.icinga.com/issues/1363

Created by Tommi on 2011-03-28 19:54:05 +00:00

Assignee: Tommi
Status: Resolved (closed on 2011-05-05 14:57:49 +00:00)
Target Version: 1.4
Last Update: 2014-12-08 14:34:55 +00:00 (in Redmine)


I found tons of "errors" in my syslog, but I suspect there is no real error, just no data to clean. In my opinion there is no reason to pump this into syslog:
Mar 28 21:30:59 host ido2db: ERROR: MSG 'ORA-01403: no data found '
Mar 28 21:30:59 host ido2db: ERROR: QUERY 'BEGIN clean_table_by_instance_time(:X1, :X2, :X3, :X4); END;'
Catching this exception should be sufficient.

Changesets

2011-04-03 12:08:26 +00:00 by Tommi bd2d05b

recreate_icinga13_objects.sql: change exception handling in PL/SQL functions refs #1363


2011-05-03 20:59:45 +00:00 by Tommi 874e2ad

includes all patches and features in upgrade/*.sql
refs #1354,#1355,#1358,#1363,#1401,#1173
@icinga-migration

Updated by mfriedrich on 2011-03-28 20:25:32 +00:00

Well, I figured recently that those errors were not resolved just by returning others in the exception condition of the procedure.
But I'd like to know how to catch such an exception - because if it's done in the ido2db code, you might have to create a dictionary of error codes, deciding which to spit out and which to ignore. I'd rather not do that.

@icinga-migration

Updated by Tommi on 2011-03-30 15:42:52 +00:00

 exception
    when no_data_found then null;

will do the job. I haven't seen these messages all day.

Anyway, I can't see any advantage in using an SP over conventional coding in this particular case, because it only moves the literals from OCI into the SP's execute immediate string. The problem is the literals with a lot of different table names and timestamps, which can't be avoided. They are also executed too often.
In my opinion we can safely put the cleanup back into the code, but start it only once a day (at most once an hour), or offer a crontab job so the user can schedule all of the cleanup themselves.

@icinga-migration

Updated by Tommi on 2011-03-30 15:53:05 +00:00

Usually I ignore the following return codes:

$noerr = array(0, 1403, 24347); // No Error, No Data, Null in Aggregate

But with newer Oracle versions or custom error codes there will be more, depending on which features are used.
I would suggest a central ido_oci_execute function covering execution and error checking/logging.
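
For illustration, here is a minimal sketch of what such a wrapper could look like. It assumes raw OCI handles and reuses the ignore list from the snippet above; the actual ido2db Oracle driver may use a different client library, so the handle types and the syslog call are illustrative only, not existing idoutils code.

    #include <stddef.h>
    #include <syslog.h>
    #include <oci.h>

    /* Oracle errors treated as "expected" and therefore not logged:
     * 0 = no error, 1403 = ORA-01403 no data found, 24347 = NULL in aggregate */
    static const sb4 ido_oci_ignored_errors[] = { 0, 1403, 24347 };

    static int ido_oci_error_is_ignorable(sb4 errcode) {
        size_t i;
        for (i = 0; i < sizeof(ido_oci_ignored_errors) / sizeof(ido_oci_ignored_errors[0]); i++) {
            if (ido_oci_ignored_errors[i] == errcode)
                return 1;
        }
        return 0;
    }

    /* Central execute wrapper: run a prepared statement and log only
     * those errors which are not in the ignore list above. */
    int ido_oci_execute(OCISvcCtx *svchp, OCIStmt *stmtp, OCIError *errhp) {
        sword status;
        sb4 errcode = 0;
        text errbuf[512];

        status = OCIStmtExecute(svchp, stmtp, errhp, 1, 0, NULL, NULL, OCI_DEFAULT);
        if (status == OCI_SUCCESS || status == OCI_SUCCESS_WITH_INFO)
            return 0;

        OCIErrorGet(errhp, 1, NULL, &errcode, errbuf, (ub4) sizeof(errbuf), OCI_HTYPE_ERROR);
        if (ido_oci_error_is_ignorable(errcode))
            return 0; /* expected condition, keep syslog quiet */

        syslog(LOG_ERR, "ido2db: ERROR: MSG '%s'", (char *) errbuf);
        return -1;
    }

The ignore list mirrors the $noerr array above; a later comment in this thread proposes making it configurable.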

@icinga-migration

Updated by mfriedrich on 2011-03-30 16:02:38 +00:00

Tommi wrote:

[...]
will do the job. I haven't seen these messages all day.

Anyway, I can't see any advantage in using an SP over conventional coding in this particular case, because it only moves the literals from OCI into the SP's execute immediate string. The problem is the literals with a lot of different table names and timestamps, which can't be avoided. They are also executed too often.

The different table names can't be avoided, as it depends on ~60 tables to be cleaned (some at startup, some regularly by time). As written in the other issue, putting those strings straight into a query is not an option.

I thought of putting that into a procedure, allowing the DBA to tweak and fix it him/herself.

In my opinion we can safely put the cleanup back into the code, but start it only once a day (at most once an hour), or offer a crontab job so the user can schedule all of the cleanup themselves.

The cleaning interval has been reduced to 1 hour by default in my branch, but users can still set that to another level through the config. After all, you adjust that for the RDBMS you are using, and it's not fixed which tables will be cleaned. At least on non-optimized systems an automated cleanup should be sufficient. Letting 4k services with a check interval of 5 minutes run against an Oracle DB will cause ~4 million rows per day. This is when you want regular housekeeping.
Furthermore, housekeeping runs as its own thread, so the connection and such are independent of the rest.

On the logging/executing: if it's really necessary to define some error codes to be ignored, define them globally within the header files and use them throughout the code.

@icinga-migration

Updated by Tommi on 2011-03-30 17:40:25 +00:00

dnsmichi wrote:

Tommi wrote:
> [...]
> will do the job. I haven't seen these messages all day.
>
> Anyway, I can't see any advantage in using an SP over conventional coding in this particular case, because it only moves the literals from OCI into the SP's execute immediate string. The problem is the literals with a lot of different table names and timestamps, which can't be avoided. They are also executed too often.

The different table names can't be avoided, as it depends on ~60 tables to be cleaned (some at startup, some regularly by time).
This I agree with.
As written in the other issue, putting those strings straight into a query is not an option.

This I wouldn't say. I suggest implementing the same functionality as the procedure within the db*.c code, with the SQL generated via (a)sprintf and executed directly from OCI. In the end your procedure will do exactly the same. We can stay with the procedure, but I would point out that there is no advantage to using an SP this way.

I thought of putting that into a procedure, allowing the DBA to tweak and fix it him/herself.

With "execute immediate" a DBA won't be able to improve anything concerning the conditions you mentioned above.

> In my opinion we can safely put the cleanup back into the code, but start it only once a day (at most once an hour), or offer a crontab job so the user can schedule all of the cleanup themselves.

The cleaning interval has been reduced to 1 hour by default in my branch, but users can still set that to another level through the config. After all, you adjust that for the RDBMS you are using, and it's not fixed which tables will be cleaned. At least on non-optimized systems an automated cleanup should be sufficient. Letting 4k services with a check interval of 5 minutes run against an Oracle DB will cause ~4 million rows per day. This is when you want regular housekeeping.

Yes, of course I will have cleaning, but not checking 60 tables with x million rows every minute. If I have 100 million rows in a table, it's better to check and delete 4 million rows once a day instead of checking 100 million rows once per minute only to find 4k to delete.

Furthermore, housekeeping runs as its own thread, so the connection and such are independent of the rest.

Fine. We should at least modify the connect function to apply a module name to the session, identifying which job is currently running the queries. That will help to assign bottlenecks to a particular sub function.

On the logging/executing: if it's really necessary to define some error codes to be ignored, define them globally within the header files and use them throughout the code.

Do you like log files with thousands of lines for an unimportant issue, hiding the lines that are really important to care about?

@icinga-migration

Updated by mfriedrich on 2011-03-30 18:19:51 +00:00

Tommi wrote:

This I wouldn't say. I suggest implementing the same functionality as the procedure within the db*.c code, with the SQL generated via (a)sprintf and executed directly from OCI. In the end your procedure will do exactly the same. We can stay with the procedure, but I would point out that there is no advantage to using an SP this way.

Let's say you do it right away like you proposed on the OCI side. Then I do my work on the code (most likely covering MySQL and PostgreSQL) and I change a global variable. Should I then recheck everything for Oracle, because an exception was made just for Oracle?
idoutils supports 3 RDBMS; it was hard to get there, and everything is being done to keep it that way. We often ran into situations where only MySQL worked and the rest did not; the same happened when only Oracle worked and libdbi did not. I don't want to see that happen here. There are rules and conditions on the environment (idoutils, especially ido2db and handling the data from the socket into the db) which need to be met in the first place. If they are met, an exception for functionality and/or performance reasons can be made.

I don't see any benefit here, only that the Solaris asprintf needs a review regarding the encoding. This might be a higher-priority bug too - did you check everything else as well? There are quite a few asprintfs all around the code, also affecting other data, not only the table names themselves.

With "execute immediate" a DBA won't be able to improve anything concerning the conditions you mentioned above.

My DBAs told me that execute immediate would be the best option for those conditions (explained above why), and my research encouraged me to do it that way. What you can improve now: you can fix the no_data exception straight away. You don't need to compile or upgrade anything. That's largely the point of putting this onto the RDBMS itself.

Yes, of course I will have cleaning, but not checking 60 tables with x million rows every minute. If I have 100 million rows in a table, it's better to check and delete 4 million rows once a day instead of checking 100 million rows once per minute only to find 4k to delete.

It's not 60. It's just those that are provided within ido2db.cfg: max_\*_age, only the historical ones. When we had 4 million rows, we disabled writing servicechecks anyway, since it's a status-only RDBMS right now. And it will be replaced by PostgreSQL soon anyway.
Check my ido branch and you will see that the trimming interval is set to 3600 by default. (idoutils has provided that option since the very beginning, so it has been possible for >1 year now to set that yourself :) I'm only working on that (including the threads) in a separate branch. It's work in progress, and doesn't fit into my maintainer work on core and classicui.

Fine. We should at least modify the connect function to apply a module name to the session, identifying which job is currently running the queries. That will help to assign bottlenecks to a particular sub function.

The conninfo table gets the check-in inserted, differing in the name, if that helps in any way. That agent name is also kept within the idi struct.
The threading is one of my next todos, so I'd rather not touch that now if I were you.

Do you like log files with thousands of lines for an unimportant issue, hiding the lines that are really important to care about?

No, I normally grep them. And no_data can be dangerous. So, in order to allow each sysadmin to set that as they want, I'll go one step further and propose a comma-separated cfg option for error codes to be ignored by OCI.
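
A rough sketch of how such an option could be parsed. The config key name (oracle_ignore_errors) and the helper function are hypothetical, not an existing ido2db.cfg setting; the point is only to show the comma-separated list being turned into an ignore array like the one discussed earlier in this thread.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define IDO_MAX_IGNORED_ERRORS 32

    /* Storage for error codes that the OCI error logging should skip. */
    static long ido_ignored_errors[IDO_MAX_IGNORED_ERRORS];
    static int ido_ignored_error_count = 0;

    /* Parse a comma-separated value like "1403,24347" from the cfg file. */
    static void ido_parse_ignored_errors(const char *value) {
        char *copy, *token, *saveptr;

        ido_ignored_error_count = 0;
        copy = strdup(value);
        if (copy == NULL)
            return;

        for (token = strtok_r(copy, ",", &saveptr);
             token != NULL && ido_ignored_error_count < IDO_MAX_IGNORED_ERRORS;
             token = strtok_r(NULL, ",", &saveptr)) {
            ido_ignored_errors[ido_ignored_error_count++] = strtol(token, NULL, 10);
        }

        free(copy);
    }

    int main(void) {
        int i;

        /* hypothetical ido2db.cfg line: oracle_ignore_errors=1403,24347 */
        ido_parse_ignored_errors("1403,24347");
        for (i = 0; i < ido_ignored_error_count; i++)
            printf("ignoring ORA-%05ld\n", ido_ignored_errors[i]);
        return 0;
    }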

@icinga-migration

Updated by Tommi on 2011-03-30 19:25:46 +00:00

dnsmichi wrote:

Tommi wrote:
> This I wouldn't say. I suggest implementing the same functionality as the procedure within the db*.c code, with the SQL generated via (a)sprintf and executed directly from OCI. In the end your procedure will do exactly the same. We can stay with the procedure, but I would point out that there is no advantage to using an SP this way.

Let's say you do it right away like you proposed on the OCI side. Then I do my work on the code (most likely covering MySQL and PostgreSQL) and I change a global variable. Should I then recheck everything for Oracle, because an exception was made just for Oracle?
idoutils supports 3 RDBMS; it was hard to get there, and everything is being done to keep it that way. We often ran into situations where only MySQL worked and the rest did not; the same happened when only Oracle worked and libdbi did not. I don't want to see that happen here. There are rules and conditions on the environment (idoutils, especially ido2db and handling the data from the socket into the db) which need to be met in the first place. If they are met, an exception for functionality and/or performance reasons can be made.

See my notes in #1362. I suggest a common application layer and a database-specific data layer. Within the data layer, all database-specific issues like exceptions and error codes should be handled. A specific function can raise a specific error, and some errors are expected in a particular use case (like no data during a delete, or a duplicate key when doing "upserts"). We should handle such expectable errors within the code (SQL or OCI functions), and then there is no need to disturb the user. Maybe a duplicate key is not acceptable in another case - then we have to raise the error.
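
As a sketch of that layering idea (the names are hypothetical, not existing idoutils code): the database-specific data layer maps driver error codes to a small set of generic results, and the application layer decides per call whether, for example, a duplicate key is expected or a real error.

    /* Generic results the database-specific data layer reports upwards;
     * the application layer never sees raw ORA-/MySQL error codes. */
    typedef enum {
        IDO_DB_OK = 0,
        IDO_DB_NO_DATA,        /* e.g. ORA-01403, nothing to delete */
        IDO_DB_DUPLICATE_KEY,  /* e.g. unique constraint hit during an "upsert" */
        IDO_DB_ERROR           /* everything else: log it and fail */
    } ido_db_result;

    /* Example mapping for an Oracle data layer (codes taken from this thread,
     * plus ORA-00001 for the duplicate key case). */
    static ido_db_result ido_oracle_map_error(long oracle_errcode) {
        switch (oracle_errcode) {
        case 0:
        case 24347:            /* NULL in aggregate: harmless here */
            return IDO_DB_OK;
        case 1403:             /* ORA-01403: no data found */
            return IDO_DB_NO_DATA;
        case 1:                /* ORA-00001: unique constraint violated */
            return IDO_DB_DUPLICATE_KEY;
        default:
            return IDO_DB_ERROR;
        }
    }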

I don't see any benefit here, only that the Solaris asprintf needs a review regarding the encoding. This might be a higher-priority bug too - did you check everything else as well? There are quite a few asprintfs all around the code, also affecting other data, not only the table names themselves.

The problem with the extra character in the table name occurs on a Linux box, and it is not related to this case. You know better than me how to handle strings in C.

> With "execute immediate" a DBA won't be able to improve anything concerning the conditions you mentioned above.

My DBAs told me that execute immediate would be the best option for those conditions (explained above why), and my research encouraged me to do it that way. What you can improve now: you can fix the no_data exception straight away. You don't need to compile or upgrade anything. That's largely the point of putting this onto the RDBMS itself.

Yes, it is the only way to get it to work - in SQL. As I stated before, I can live with it; I'm only pointing out that there is no special benefit (saving Oracle resources) over the "mysql" version.

The conninfo table gets the check-in inserted, differing in the name, if that helps in any way. That agent name is also kept within the idi struct.
The threading is one of my next todos, so I'd rather not touch that now if I were you.

This is only a suggestion to make the Oracle DBAs of this world happy. They will only see entries in v$session, not those in conninfo.
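
For reference, a minimal sketch of how the connect code could tag the session so the job shows up under a module name in v$session. It assumes raw OCI and an already allocated session handle, so treat it as illustrative rather than the actual ido2db connect function.

    #include <string.h>
    #include <oci.h>

    /* Tag the Oracle session so DBAs can see in v$session.module which
     * ido2db job (e.g. "ido2db housekeeping") is running the queries. */
    static void ido_oci_set_module(OCISession *usrhp, OCIError *errhp, const char *module) {
        OCIAttrSet(usrhp, OCI_HTYPE_SESSION,
                   (void *) module, (ub4) strlen(module),
                   OCI_ATTR_MODULE, errhp);
    }

Setting this once per connection (e.g. right after session begin in the housekeeping thread) should be enough for v$session to attribute the cleanup load to the right sub function.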

No, I normally grep them. And no_data can be dangerous. So, in order to allow each sysadmin to set that as they want, I'll go one step further and propose a comma-separated cfg option for error codes to be ignored by OCI.

See my notes above about expectable errors and real errors.

@icinga-migration

Updated by Tommi on 2011-04-03 12:19:08 +00:00

  • Category set to 24
  • Status changed from New to Feedback
  • Assigned to set to Tommi
  • Done % changed from 0 to 100

ignore (only NO_DATA_FOUND) exception in functions
Commit bd2d05b

@icinga-migration

Updated by Tommi on 2011-05-05 14:57:49 +00:00

  • Status changed from Feedback to Resolved
  • Target Version set to 1.4

included in V1.4 upgrade scripts

@icinga-migration

Updated by mfriedrich on 2014-12-08 14:34:55 +00:00

  • Project changed from 18 to Core, Classic UI, IDOUtils
  • Category changed from 24 to IDOUtils
