
Thursday, November 27, 2014

Adjusted Block Counts on Partitioned Tables without Partition Level Stats and IN-LISTS


An article on 10053 Trace Files - Global Stats on Partitioned Tables by Doug Burns is a good introduction to my problem.  Doug has included excerpts from the 10053 trace covering a few variations, but one that appears incomplete is the case where there are:

  • Global stats
  • No partition-level stats
  • Single partition being queried
  • IN-LIST Query
In this situation I have seen a few differing results.

This is what I don't want: a block count of 1, as that means a full table scan will always look better than an index scan.  A full table scan is perfectly fine *if* there is actually only 1 block, but there's actually a whole lot more - the optimizer got it wrong.  Worse, good old bind variable peeking means that this peeked full-table-scan plan will also get used on partitions that contain lots of data.


***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: INVOICE  Alias: INVOICE  (Using composite stats)
  (making adjustments for partition skews)
  ORIGINAL VALUES::    #Rows: 343372313  #Blks:  8296936  AvgRowLen:  142.00
  PARTITIONS::
  PRUNED: 1
  ANALYZED: 0  UNANALYZED: 1
    #Rows: 343372313  #Blks:  1  AvgRowLen:  142.00

My findings, though, are somewhat uncertain.

I have a test table with 3 partitions (two large, one small), a local index on the partitioning key, and global stats only.  I then query the small partition with

SELECT * FROM test WHERE id in (2000001, 2000002)
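For context, a minimal sketch of the kind of setup described (the table name, partition boundaries, and column definitions are illustrative, not the exact test case):

CREATE TABLE test (id NUMBER, padding VARCHAR2(100))
PARTITION BY RANGE (id)
( PARTITION p1 VALUES LESS THAN (1000001),   -- large
  PARTITION p2 VALUES LESS THAN (2000001),   -- large
  PARTITION p3 VALUES LESS THAN (MAXVALUE)   -- small
);

-- Gather global stats only, leaving the partitions without stats
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TEST', granularity => 'GLOBAL')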

Variation 1 (as above): The optimizer adjusts to use 1 block.

Variation 2: If I create the test table, then modify it (changing the partitioning key to NOT NULL), then the optimizer appears to adjust to use the actual number of blocks in the segment.

Variation 3: If I create the test table with a NOT NULL partitioning key, the optimizer adjusts to use 1 block.

I haven't been able to work out why variation 2 is different.  This is the result I want, ALWAYS.
Peter.

Tuesday, June 11, 2013

Streams Pool only for Streams? Think Again!

Here's some enlightenment from Arup Nanda.

The Arup Nanda Blog: Streams Pool is only for Streams? Think Again!

The streams pool isn't only used for streams.  It's also used by the following.

  • Data Pump
The streams pool is dynamically allocated on first use, defaulting to 10% of the shared pool size, with the memory being stolen from the buffer cache.

So... what else uses the streams pool?  We have an 11.2.0.3 database where the streams pool is dynamically allocated at startup by "something".  That "something" turns out to be the process behind the alert log message below; the session trace identifies it as a sqlplus session.

Alert Log

  • [30114] Successfully onlined Undo Tablespace 2.
Session Trace
  • *** MODULE NAME:(sqlplus ...
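To watch this allocation happen, the dynamic SGA components can be queried:

-- Current size and last resize operation of the streams pool
SELECT component, current_size, oper_count, last_oper_type
  FROM v$sga_dynamic_components
 WHERE component = 'streams pool';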




Friday, July 20, 2012

What's the meaning of the asterisk in Oracle's ORATAB file?


The perplexing asterisk is something Oracle DBAs have all seen when editing /etc/oratab over the years.  But how many know what it means - and, more importantly, whether it is still required and what it should be set to, especially when multiple ORACLE_HOMEs are in use?

It's somewhat pleasing that I haven't seen it in any recent installations, but it remains on some machines where prior Oracle versions were installed and have been upgraded.

The best suggestion I found from searching the web was to look in the oraenv script.  From the header of the script: An asterisk '*' can be used to refer to the NULL SID.  So now you know. :)  And if you're like me, you're still just as confused.  Join the club.  Nothing in the oraenv script actually does anything with it.

There is some further information in the dbhome script.  It actually uses it: when explicitly passed an empty string (i.e. dbhome "") you are returned the ORACLE_HOME matching the line with the asterisk.  But oraenv won't accept the "", so it's pretty academic.
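For illustration, an oratab containing a NULL SID entry might look like this (the paths are hypothetical):

# SID:ORACLE_HOME:{Y|N}
*:/u01/app/oracle/product/9.2.0:N
PROD:/u01/app/oracle/product/11.2.0/dbhome_1:Y

> dbhome ""
/u01/app/oracle/product/9.2.0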

A web search for "NULL SID" offers some better insights.  It returned an Oracle 7 document stating that a NULL SID was not supported by SQL*Net V2.  So it looks like it was something used back in the very early Oracle versions, before the ORACLE_SID was introduced to identify a database, and something that hasn't been used for years.

So, what should you do when you find a line with an asterisk in an oratab file?  Remove it and stop the confusion.

Wednesday, July 4, 2012

Oracle: Impacts of partition maintenance in RAC

Problem: Excessive gc cr block 2-way or db file sequential read waits against TABPART$ during an insert after adding/dropping a partition in another instance on a table with 10,000 partitions.  

Analysis: Rather surprisingly, the waits aren't for the TABPART$ blocks that needed to be reloaded but for the segment header blocks of all the partitions.  These were being loaded in CR mode during query parsing.

After changing the table partitioning it makes sense that Oracle would flush the TABPART$ information for that table from the library cache of the other instances.  The library cache on the instance performing the partition maintenance doesn't get purged, as it is updated as part of the operation.  On other instances, the next operation on that table then needs to re-populate the library cache from the information in TABPART$ (which it does for ALL partitions, even if only one is required).  As part of reading from TABPART$, the segment headers of all partitions need to be inspected (not quite sure why).  For 10,000 partitions, that is at least 10,000 physical reads (1 minute @ 6ms per read) or block transfers (5 to 15 seconds @ 0.5 to 1.5ms per transfer).

Once the library cache is re-loaded, subsequent statements are fast (until the next partition maintenance operation).

The flush of partition information from the library cache cannot be avoided.

The wait time can be reduced by ensuring the blocks are available on one instance in current (CUR) mode.  Segment headers can't be retained in a KEEP pool, so the only way to make them available is to load them on an idle instance.

The wait time can also be reduced by pre-loading the segment headers in current mode on all instances impacted; however, these blocks will not be frequently accessed and hence may quickly drop out of the cache.

Loading segment header blocks in current mode is necessary as blocks in consistent (CR) mode cannot be re-used.  Segment header blocks can reside in multiple instances in current mode at the same time.

When the segment headers are loaded as part of the TABPART$ query, they are loaded in consistent mode.  Forcing them into current mode can only be done by retrieving rows from each of the partitions.  This can be done using a FULL TABLE SCAN of the table with the SAMPLE option and a small percentage, as shown in the supporting queries below.  (If anyone knows a better way that doesn't require parsing 10,000 statements with partition clauses or bind/executes, I'm all ears.)  This operation needs to be run twice, as the first time the segment headers are retrieved in consistent mode.

Workaround/Solution


1. Use an IDLE node to hold current copies of segment header blocks

2. Pre-load segment header blocks on all active instances


Supporting Queries


To inspect the segment header blocks



col object_name for a10
col subobject_name for a10
col status for a5
select file#, block#, class#, bh.status, object_name, subobject_name
  from v$bh bh
  join dba_objects obj
    on (bh.objd = obj.data_object_id)
 where object_name in ('TEST')
   and class# = 4          -- class# 4 = segment header blocks
 order by object_name, subobject_name
/

To load segment header blocks in current mode

-- Run twice: the first execution retrieves the headers in consistent mode
select /*+ FULL(test) */ count(*) from test sample (0.00001);


Tuesday, February 7, 2012

Concurrent Oracle 11.2.0.2 32-bit and 64-bit ODBC on Windows 7 x64

Whilst installing the latest Oracle InstantClient 11.2.0.2 with ODBC on Windows 7 I've discovered something new.

=> Both the 32-bit and 64-bit drivers can co-exist.  A single User Data Source can be used for both 32-bit and 64-bit applications.  This means they can be maintained with the default 64-bit ODBC Data Source Administrator.  No more switching between the 32-bit and 64-bit ODBC Administrators.

Why does this work?  

  1. The ODBC Data Source entries for the User DSN are stored in a shared location in the registry (HKCU/Software/ODBC).  Items in this location are not redirected for 32-bit to the Wow6432Node tree.
  2. The ODBC Driver entries are stored in a redirectable location in the registry (HKLM/Software/ODBC).  Items in this location redirect to the Wow6432Node tree (HKLM/Software/Wow6432Node/ODBC).
  3. The Driver names for both 32-bit and 64-bit entries are the same.  Hence, applications will obtain the data source information from the shared location, then obtain the driver location from the redirectable entries.
  4. Whilst the Driver location is specified in the shared DSN entries, there is no reliance on that location.  There is a fallback to the ODBC Driver entries in the registry.
The configuration


I have the 64-bit driver installed in c:\oracle\instantclient_11_2_64, and the 32-bit driver installed in c:\oracle\instantclient_11_2.  The driver name is "Oracle in instantclient_11_2".
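To see this for yourself, the relevant registry keys can be inspected from a command prompt; the DSN name OraDSN below is hypothetical.  The first key is the shared User DSN entry, the second and third are the 64-bit and 32-bit driver entries under the same driver name:

> reg query "HKCU\Software\ODBC\ODBC.INI\OraDSN"
> reg query "HKLM\Software\ODBC\ODBCINST.INI\Oracle in instantclient_11_2"
> reg query "HKLM\Software\Wow6432Node\ODBC\ODBCINST.INI\Oracle in instantclient_11_2"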

When doesn't it work?


This will not work for System Data Source entries as these are stored in the redirectable location (HKLM/Software/ODBC).

Monday, July 18, 2011

Oracle ACFS Troubleshooting (11.2.0.2)


Today's efforts to create an ACFS volume on a Linux cluster haven't been smooth - but finally there is success.

Problem 1: ASMCA was slow to startup.
Running ps and searching for the pid of the asmca process showed RSH tests (i.e. /usr/bin/rsh hostB /bin/true).  But RSHD is not running on this cluster, and these attempts take a while to time out.  The problem was that the prior SSH connection attempt was failing; it needed to be primed with a command-line connection (ssh hostB) due to a key change.

Problem 2: ACFS volumes didn't enable on remote nodes
Creating the ACFS volume and filesystem using ASMCA was successful, however the volumes were only enabled successfully on the primary node of the cluster (i.e. hostA).  All other nodes returned ORA-15477 (Cannot communicate with the volume driver).

The volume driver was running:
> acfsdriverstate
ACFS-9206: usage: acfsdriverstate [-orahome <orahome>] [-s]
> acfsdriverstate installed
ACFS-9203: true
> acfsdriverstate loaded
ACFS-9203: true
> acfsdriverstate version
ACFS-9325:     Driver OS kernel version = 2.6.18-8.el5(x86_64).
ACFS-9326:     Driver Oracle version = 100804.1.
> acfsdriverstate supported
ACFS-9200: Supported

On at least one of the nodes, the mount directory had the incorrect group (this cluster uses the legacy dba group rather than oinstall):
> chgrp dba /u04

Re-installing ACFS allowed further progress.  A few web searches turned up similar scenarios requiring a re-install after every node boot, with the root cause unknown.  Time will tell whether this is a necessary workaround for this cluster.
> sudo su
> acfsroot install

Enabling of the volumes was then successful
> sudo su - oracle
> . oraenv <<< "+ASM2"
> asmcmd volenable -a

ASMCA successfully mounted the volumes on all but one node; the final node needed to be mounted manually.
> sudo su
> mount -t acfs /dev/asm/acfs1-351 /u04

Thought of the day/week: Working with computers is rarely boring but frequently frustrating.  They can be as unpredictable as people.

Friday, July 1, 2011

Hidden and Undocumented "Cardinality Feedback"

I just stumbled across this in an 11gR2 environment. Initially I thought it was part of Adaptive Cursor Sharing and bind awareness, but no: the impacted queries have not been flagged as bind sensitive or bind aware. The DBMS_XPLAN cursor plan includes the note "cardinality feedback used for this statement".

A quick search found this article...

We Do Streams: Hidden and Undocumented "Cardinality Feedback"

There is an article written on this topic by the CBO dev team:
http://blogs.oracle.com/optimizer/entry/cardinality_feedback


During the first execution of a SQL statement, an execution plan is generated as usual. During optimization, certain types of estimates that are known to be of low quality (for example, estimates for tables which lack statistics or tables with complex predicates) are noted, and monitoring is enabled for the cursor that is produced. If cardinality feedback monitoring is enabled for a cursor, then at the end of execution, some of the cardinality estimates in the plan are compared to the actual cardinalities seen during execution. If some of these estimates are found to differ significantly from the actual cardinalities, the correct estimates are stored for later use. The next time the query is executed, it will be optimized again, and this time the optimizer uses the corrected estimates in place of its usual estimates.

We can tell that cardinality feedback was used because it appears in the note section of the plan. Note that you can also determine this by checking the USE_FEEDBACK_STATS column in V$SQL_SHARED_CURSOR.


See also the VLDB document and presentation.

Plus Jonathan Lewis's post on Cardinality Feedback.

This behaviour can be controlled by _optimizer_use_feedback.
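To check a specific cursor, or to turn the feature off for a session, something like the following works (the sql_id is a placeholder, and the underscore parameter is hidden, so best kept for testing):

select sql_id, child_number, use_feedback_stats
  from v$sql_shared_cursor
 where sql_id = '&sql_id';

alter session set "_optimizer_use_feedback" = false;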

Wednesday, April 27, 2011

KGH: NO ACCESS

Finally there is information on the KGH: NO ACCESS shared pool component appearing on blogs and at Oracle Support.

Monday, August 23, 2010

Oracle Block Remastering

Here's a good article on dynamic remastering.

Remastering granularity
  • Oracle 10g-11g: block range (128 blocks in a range)

Queries:


-- Map each lock in x$kjbl to its lock element; the first comma-separated
-- field of kjblname2 is the block#, the second divided by 65536 the file#
select kj.*, le.le_addr
  from (select kjblname, kjblname2, kjblowner, kjblmaster, kjbllockp,
               substr(kjblname2, instr(kjblname2, ',') + 1,
                      instr(kjblname2, ',', 1, 2) - instr(kjblname2, ',', 1, 1) - 1) / 65536 fl,
               substr(kjblname2, 1, instr(kjblname2, ',') - 1) blk
          from x$kjbl) kj,
       x$le le
 where le.le_kjbl = kj.kjbllockp
 order by le.le_addr
/


select * from x$object_affinity_statistics
/

select * from v$gcspfmaster_info
/

Remastering Technique

  • With the object remastering feature, if an object is accessed aggressively by one instance, that instance will become the master of the object, reducing gc remote grants and improving application performance. "Accessed" is a loose term here; more precisely, if an instance requests many BL locks on an object, that object can be remastered.
  • The instance GRD is frozen during reconfiguration, and on a very busy instance this can take many seconds, leading to an instance freeze for several seconds.
Internal Parameters
  • _gc_affinity_limit - the number of "opens" before an object is considered for remastering (defaults to 50)
  • _gc_affinity_time - how often the queue is checked for remastering (defaults to 10 minutes)
  • _gc_affinity_minimum - minimum amount of dynamic affinity activity per minute to be a candidate for remastering (defaults to 600/minute/cpu)
  • _gc_undo_affinity - controls whether dynamic undo remastering is enabled
Evidence of remastering performance problems
  • Wait events: "gcs drm freeze", "gc remaster", "gcs freeze"
  • Log entries at the same time "DRM start"
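As an aside, remastering of a single object can reportedly be forced manually through oradebug (undocumented, so strictly for test systems; the object_id 12345 is a placeholder):

oradebug lkdebug -m pkey 12345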

Tuesday, June 1, 2010

Oracle Clusterware Failures: Useful Logs

To determine why a node failed, try the following clusterware logs (in order of usefulness):

  • alert log
  • ocssd log
  • evm log
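On 11.2, these typically live under the Grid home (the home path and hostname here are illustrative):

/u01/app/11.2.0/grid/log/hostA/alerthostA.log
/u01/app/11.2.0/grid/log/hostA/cssd/ocssd.log
/u01/app/11.2.0/grid/log/hostA/evmd/evmd.log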

Oracle Clusterware Parameters: misscount, disktimeout, reboottime

Clusterware timeout parameters:
  • misscount - the maximum time in seconds that a network heartbeat can be missed before the cluster enters a reconfiguration to evict the node.
  • disktimeout - the maximum amount of time allowed for a voting file I/O to complete; if this time is exceeded, the voting disk is marked as offline.
  • reboottime - the amount of time allowed for a node to complete a reboot after the CSS daemon has been evicted.
Default values for these parameters are as follows:
  • misscount = 60 seconds
  • disktimeout = 200 seconds
  • reboottime = 3 seconds 
Commands to check / modify CSS parameters:
  • crsctl get css misscount - check the misscount value
  • crsctl get css disktimeout - check the disktimeout value
  • crsctl get css reboottime - check the reboottime value
  • crsctl set css misscount 120 - set misscount to 120 seconds
  • crsctl set css disktimeout 200 - set disktimeout to 200 seconds
  • crsctl set css reboottime 3 - set reboottime to 3 seconds
Only non-default values are returned by the get calls above. To confirm the default values, look in ocssd.log for lines like the following:
clssnmNMInitialize: misscount set to (30)

clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 15000 ms, reconfig start (misscount) 30000 ms

clssgmInitCMInfo: Wait for remote node termination set to 13 seconds
clssnmNMInitialize: misscount set to (60), impending reconfig threshold set to (56000)
clssnmNMInitialize: diskShortTimeout set to (57000)ms
clssnmNMInitialize: diskLongTimeout set to (200000)ms

clssnmHandleUpdate: diskTimeout set to (200000)ms
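A quick way to pull these lines from the log (the Grid home path and hostname are illustrative):

> grep -E 'misscount|diskShortTimeout|diskLongTimeout|diskTimeout' /u01/app/11.2.0/grid/log/hostA/cssd/ocssd.log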

Friday, November 27, 2009

[Oracle] Consistent gets from cache (fastpath)

Looking for information on "consistent gets from cache (fastpath)", Google found this great blog entry.
THE Q U A D R O BLOG: Consistent gets from cache (fastpath)
In the cases I've seen, the number of consistent gets in 10g and 11g is the same. This matches a comment by Roderick on the above post.
Roderick said...
FYI. The reduction in consistent gets is actually due to a separate internal optimization. The fastpath gets allows those fewer consistent gets to use a shorter code path.
In my scenario, there were two identical plans and an identical number of consistent gets, yet in 11g it ran more than twice as fast.
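For reference, the relevant statistics for the current session can be pulled with a query along these lines:

select sn.name, ms.value
  from v$mystat ms
  join v$statname sn on sn.statistic# = ms.statistic#
 where sn.name like 'consistent gets%';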

10g:
Plan hash value: 4121601496
------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                           | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  1 |  FILTER                      |                                |    142 |        |       |            |          |    100K|00:02:32.29 |      20M|     57 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_NODE_HISTORY          |    142 |      2 |    98 |     5   (0)| 00:00:01 |   2840K|00:00:11.74 |    2777K|      0 |
|*  3 |    INDEX RANGE SCAN          | P_CUSTOMER_NODE_HISTORY        |    142 |      2 |       |     2   (0)| 00:00:01 |   2840K|00:00:02.84 |   10542 |      0 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| PRODUCT_INSTANCE_HISTORY       |   2840K|      1 |    30 |   199   (3)| 00:00:01 |    100K|00:02:07.75 |      17M|     57 |
|   5 |    AND-EQUAL                 |                                |   2840K|        |       |            |          |    100K|00:01:59.76 |      17M|     54 |
|*  6 |     INDEX RANGE SCAN         | I_PRODUCT_INSTANCE_HISTORY_CN  |   2840K|      6 |       |     3   (0)| 00:00:01 |   3118K|00:00:26.55 |    8998K|      0 |
|*  7 |     INDEX RANGE SCAN         | I_PRODUCT_INSTANCE_HISTORY_PRD |   2840K|  72000 |       |   195   (3)| 00:00:01 |    347K|00:01:24.34 |    8844K|     54 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
consistent gets from cache 23,100,146
11g:
Plan hash value: 4121601496
---------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                           | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                                |    142 |        |       |   228 (100)|          |    100K|00:01:08.66 |      20M|
|*  1 |  FILTER                      |                                |    142 |        |       |            |          |    100K|00:01:08.66 |      20M|
|   2 |   TABLE ACCESS BY INDEX ROWID| CUSTOMER_NODE_HISTORY          |    142 |      2 |    98 |     5   (0)| 00:00:01 |   2840K|00:00:09.16 |    2796K|
|*  3 |    INDEX RANGE SCAN          | P_CUSTOMER_NODE_HISTORY        |    142 |      2 |       |     2   (0)| 00:00:01 |   2840K|00:00:01.22 |   10684 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| PRODUCT_INSTANCE_HISTORY       |   2840K|      1 |    30 |   199   (3)| 00:00:02 |    100K|00:00:47.42 |      17M|
|   5 |    AND-EQUAL                 |                                |   2840K|        |       |            |          |    100K|00:00:45.24 |      17M|
|*  6 |     INDEX RANGE SCAN         | I_PRODUCT_INSTANCE_HISTORY_CN  |   2840K|      6 |       |     3   (0)| 00:00:01 |   3118K|00:00:20.43 |    8998K|
|*  7 |     INDEX RANGE SCAN         | I_PRODUCT_INSTANCE_HISTORY_PRD |   2840K|  72000 |       |   195   (3)| 00:00:01 |    347K|00:00:21.17 |    8854K|
---------------------------------------------------------------------------------------------------------------------------------------------------------
consistent gets from cache 23,012,794
consistent gets from cache (fastpath) 10,022,007