Monitoring Performance by Analyzing GCS and GES Statistics
In order to determine the amount of work and cost related to inter-instance messaging and contention, examine block transfer rates, remote requests made by each transaction, the number and time waited for global cache events as described under the following headings:
• Analyzing Cache Fusion Impact in Real Application Clusters
• Analyzing Performance Using GCS and GES Statistics
Analyzing Cache Fusion Impact in Real Application Clusters
The effect of accessing blocks in the global cache and maintaining coherency is represented by
• The Global Cache Service statistics for current and cr blocks, for example, gc current blocks received, gc cr blocks received, and so on)
• The Global Cache Service wait events, for gc current block 3-way, gc cr grant 2-way, and so on.
The response time for cache fusion transfers is determined by the messaging and processing times imposed by the physical interconnect components, the IPC protocol and the GCS protocol. It is not affected by disk I/O factors other than occasional log writes. The cache fusion protocol does not require I/O to data files in order to guarantee cache coherency and RAC inherently does not cause any more I/O to disk than a non-clustered instance.
Analyzing Performance Using GCS and GES Statistics
Monitor Global Cache Service performance by identifying data blocks and objects which are frequently used ("hot") by all instances. High concurrency on certain blocks may be identified by Global Cache Service wait events and times.
The following wait events indicate that the access to cached data blocks was held up because they were busy either in the remote or the local cache, respectively:
• gc current block busy
• gc current block 2-way busy
• gc current block 3-way busy
• gc cr block 2-way busy
• gc cr block 3-way busy
This means that the blocks were pinned or held up by sessions or delayed by a log write on a remote instance (for example, gc current, cr 2-way busy, or cr 3-way busy), or that a session on the same instance is already accessing a block which is in transition between instances and the current session needs to wait behind it (for example, gc current block busy).
The V$SESSION_WAIT view to identify objects and data blocks with contention. The gc wait events contain the file and block number for a block request in p1 and p2, respectively.
An additional segment statistic, gc buffer busy, has been added to quickly determine the "busy" objects without recourse to the query on V$SESSION_WAIT mentioned earlier.
The AWR infrastructure provides a view of active session history which can also be used to trace recent wait events and their arguments. It is therefore useful for hot block analysis.
Most of the reporting facilities used by AWR and Statspack contain the object statistics and cluster wait class category, so that sampling of the views mentioned earlier is largely unnecessary.
It is advisable to run ADDM on the snapshot data collected by the AWR infrastructure to obtain an overall evaluation of the impact of the global cache. The advisory will also identify the busy objects and SQL highest cluster wait time.
Analyzing Cache Fusion Transfer Impact Using GCS Statistics
Monitor Global Cache Service performance by identifying objects read and modified frequently and the service times imposed by the remote access. Waiting for blocks to arrive may constitute a significant portion of the response time, in the same way that reading from disk could increase the block access delays, only that cache fusion transfers in most cases are faster than disk access latencies.
The following wait events indicate that the remotely cached blocks were shipped to the local instance without having been busy, pinned or requiring a log flush:
• gc current block 2-way
• gc current block 3-way
• gc cr block 2-way
• gc cr block 3-way
The object statistics for gc current blocks received and gc cr blocks received enable quick identification of the indexes and tables which are shared by the active instances. As mentioned earlier, creating an ADDM analysis will, in most cases, point you to the SQL statements and database objects that could be impacted by inter-instance contention.
Note:
You must run Statspack at level 7 to collect statistics related to block contention and segment block waits.
Any increases in the average wait times for the events mentioned earlier could be caused by the following:
• High load: CPU shortages, long run queues, scheduling delays
• Misconfiguration: using public instead of private interconnect for message and block traffic
If the average wait times are acceptable and no interconnect or load issues can be diagnosed, then the accumulated time waited can usually be attributed to a few SQL statements which need to be tuned to minimize the number of blocks accessed.
The column CLUSTER_WAIT_TIME in V$SQLAREA represents the wait time incurred by individual SQL statements for global cache events and will identify the SQL which may need to be tuned.
Analyzing Response Times Based on Wait Events
Most global cache wait events that show a high total time as reported in the AWR and Statspack reports or in the dynamic performance views are normal and may present themselves as the top database time consumers without actually indicating a problem. This section describes the most important and frequent wait events that you should be aware of when interpreting performance data.
If user response times increases and a high proportion of time waited is for global cache (gc), then the cause should be determined. Most reports include a breakdown of events sorted by percentage of the total time.
It is useful to start with an ADDM report, which would analyze the routinely collected performance statistics with respect to their impact and point to the objects and SQL contributing most to the time waited, and then move on to the more detailed reports produced by AWR and Statspack.
The most important wait events for RAC include various categories, such as:
• Block-oriented
• gc current block 2-way
• gc current block 3-way
• gc cr block 2-way
• gc cr block 3-way
Message-oriented
• gc current grant 2-way
• gc cr grant 2-way
Contention-oriented
• gc current block busy
• gc cr block busy
• gc current buffer busy
Load-oriented
• gc current block congested
• gc cr block congested
The block-oriented wait event statistics indicate that a block was received as either the result of a 2-way or a 3-way message, that is, the block was sent from either the resource master requiring 1 message and 1 transfer, or was forwarded to a third node from which it was sent, requiring 2 messages and 1 block transfer.
These events are usually the most frequent in the absence of block contention and the length of the wait is determined by the time it takes on the physical network, the time to process the request in the serving instances and the time it takes for the requesting process to wake up after the block arrives.
The average wait time and the total wait time should be considered when being alerted to performance issues where these particular waits have a high impact. Usually, either interconnect or load issues or SQL execution against a large shared working set can be found to be the root cause.
The message-oriented wait event statistics indicate that no block was received because it was not cached in any instance. Instead a global grant was given, allowing the requesting instance to read the block from disk or modify it.
If the time consumed by these events is high, then it may be assumed that the frequently executed SQL causes a lot of disk I/O (in the event of the cr grant) or that the workload inserts a lot of data and needs to find and format new blocks frequently (in the event of the current grant).
The contention-oriented wait event statistics indicate that a block was received which was pinned by a session on another node, was deferred because a change had not yet been flushed to disk or because of high concurrency, and therefore could not be shipped immediately. A buffer may also be busy locally when a session has already initiated a cache fusion operation and is waiting for its completion when another session on the same node is trying to read or modify the same data. High service times for blocks exchanged in the global cache may exacerbate the contention, which can be caused by frequent concurrent read and write accesses to the same data.
The load-oriented wait events indicate that a delay in processing has occurred in the GCS, which is usually caused by high load, CPU saturation and would have to be solved by additional CPUs, load-balancing, offloading processing to different times or a new cluster node.For the events mentioned, the wait time encompasses the entire round trip from the time a session starts to wait after initiating a block request until the block arrives.