I.癥狀:
業務側反饋應用緩慢,我方發現數據庫緩慢,并伴隨有LATCH:ROW CACHE OBJECTS等待事件,數據庫CPU過高。
II.解決方案:
通過如下分析是由于故障時段有大量硬解析,硬解析需要去獲取數據字典資源,這需要獲得latch,若硬解析量太大,會造成嚴重的latch爭用,占用非常多的資源,導致CPU使用過高,從而表現出數據庫ROW CACHE OBJECT等待事件,最終導致數據庫系統緩慢。
根據分析,我方有如下2點建議:
1. 對業務側進行SQL整改,將未使用綁定變量改成綁定變量,成為軟解析,減少獲取數據字典的次數,從而降低LATCH爭用次數,從根本解決問題(推薦)
2. 對數據庫參數cursor_sharing進行調整,可部分緩解這個問題,但不能從根本上解決問題。
III.分析過程:
1.數據庫存在LATCH等待情況,看到這幾個等待事件,這個數據庫的內存很大,我懷疑是硬解析或者熱塊爭用有關;
2.SGA使用率,Shared pool還有17G,排除內存不足
select POOL,BYTES/1024/1024 FREE_MB from v$sgastat a where a.NAME like 'free%';
3. 通過AWR發現Parse CPU to Parse Elapsed為42%,Non-Parse CPU為57%,正常情況下98%以上,該值表示SQL解析時間占比,越高越好,如果過低說明解析中等待資源時間太長。軟解析比例也過低,正常在98%以上。應該就是硬解析的問題了
4. 通過分析數據庫內部資源,發現數據字典的各種LATCH爭用過高,(GETS表示請求該資源的次數,MISSES表示請求失敗重新請求的次數,SLEEPS表示請求失敗進入睡眠隊列),成功率最底12%,正常應為98%。
查詢LATCH 的GET量,以及成為率
col LATCH_NAME for a20
SELECT a.addr,a.latch#,a.child#,a.level#,a.name LATCH_NAME,a.gets,a.misses,round((1-a.misses/a.gets)*100,2) SUCESS_PCT,a.sleeps
FROM v$latch_children a
WHERE a.name='row cache objects' AND a.gets <>0
ORDER BY a.gets desc;
4.查詢ROW CACHE中的GET量及命中率
SELECT r.cache#,r.parameter name,r.TYPE,r.subordinate#,r.gets,r.GETMISSES,round((1 - r.GETMISSES/r.gets)*100,2) SUC_PCT FROM v$rowcache r where r.gets <>0 ORDER BY 5 desc;
爭用最多的是 DC_OBJECTS,DC_OBJECTS_GRANTS,DC_HISTOGRAM_DATA,DC_HISTOGRAM_DEF
5.統計硬解析的SQL語句
查SQL硬解析次數1000以上的FORCE_MATCHING_SIGNATURE,若需要得到SQL語句用FORCE_MATCHING_SIGNATURE關聯即可:
select to_char(FORCE_MATCHING_SIGNATURE) FORCE_MATCHING_SIGNATURE, count(1) counts from v$sql
where FORCE_MATCHING_SIGNATURE>0 and FORCE_MATCHING_SIGNATURE <> EXACT_MATCHING_SIGNATURE and EXECUTIONS<=5
group by FORCE_MATCHING_SIGNATURE
having count(1) > 1000
order by 2 desc;
官方v$SQL視圖:
EXACT_MATCHING_SIGNATURE
|
NUMBER
|
Signature calculated on the normalized SQL text. The normalization includes the removal of white space and the uppercasing of all non-literal strings.
|
FORCE_MATCHING_SIGNATURE
|
NUMBER
|
The signature used when the CURSOR_SHARING parameter is set to FORCE
|
將變量硬編碼至SQL中的游標,FORCE_MATCHING_SIGNATURE值完全相同,而EXACT_MATCHING_SIGNATURE值各有不同。FORCE_MATCHING_SIGNATURE值相同說明在游標共享FORCE模式下,這些游標滿足CURSOR SHARING的條件
6.SQL解析到底哪一步訪問了ROWCACHE,哪一步爭用的latch?
這個問題我查詢了好久,我之前以為只是在語義檢查需要到row cache,其實是在生成執行計劃的時候需要訪問的數據字典次數更多,爭用latch也就更頻繁了,所以這里才是最慢的。
http://docs.oracle.com/cd/E11882_01/server.112/e40540/sqllangu.htm#CNCPT015
SQL execution flow
Parse operations fall into the following categories, depending on the type of statement submitted and the result of the hash check:
-
Hard parse
If Oracle Database cannot reuse existing code, then it must build a new executable version of the application code. This operation is known as ahard parse, or a library cache miss. The database always perform a hard parse of DDL.
During the hard parse, the database accesses the library cache and data dictionary cache numerous times to check the data dictionary. When the database accesses these areas, it uses a serialization device called a latch on required objects so that their definition does not change (see"Latches"). Latch contention increases statement execution time and decreases concurrency.
在硬解析期間,數據庫需要訪問 library cache and data dictionary cache 非常多次去檢查數據字典,當數據庫訪問這些區域的時候,它用一個序列化的設備調用一個latch鎖存這個對象,使其的定義不會被改變。latch 的爭用會增加語句執行時間以及減少并發量。
-
Soft parse
A soft parse is any parse that is not a hard parse. If the submitted statement is the same as a reusable SQL statement in the shared pool, then Oracle Database reuses the existing code. This reuse of code is also called a library cache hit.
Soft parses can vary in the amount of work they perform. For example, configuring the session shared SQL area can sometimes reduce the amount of latching in the soft parses, making them "softer."
In general, a soft parse is preferable to a hard parse because the database skips the optimization and row source generation steps, proceeding straight to execution.
當然有時熱塊爭用也會造成latch,后續我會再分析