您好,登錄后才能下訂單哦!
一直以來DISTINCT的用法都為DBA所知,就是用來濾重。我們也沒有必要質疑DISTINCT的濾重準確性。
但今天突然不知道怎么想的,就想質疑一下。DISTINCT真的能保證過濾的對象沒有重復記錄嗎?
SQL> create table test_distinct as select rownum id from all_objects where rownum < 50000; insert into test_distinct select rownum id from all_objects where rownum < 50000;
這表中有重復的數據,數據插入順序1~4999,然后再重復一次
SQL> select count(id) from (select distinct id from test_distinct) group by id having count(id) > 1; no rows selected
可以看到,通過這條語句發現確實沒有重復的行。那么進一步思考如果不用distinct如何實現這個效果呢。我想起了ROWID這個東西,以下兩句的效果相同。
select count(distinct phoneno) from CUSTPHONE select count(phoneno) from CUSTPHONE where rowid in( select min(rowid) from CUSTPHONE group by phoneno)
我想了解一下DISTINCT語句實際在Oracle中是如何操作的,通過10046事件和tkprof工具獲取跟蹤的信息。
SQL ID: 8vtyapcbqkbwf Plan Hash: 2372476266 select distinct id from test_distinct where rownum < 100 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.02 0.02 0 138 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 8 0.00 0.00 0 4 0 99 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 10 0.02 0.02 0 142 0 99 Misses in library cache during parse: 1 Optimizer mode: ALL_ROWS Parsing user id: SYS Rows Row Source Operation ------- --------------------------------------------------- 99 HASH UNIQUE (cr=4 pr=0 pw=0 time=0 us cost=528 size=1287 card=99) 99 COUNT STOPKEY (cr=4 pr=0 pw=0 time=196 us) 99 TABLE ACCESS FULL TEST_DISTINCT (cr=4 pr=0 pw=0 time=98 us cost=44 size=1318174 card=101398) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 8 0.00 0.00 asynch descriptor resize 2 0.00 0.00 SQL*Net message from client 8 59.43 59.43
可以看到執行計劃中DISINCT是通過HASH UNIQUE算法來實現的。同時ROWNUM虛列使用的是COUNT算法,STOPKEY說明我給ROWNUM虛列加了限定條件100,當到達這個限定條件時,該語句查詢結束。
那么到這我該怎么理解HASH UNIQUE算法的目的呢?我在網上查看了相關信息,發現真有人做了實驗實驗幫助我們加上對該算法的印象。在10G2R以前,Oracle對DISTINCT使用的是sort unique這種操作方式因為涉及到排序,是非常影響語句的執行效率的。因此10G2R之后的版本,Oracle改進了算法。
SQL> select distinct id from test_distinct where rownum < 100; 99 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2372476266 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 99 | 1287 | | 528 (1)| 00:00:07 | | 1 | HASH UNIQUE | | 99 | 1287 | 2000K| 528 (1)| 00:00:07 | |* 2 | COUNT STOPKEY | | | | | | | | 3 | TABLE ACCESS FULL| TEST_DISTINCT | 101K| 1287K| | 44 (3)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(ROWNUM<100) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 8 physical reads 0 redo size 2134 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 99 rows processed SQL> alter system flush buffer_cache SQL> select id from test_distinct where rownum < 100 group by id; 99 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 521476922 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 99 | 1287 | 47 (9)| 00:00:01 | | 1 | HASH GROUP BY | | 99 | 1287 | 47 (9)| 00:00:01 | |* 2 | COUNT STOPKEY | | | | | | | 3 | TABLE ACCESS FULL| TEST_DISTINCT | 101K| 1287K| 44 (3)| 00:00:01 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(ROWNUM<100) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 8 physical reads 0 redo size 2134 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 99 rows processed
通過網上這組實驗可以看到DISTINCT和GROUP BY分別使用了HASH UNIQUE和HASH GROUP BY算法。而兩者執行效果和結果都相同。基于規則的DISTINCT和GROUP BY的查詢的執行計劃如下
SQL> select /*+ rule*/ distinct id from test_distinct where rownum < 100; 99 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3449293992 --------------------------------------------- | Id | Operation | Name | --------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT UNIQUE | | |* 2 | COUNT STOPKEY | | | 3 | TABLE ACCESS FULL| TEST_DISTINCT | --------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(ROWNUM<100) Note ----- - rule based optimizer used (consider using cbo) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 8 physical reads 0 redo size 2134 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 99 rows processed SQL> select /*+ rule*/ id from test_distinct where rownum < 100 group by id; 99 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 351786816 --------------------------------------------- | Id | Operation | Name | --------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT GROUP BY | | |* 2 | COUNT STOPKEY | | | 3 | TABLE ACCESS FULL| TEST_DISTINCT | --------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(ROWNUM<100) Note ----- - rule based optimizer used (consider using cbo) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 8 physical reads 0 redo size 2134 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 99 rows processed
因為表的數據量教啥,基于rule的算法和通過hash的算法區別并不明顯。但重點在于了解Oracle的不同算法的可能。
另外,ORACLE除了提供DISTINCT以外,還提供了UNIQUE來過濾重復的數據。官方文檔給出的解釋。
http://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_10002.htm#SQLRF55272
DISTINCT | UNIQUE
Specify DISTINCT
or UNIQUE
if you want the database to return only one copy of each set of duplicate rows selected. These two keywords are synonymous. Duplicate rows are those with matching values for each expression in the select list.
到這里我已經理解了DISTINCT的作用,同時還學習了其它新的知識。非常棒!
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。