近一个同事问为啥这个SQL怎么优化都是走全表扫描,让我帮着看看,从微信发来得txt执行计划看,眼我就预料是这个问题,隐式类型转换,函数屏蔽了索引,经过分析验证,终确认是这个问题,下面我把问题仔细分析下,也再重温下hash-join和nested-loop得原理。
这是一个简单得SQL,从执行计划看两表HASH连接,hash连接时filter数据,具体SQL就不展示了
SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1376627223
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 894 | 9957 (1) | 00:00:01 |
| 1 | SORT ORDER BY | | 6 | 894 | 9957 (1 ) | 00:00:01 |
|* 2 | HASH JOIN | | 6 | 894 | 9956 (1) | 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED | mytab-1 | 1 | 59 | 4 (0) | 00:00:01 |
|* 4 | INDEX RANGE SCAN | IDX_mytab-1_BOE_CODE | 1 | | 3 (0) | 00:00:01 |
|* 5 | TABLE ACCESS FULL | mytab-2 | 1400K| 120M| 9948 (1) | 00:00:01 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("BS"."ID"=TO_NUMBER("HK"."BUSINESS_ID_"))
3 - filter("BS"."GROUP_ID"=:2)
4 - access("BS"."BOE_CODE"=:1)
5 - filter("HK"."ASSIGNEE_NAME_"<>'ACTION' OR "HK"."ASSIGNEE_NAME_" IS NULL)
其执行步骤是4-3-5-2-1-0,而4,3得结果集与5全表扫得结果集做hash jion,注意这个hash-join肯定有
过滤条件,否则就走笛卡尔积了,从谓词部分我们知道这个过滤条件是
access("BS"."ID"=TO_NUMBER("HK"."BUSINESS_ID_"))
也就是从这个字段做hash然后再做join,奇怪得是3,4的结果集仅仅返回一行数据,又在"HK"."BUSINESS_ID_"字段
做hash过滤数据,为啥不走嵌套循环呢,如果"HK"."BUSINESS_ID_"字段再表mytab-2中NDV很高,其实这里返回6行数据,说明适合通过索引解决
我们下面看下索引分布情况
TABLE_NAME INDEX_NAME COLUMN_NAME
------------------------------ ---------------------------------------- ------------------------------
mytab-1 IDX_mytab-1_BOE_CODE BOE_CODE
mytab-1 PK_mytab-1 ID
mytab-2 IDX_mytab-2_BUSINESS_ ID_ BUSINESS_ID_
mytab-2 SYS_C00411492 ID_
分析:表hk(mytab-2 )在 BUSINESS_ID_有个组合索引,我们理论分析它应该走跳跃扫描,然后通过嵌套执行在过滤数据,
但是原始执行计划用了hash,这里得问题就是为啥表从mytab-1中返回一条数据后,与表mytab-2关联为啥没走索引。
我们要注意下原始执行计划得第二步
2 - access("BS"."ID"=TO_NUMBER("HK"."BUSINESS_ID_"))
显然Oracle在字段"HK"."BUSINESS_ID_"加了函数TO_NUMBER,说明字段"BS"."ID"和"HK"."BUSINESS_ID_"得数据类型不一致
通过查询我们知道"BS"."ID"为number,而"HK"."BUSINESS_ID_"为varchar2,
问题找到了,由于表字段数据类型设计问题,导致了隐式类型转换,在有索引得字段加了函数导致索引失效。
解决方法:
创建函数索引
create indexuser1.IDX_mytab-2_NUMBER_BUSINESS_ID_ onuser1.mytab-2(TO_NUMBER("HK"."BUSINESS_ID_"));
创建函数索引后
SQL> l
1* SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY)
SQL> /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1195847964
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 924 | 13 (8)| 00:00:01 |
| 1 | SORT ORDER BY | | 6 | 924 | 13 (8)| 00:00:01 |
| 2 | NESTED LOOPS | | 6 | 924 | 12 (0)| 00:00:01 |
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED | mytab-1 | 1 | 59 | 4 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IDX_mytab-1_BOE_CODE | 1 | | 3 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID BATCHED | mytab-2 | 6 | 570 | 8 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IDX_mytab-2_NUMBER_BUSINESS_ID_ | 6 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("BS"."GROUP_ID"=:2)
4 - access("BS"."BOE_CODE"=:1)
5 - filter("HK"."ASSIGNEE_NAME_"<>'ACTION' OR "HK"."ASSIGNEE_NAME_" IS NULL)
6 - access("BS"."ID"=TO_NUMBER("BUSNESS_ID_"))
从执行计划看全表扫描没有了cost成本也下降了很多,(说明表mytab-2 不是小表),3,4步骤得结果集返回1行数据,这行数据传值
到"BS"."ID"=TO_NUMBER("BUSNESS_ID_")过滤条件,5,6步骤通过这个值通过索引找到6行数据,或者说因为只有6行数据,Oracle决定
使用索引,从而选择了嵌套循环得jion方式,从而优化了该SQL。