绑定完请刷新页面
取消
刷新

分享好友

×
取消 复制
直方图如何影响优化器选择合理的执行计划
2022-09-29 10:56:17

直方图原理剖析测试

 表结构两个字段id1,id2,其中 id1 NDV67181id2NDV10

SQL> desc t1;

 Name                                      Null?    Type

 ----------------------------------------- -------- ----------------------------

 ID1                                                NUMBER

 ID2                                                NUMBER

 

SQL> select count(distinct id1)  "id1-ndv" from t1;

 

   id1-ndv

----------

     67181

 

 

SQL> select count(*)  "rows-of-t1" from t1;

 

rows-of-t1

----------

     67181

 

SQL> select count(distinct id2) "id2-ndv" from t1;

 

   id2-ndv

----------

        10

SQL> select distinct id2 ,count(*) from t1 group by id2 order by id2;

 

       ID2   COUNT(*)

---------- ----------

         2          1

         3          1

         4          1

         5          1

         6          1

         7          1

         8          1

         9          1

        10          1

     10000      67172

 

10 rows selected.

测试一: 我们创建索引,不收集直方图。

 

SQL> create index idx_t1_id2 on t1(id2);

 

Index created.

 

收集列id1,id2上的统计信息,但是不创建直方图

09:21:34 SQL> exec dbms_stats.gather_table_stats(user,'T1',method_opt=>'for columns id2 size 1');

 

PL/SQL procedure successfully completed.

 

Elapsed: 00:00:02.53

09:28:12 SQL> exec dbms_stats.gather_table_stats(user,'T1',method_opt=>'for columns id1 size 1');

 

PL/SQL procedure successfully completed.

 

Elapsed: 00:00:00.35

09:28:13 SQL> select table_name,column_name,num_distinct,low_value,high_value,density,num_nulls,num_buckets,histogram from user_tab_columns where table_name='T1';

 

TABLE_NAME COLUMN_NAME    NUM_DISTINCT LOW_VALUE    HIGH_VALUE       DENSITY  NUM_NULLS NUM_BUCKETS HISTOGRAM

---------- -------------------- ------------ -------------------- -------------------- ---------- ---------- ----------- ----------

T1         ID1                    67181      C103                 C3082533      .000014885                1    NONE

T1         ID2                    10         C103                 C302           .1                        1    NONE

 

2 rows selected.

这里统计信息没有直方图信息,但是NUM_DISTINCTDENSITY都有了,这个DENSITY优化器会用,计算表返回的行数,从而选择合适的执行计划 。

 

09:39:25 SQL> set autotr on;

09:39:29 SQL> select * from t1 where id2=6;

 

       ID1        ID2

---------- ----------

         6          6

 

1 row selected.

 

Elapsed: 00:00:00.04

 

Execution Plan

----------------------------------------------------------

Plan hash value: 2490154277

 

-------------------------------------------------------------------------------

| Id  | Operation                              | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                      |           | 6718 | 53744 |    28   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1         | 6718 | 53744 |    28   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN                   | IDX_T1_ID2  | 6718 |     |    14   (0)| 00:00:01 |

-------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID2"=6)

Statistics

----------------------------------------------------------

          1  recursive calls

            db block gets

          4  consistent gets

            physical reads

            redo size

        625  bytes sent via SQL*Net to client

        390  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

            sorts (memory)

            sorts (disk)

          1  rows processed

10:10:23 SQL>select * from t1 where id2=10000

。。。。。省略大量输出

62647      10000

 

67172 rows selected.

 

Elapsed: 00:00:18.25

 

Execution Plan

----------------------------------------------------------

Plan hash value: 2490154277

-------------------------------------------------------------------------------------------

| Id  | Operation                            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                      |          | 6718 | 53744 |    28   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | T1        | 6718 | 53744 |   28   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN                   | IDX_T1_ID2 |  6718 |     |    14   (0)| 00:00:01 |

-------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

 

   2 - access("ID2"=10000)

 

Statistics

----------------------------------------------------------

          1  recursive calls

            db block gets

       9210  consistent gets

            physical reads

            redo size

    1732580  bytes sent via SQL*Net to client

      49652  bytes received via SQL*Net from client

       4480  SQL*Net roundtrips to/from client

            sorts (memory)

            sorts (disk)

      67172  rows processed

这两个字段的数据分布显然不同,id2=1000067172行,显然这里的统计信息给予优化器的信息不准确,其实优化器使用了Density这个值,计算每个字段的数据,目前没有直方图,这个统计信息默认将10个不同的值平均分布,即平均。导致对于id2=10000的统计信息不准确,从而走了错误的执行计划,应该走全表扫描。下面我们收集直方图

 

 

 

10:00:06 SQL> select COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE,ENDPOINT_REPEAT_COUNT from user_tab_histograms where table_name='T1' and column_name='ID2';

 

COLUMN_NAME     ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_REPEAT_COUNT

-------------------- --------------- -------------- ---------------------

ID2                                1              2                    

ID2                                2              3                    

ID2                                3              4                    

ID2                                4              5                    

ID2                                5              6                    

ID2                                6              7                    

ID2                                7              8                     

ID2                                8              9                    

ID2                                9             10                    

ID2                            67181          10000                    

 

10 rows selected.

对于频率直方图而言,endpoint_number的差是对应ndv值数量,比如32-1=1个,100067181-9=67172

 

再次看执行计划是否改变

10:16:32 SQL> explain plan for        

10:17:05   2  select * from t1 where id2=10000;

 

Explained.

 

Elapsed: 00:00:00.00

10:17:17 SQL> select * from table(dbms_xplan.display());

 

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------

Plan hash value: 3617692013

 

--------------------------------------------------------------------------

| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |       | 67172  |   524K|    69   (2)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL  | T1   | 67172  |   524K|    69   (2)| 00:00:01 |

--------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter("ID2"=10000)

13 rows selected.

 

Elapsed: 00:00:00.06

 

这里统计信息变了,通过直方图优化器知道返回67192行,此时选择了全表扫描,注意此时的COst值为69比走索引时的值高,但是此时是一个合理的执行计划,

索引对于DBA,看SQL是否有基本不看Cost值,它是优化器做选择的依据,但是这个选择有时是错误的。

有了直方图后,优化器计算通过索引的成本就会增加,我们使用hint强制走索引看下此时的成本值

67172 rows selected.

 

Elapsed: 00:00:16.90

 

Execution Plan

----------------------------------------------------------

Plan hash value: 2490154277

 

-------------------------------------------------------------------------------

| Id  | Operation                              | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                     |            | 67172 |   524K|   271   (1)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1        | 67172 |   524K|   271   (1)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN                  | IDX_T1_ID2  | 67172 |      |   132   (0)| 00:00:01 |

------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID2"=10000)

Statistics

----------------------------------------------------------

          1  recursive calls

            db block gets

       9210  consistent gets

            physical reads

            redo size

    1732580  bytes sent via SQL*Net to client

      49902  bytes received via SQL*Net from client

       4480  SQL*Net roundtrips to/from client

            sorts (memory)

            sorts (disk)

      67172  rows processed

 

发现此时的cost值为271显然成本很高,所以在有直方图的情况下,Oracle选择使用全表扫描,而不是通过索引找id=10000的数据。

分享好友

分享这个小栈给你的朋友们,一起进步吧。

Oracle运维新鲜事-技术与管理各占半边天
创建时间:2020-08-04 11:34:57
本技术栈旨在分享技术心得,运维趣事,故障处理经验,调优案例,故障处理涉及集群,DG,OGG,大家生产中遇到的问题基本都会囊括了,我会发布生产库遇到的故障,希望在交流中互助互益,共同提高,也希望大家讨论,如果您有生产中遇到的集群问题,也可以在这里提出来,一起讨论,现实中也帮助不少同学解决了生产库的故障。
展开
订阅须知

• 所有用户可根据关注领域订阅专区或所有专区

• 付费订阅:虚拟交易,一经交易不退款;若特殊情况,可3日内客服咨询

• 专区发布评论属默认订阅所评论专区(除付费小栈外)

栈主、嘉宾

查看更多
  • Abraham林老师
    栈主
  • 小雨滴
    嘉宾
  • hawkliu
    嘉宾
  • u_97a59a25246404
    嘉宾

小栈成员

查看更多
  • 栈栈
  • dapan
  • 小菜鸟___
  • hwayw
戳我,来吐槽~