【WITH Clause】使用WITH子句提高查询统计效率-颠覆思维定势

2020-01-18 14:22:41

这里介绍的的是一种SQL查询方法，颠覆您日常以select开始的SQL查询写法。
这种神奇的使用方法的背后隐藏着性能的提升！
我们一起来体验一下其中的奥妙。

1.创建表T，并简单初始化几条数据
sec@ora10g> create table t (x number(10), y number(10));
sec@ora10g> insert into t values (1,110);
sec@ora10g> insert into t values (2,120);
sec@ora10g> insert into t values (2,80);
sec@ora10g> insert into t values (3,150);
sec@ora10g> insert into t values (3,30);
sec@ora10g> insert into t values (3,60);
sec@ora10g> commit;

sec@ora10g> select * from t;

         X          Y
---------- ----------
         1        110
         2        120
         2         80
         3        150
         3         30
         3         60

6 rows selected.

2.需求描述
按照x列分组后统计y列的总值，我们的终目标是选出比y列总值的三分之一大的那些分组统计信息。

3.使用子查询方式实现
这可能是大家容易想到的方法。
sec@ora10g> SELECT x, SUM (y) AS total_y
2    FROM t
3 GROUP BY x
4 HAVING SUM (y) > (SELECT SUM (y) / 3 FROM t)
5 ORDER BY total_y
6 /

         X    TOTAL_Y
---------- ----------
         2        200
         3        240

4.WITH Clause方法闪亮登场
sec@ora10g> WITH secooler_sum AS (SELECT x, SUM (y) total_y
2                          FROM t
3                        GROUP BY x)
4 SELECT x, total_y
5    FROM secooler_sum
6   WHERE total_y > (SELECT SUM (total_y) / 3 FROM secooler_sum)
7 ORDER BY total_y
8 /

         X    TOTAL_Y
---------- ----------
         2        200
         3        240

是不是很神奇！这里的查询语句不是以select开始的，而是以“WITH”关键字开头。可以认为Oracle在真正进行查询之前预先构造了一个临时表secooler_sum，之后我们便可多次使用它做进一步的分析和处理。

5.WITH Clause方法的优点
使用WITH Clause方法有什么好处呢？
首先，增加了SQL的易读性，如果构造了多个子查询，结构会更清晰；
更重要的是：“一次分析，多次使用”，这也是为什么会提供性能的地方，达到了“少读”的目标。

6.知其所以然
为什么WITH Clause方法会提高效率？通过查看上面两种方法的执行计划便可略知一二。
1）使用子查询的执行计划
sec@ora10g> set autot trace exp
sec@ora10g> SELECT x, SUM (y) AS total_y
2    FROM t
3 GROUP BY x
4 HAVING SUM (y) > (SELECT SUM (y) / 3 FROM t)
5 ORDER BY total_y
6 /

Execution Plan
----------------------------------------------------------
Plan hash value: 4167292448

-----------------------------------------------------------------------------
| Id | Operation            | Name | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     6 |   156 |     5 (40)| 00:00:01 |
|   1 | SORT ORDER BY       |      |     6 |   156 |     5 (40)| 00:00:01 |
|* 2 |   FILTER             |      |       |       |            |          |
|   3 |    HASH GROUP BY     |      |     6 |   156 |     5 (40)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T    |     6 |   156 |     3   (0)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |    13 |            |          |
|   6 |     TABLE ACCESS FULL| T    |     6 |    78 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SUM("Y")> (SELECT SUM("Y")/3 FROM "T" "T"))

Note
-----
   - dynamic sampling used for this statement

2）使用子WITH Clause的执行计划
sec@ora10g> WITH secooler_sum AS (SELECT x, SUM (y) total_y
2                          FROM t
3                        GROUP BY x)
4 SELECT x, total_y
5    FROM secooler_sum
6   WHERE total_y > (SELECT SUM (total_y) / 3 FROM secooler_sum)
7 ORDER BY total_y
8 /

Execution Plan
----------------------------------------------------------
Plan hash value: 706070671

---------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name                       | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |     6 |   156 |     9 (23)| 00:00:01 |
|   1 | TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           |                            |       |       |            |          |
|   3 |    HASH GROUP BY           |                            |     6 |   156 |     4 (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL      | T                          |     6 |   156 |     3   (0)| 00:00:01 |
|   5 |   SORT ORDER BY            |                            |     6 |   156 |     5 (20)| 00:00:01 |
|* 6 |    VIEW                    |                            |     6 |   156 |     2   (0)| 00:00:01 |
|   7 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6660_23A196E |     6 |   156 |     2   (0)| 00:00:01 |
|   8 |     SORT AGGREGATE         |                            |     1 |    13 |            |          |
|   9 |      VIEW                  |                            |     6 |    78 |     2   (0)| 00:00:01 |
| 10 |       TABLE ACCESS FULL    | SYS_TEMP_0FD9D6660_23A196E |     6 |   156 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter("TOTAL_Y"> (SELECT SUM("TOTAL_Y")/3 FROM (SELECT /*+ CACHE_TEMP_TABLE ("T1") */
              "C0" "X","C1" "TOTAL_Y" FROM "SYS"."SYS_TEMP_0FD9D6660_23A196E" "T1") "SECOOLER_SUM"))

Note
-----
   - dynamic sampling used for this statement

可见，种使用子查询的方法T表被扫描了两次，而使用WITH Clause方法，T表仅被扫描一次。
这也是为什么在大型数据仓库系统中推荐使用WITH Clause方法进行查询统计的原因，这样可以大大的提高数据分析和查询的效率。

另外，观察WITH Clause方法执行计划，其中“SYS_TEMP_0FD9D6660_23A196E”便是在运行过程中构造的中间统计结果临时表。

7.Oracle官方文档中有关WITH Clause的简单描述
Computation Using the WITH Clause

The WITH clause (formally known as subquery_factoring_clause) enables you to reuse the same query block in a SELECT statement when it occurs more than once within a complex query. WITH is a part of the SQL-99 standard. This is particularly useful when a query has multiple references to the same query block and there are joins and aggregations. Using the WITH clause, Oracle retrieves the results of a query block and stores them in the user's temporary tablespace. Note that Oracle Database does not support recursive use of the WITH clause.

参考链接：http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/aggreg.htm#sthref1670

8.小结
WITH Clause方法在数据仓库或大数据量查询中有着自己的优势。
一切提高性能的措施都是推崇的。灵活掌握，裨益无限。在遇到具体问题时可善加利用。

Good luck.

secooler
10.03.22

-- The End --