【MySQL】MySQL 8.0 新特性之 - 公用表表达式（CTE）

MySQL 8.0 新特性之 - 公用表表达式（CTE）

1. 公用表表达式（CTE） - WITH 介绍
2. CTE 与 Derived Table

1. 公用表表达式（CTE） - WITH 介绍

1.1 公用表表表达式

1.1.1 什么是公用表表达式

官网：https://dev.mysql.com/doc/refman/8.0/en/with.html#common-table-expressions

MySQL 从 8.0 开始支持 WITH 语法，即：Common Table Expressions - CTE，公用表表达式。

CTE 是一个命名的临时结果集合，仅在单个 SQL 语句（select、insert、update 或 delete）的执行范围内存在。

与派生表类似的是：CTE 不作为对象存储，仅在查询执行期间持续。与派生表不同的是：CTE 可以是自引用（递归CTE），也可以在同一查询中多次引用。此外，与派生表相比，CTE 提供了更好的可读性和性能。

1.1.2 CTE 语法

CTE 的结构包括：名称、可选列列表和定义 CTE 的查询。定义 CTE 后，可以像 select、insert、update、delete 或 create view 语句中的视图一样使用它。

with cte_name (column_list) as (
    query
    )
select * from cte_name;

查询中的列数必须与 column_list 中的列数相同。如果省略 column_list，CTE 将使用定义 CTE 的查询的列列表。

1.1.3 CTE示例

初始化数据：

-- create table
create table department
(
    id        bigint auto_increment comment '主键ID'
        primary key,
    dept_name varchar(32)      not null comment '部门名称',
    parent_id bigint default 0 not null comment '父级id'
);

-- insert values
insert into `department` values (null, '总部', 0);
insert into `department` values (null, '研发部', 1);
insert into `department` values (null, '测试部', 1);
insert into `department` values (null, '产品部', 1);
insert into `department` values (null, 'Java组', 2);
insert into `department` values (null, 'Python组', 2);
insert into `department` values (null, '前端组', 2);
insert into `department` values (null, '供应链测试组', 3);
insert into `department` values (null, '商城测试组', 3);
insert into `department` values (null, '供应链产品组', 4);
insert into `department` values (null, '商城产品组', 4);
insert into `department` values (null, 'Java1组', 5);
insert into `department` values (null, 'Java2组', 5);

（1）最基本的CTE语法：

mysql> with cte1 as (select * from `department` where id in (1, 2)),
    ->      cte2 as (select * from `department` where id in (2, 3))
    -> select *
    -> from cte1
    ->          join cte2
    -> where cte1.id = cte2.id;
+----+-----------+-----------+----+-----------+-----------+
| id | dept_name | parent_id | id | dept_name | parent_id |
+----+-----------+-----------+----+-----------+-----------+
|  2 | 研发部     |         1 |  2 | 研发部     |         1 |
+----+-----------+-----------+----+-----------+-----------+
1 row in set (0.00 sec)

（2）一个 CTE 引用另一个 CTE

mysql> with cte1 as (select * from `department` where id = 1),
    ->      cte2 as (select * from cte1)
    -> select *
    -> from cte2;
+----+-----------+-----------+
| id | dept_name | parent_id |
+----+-----------+-----------+
|  1 | 总部       |         0 |
+----+-----------+-----------+
1 row in set (0.00 sec)

1.3 递归 CTE

1.3.1 递归 CTE 简介

递归 CTE 是一个具有引用 CTE 名称本身的子查询的 CTE。递归 CTE 的语法为：

with recursive cte_name as (
    initial_query  -- anchor member
union all
recursive_query -- recursive member that references to the cte name
)
select * from cte_name;

递归 CTE 由三个主要部分组成：

形成 CTE 结构的基本结果集的初始查询（initial_query），初始查询部分被称为锚成员。
递归查询部分是引用 CTE 名称的查询，因此称为递归成员。递归成员由一个 union all 或 union distinct 运算符与锚成员相连。
终止条件是当递归成员没有返回任何行时，确保递归停止。

递归 CTE 的执行顺序如下：

首先，将成员分为两个：锚点和递归成员。
接下来，执行锚成员形成基本结果集（R0），并使用该基本结果集进行下一次迭代。
然后，将 Ri 结果集作为输入执行递归成员，并将 Ri + 1 作为输出。
之后，重复第三步，直到递归成员返回一个空结果集，换句话说，满足终止条件。
最后，使用 union all 运算符将结果集从 R0 到 Rn 组合。

1.3.2 递归成员限制

递归成功不能包含以下结构：

聚合函数，如 max、min、sum、avg、count 等。
group by 子句
order by 子句
limit 子句
distinct

上述约束不适用于锚点成员。另外，只有在使用 union 运算符时，要禁止 distinct 才适用。如果使用 union distinct 运算符，则允许使用 distinct。

另外，递归成员只能在其子句中引用 CTE 名称，而不是引用任何子查询。

1.3.3 递归 CTE 示例

with recursive cte_count (n)
                   as (
        select 1
        union all
        select n + 1
        from cte_count
        where n < 3
    )
select n from cte_count;

在此示例中，以下查询：

select 1

是作为基本结果集返回 1 的锚成员。

以下查询：

select n + 1
from cte_count
where n < 3

是递归成员，因为它引用了 cte_count 的 CTE 名称。递归成员中的表达式 < 3 是终止条件。当 n 等于 3，递归成员将返回一个空集合，将停止递归。

下图显示了上述 CTE 的元素：

在这里插入图片描述
递归 CTE 返回以下输出：

+------+
|    n |
+------+
|    1 |
|    2 |
|    3 |
+------+

递归 CTE 的执行步骤如下：

首先，分离锚和递归成员。
接下来，锚定成员形成初始行 select 1，因此第一次迭代在 n = 1 时产生 1 + 1 = 2。
然后，第二次迭代对第一次迭代的输出 2 进行操作，并且在 n = 2 时产生 2 + 1 = 3。
之后，在第三次操作 n = 3 之前，满足终止条件 n <3 ，因此查询停止。
最后，使用 union all 运算符组合所有结果集 1，2 和 3。

1.3.4 使用递归 CTE 遍历分层数据

查部门 id = 2 的所有下级部门和本级：

mysql> with recursive cte_tab as (select id, dept_name, parent_id, 1 as level
    ->                            from department
    ->                            where id = 2
    ->                            union all
    ->                            select d.id, d.dept_name, d.parent_id, level + 1
    ->                            from cte_tab c
    ->                                     inner join department d on c.id = d.parent_id
    -> )
    -> select *
    -> from cte_tab;
+------+-----------+-----------+-------+
| id   | dept_name | parent_id | level |
+------+-----------+-----------+-------+
|    2 | 研发部     |         1 |     1 |
|    5 | Java组    |         2 |     2 |
|    6 | Python组  |         2 |     2 |
|    7 | 前端组     |         2 |     2 |
|   12 | Java1组   |         5 |     3 |
|   13 | Java2组   |         5 |     3 |
+------+-----------+-----------+-------+
6 rows in set (0.00 sec)

2. CTE 与 Derived Table

针对 from 子句里面的 subquery，MySQL 在不同版本中，是做过一系列的优化，接下来我们就来看看。

在 5.6 版本中

MySQL 会对每一个 Derived Table 进行物化，生成一个临时表保存 Derived Table 的结果，然后利用临时表来完成父查询的操作，具体如下：

mysql> explain
    -> select * from (select * from department where id <= 1000) t1 join (select * from department where id >= 990) t2 on t1.id = t2.id;
+----+-------------+------------+------------+--------+---------------+---------+---------+----------------------------+------+----------+-------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref                        | rows | filtered | Extra       |
+----+-------------+------------+------------+--------+---------------+---------+---------+----------------------------+------+----------+-------------+
|  1 | SIMPLE      | department | NULL       | range  | PRIMARY       | PRIMARY | 8       | NULL                       |    1 |   100.00 | Using where |
|  1 | SIMPLE      | department | NULL       | eq_ref | PRIMARY       | PRIMARY | 8       | pointer_mall.department.id |    1 |   100.00 | NULL        |
+----+-------------+------------+------------+--------+---------------+---------+---------+----------------------------+------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)

在 5.7 版本中

MySQL 引入了 Derived Merge 新特性，允许符合条件的 Derived Table 中的子表与父查询的表进行合并，具体如下：

mysql> explain
    -> select * from (select * from department where id <= 1000) t1 join (select * from department where id >= 990) t2 on t1.id = t2.id;
+----+-------------+------------+------------+-------+---------------+-------------+---------+-------+---------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref   | rows    | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-------+---------+----------+-------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL   | NULL          | NULL        | NULL    | NULL  |    1900 |   100.00 | NULL        |
|  1 | PRIMARY     | <derived3> | NULL       | ref   | <auto_key0>   | <auto_key0> | 8       | t1.id |    2563 |   100.00 | NULL        |
|  3 | DERIVED     | department | NULL       | range | PRIMARY       | PRIMARY     | 8       | NULL  | 4870486 |   100.00 | Using where |
|  2 | DERIVED     | department | NULL       | range | PRIMARY       | PRIMARY     | 8       | NULL  |    1900 |   100.00 | Using where |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-------+---------+----------+-------------+
4 rows in set, 1 warning (0.00 sec)

在 8.0 版本中

我们可以使用 CTE 实现，其执行计划也是和 Derived Table 一样

mysql> explain
    -> with t1 as (select * from department where id <= 1000),
    ->      t2 as (select * from department where id >= 990)
    -> select * from t1 join t2 on t1.id = t2.id;
+----+-------------+------------+------------+--------+---------------+---------+---------+----------------------------+------+----------+-------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref                        | rows | filtered | Extra       |
+----+-------------+------------+------------+--------+---------------+---------+---------+----------------------------+------+----------+-------------+
|  1 | SIMPLE      | department | NULL       | range  | PRIMARY       | PRIMARY | 8       | NULL                       |    1 |   100.00 | Using where |
|  1 | SIMPLE      | department | NULL       | eq_ref | PRIMARY       | PRIMARY | 8       | pointer_mall.department.id |    1 |   100.00 | NULL        |
+----+-------------+------------+------------+--------+---------------+---------+---------+----------------------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

从测试结果来看，CTE 似乎是 Derived Table 的一个替代品？其实不是的，虽然 CTE 内部优化流程与 Derived Table 类似，但是两者还是区别的，具体如下：

一个 CTE 可以引用另一个 CTE
CTE 可以自引用
CTE 在语句级别生成临时表，多次调用只需要执行一次，提高性能

从上面介绍可以知道，CTE 一方面可以非常方便进行 SQL 开发，另一方面也可以提升 SQL 执行效率。