1179. Reformat Department Table
Table: Department
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| revenue     | int     |
| month       | varchar |
+-------------+---------+
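In SQL, (id, month) is the composite primary key of this table.
The table holds the monthly revenue of each department.
month takes one of the values ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"].
Reformat the table so that there is one department id column and one revenue column for each month.
Return the result table in any order.
The result format is shown in the following example.
Example 1:
Input: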
Department table:
+------+---------+-------+
| id   | revenue | month |
+------+---------+-------+
| 1    | 8000    | Jan   |
| 2    | 9000    | Jan   |
| 3    | 10000   | Feb   |
| 1    | 7000    | Feb   |
| 1    | 6000    | Mar   |
+------+---------+-------+
Output:
+------+-------------+-------------+-------------+-----+-------------+
| id   | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-------------+
| 1    | 8000        | 7000        | 6000        | ... | null        |
| 2    | 9000        | null        | null        | ... | null        |
| 3    | null        | 10000       | null        | ... | null        |
+------+-------------+-------------+-------------+-----+-------------+
Explanation: Revenue for April through December is null.
Note that the result table has 13 columns (1 for the department id, the other 12 for the months).
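Solution
Reformat the table so that there is one department id column and one revenue column for each month.
- A classic rows-to-columns pivot: an aggregate function + GROUP BY + CASE WHEN does the job.
Method 1: SUM + GROUP BY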
select
id
,SUM(case when month='Jan' then revenue else null end) as Jan_Revenue
,SUM(case when month='Feb' then revenue else null end) as Feb_Revenue
,SUM(case when month='Mar' then revenue else null end) as Mar_Revenue
,SUM(case when month='Apr' then revenue else null end) as Apr_Revenue
,SUM(case when month='May' then revenue else null end) as May_Revenue
,SUM(case when month='Jun' then revenue else null end) as Jun_Revenue
,SUM(case when month='Jul' then revenue else null end) as Jul_Revenue
,SUM(case when month='Aug' then revenue else null end) as Aug_Revenue
,SUM(case when month='Sep' then revenue else null end) as Sep_Revenue
,SUM(case when month='Oct' then revenue else null end) as Oct_Revenue
,SUM(case when month='Nov' then revenue else null end) as Nov_Revenue
,SUM(case when month='Dec' then revenue else null end) as Dec_Revenue
from Department
group by id
Method 2: MAX + GROUP BY
select
id
,MAX(case when month='Jan' then revenue else null end) as Jan_Revenue
,MAX(case when month='Feb' then revenue else null end) as Feb_Revenue
,MAX(case when month='Mar' then revenue else null end) as Mar_Revenue
,MAX(case when month='Apr' then revenue else null end) as Apr_Revenue
,MAX(case when month='May' then revenue else null end) as May_Revenue
,MAX(case when month='Jun' then revenue else null end) as Jun_Revenue
,MAX(case when month='Jul' then revenue else null end) as Jul_Revenue
,MAX(case when month='Aug' then revenue else null end) as Aug_Revenue
,MAX(case when month='Sep' then revenue else null end) as Sep_Revenue
,MAX(case when month='Oct' then revenue else null end) as Oct_Revenue
,MAX(case when month='Nov' then revenue else null end) as Nov_Revenue
,MAX(case when month='Dec' then revenue else null end) as Dec_Revenue
from Department
group by id
Method 3: MIN + GROUP BY
select
id
,MIN(case when month='Jan' then revenue else null end) as Jan_Revenue
,MIN(case when month='Feb' then revenue else null end) as Feb_Revenue
,MIN(case when month='Mar' then revenue else null end) as Mar_Revenue
,MIN(case when month='Apr' then revenue else null end) as Apr_Revenue
,MIN(case when month='May' then revenue else null end) as May_Revenue
,MIN(case when month='Jun' then revenue else null end) as Jun_Revenue
,MIN(case when month='Jul' then revenue else null end) as Jul_Revenue
,MIN(case when month='Aug' then revenue else null end) as Aug_Revenue
,MIN(case when month='Sep' then revenue else null end) as Sep_Revenue
,MIN(case when month='Oct' then revenue else null end) as Oct_Revenue
,MIN(case when month='Nov' then revenue else null end) as Nov_Revenue
,MIN(case when month='Dec' then revenue else null end) as Dec_Revenue
from Department
group by id
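To check locally that the three methods agree, here is a minimal sketch that runs the same pivot against the sample data. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the judge's database; the helper names `build_pivot_sql` and `run_pivot` are made up for this illustration, and the `ORDER BY id` is added only to make the output order predictable.

```python
import sqlite3

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Sample rows from Example 1: (id, revenue, month)
ROWS = [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
        (1, 7000, "Feb"), (1, 6000, "Mar")]


def build_pivot_sql(agg):
    """Build the 12-column pivot query for one aggregate (SUM, MAX or MIN)."""
    cols = ",\n  ".join(
        f"{agg}(CASE WHEN month = '{m}' THEN revenue END) AS {m}_Revenue"
        for m in MONTHS
    )
    return f"SELECT id,\n  {cols}\nFROM Department\nGROUP BY id\nORDER BY id"


def run_pivot(agg):
    """Load the sample data into an in-memory SQLite table and run the pivot."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Department ("
        "id INT, revenue INT, month VARCHAR(5), PRIMARY KEY (id, month))"
    )
    conn.executemany("INSERT INTO Department VALUES (?, ?, ?)", ROWS)
    result = conn.execute(build_pivot_sql(agg)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    results = {agg: run_pivot(agg) for agg in ("SUM", "MAX", "MIN")}
    for row in results["SUM"]:
        print(row[:4])  # id, Jan_Revenue, Feb_Revenue, Mar_Revenue
    # True when all three aggregates produce identical result sets
    print(results["SUM"] == results["MAX"] == results["MIN"])
```

Generating the twelve columns in a loop is only a convenience for the sketch; the queries it produces have the same shape as the hand-written ones above.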
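At first it may not be obvious why SUM, MAX, and MIN all work.
It helps to picture the grouping step.
The intermediate grouped state is internal to the engine, a virtual result that cannot be queried directly. Think of each group as a box holding a set of rows; every box has to be reduced to a single output row, and that is exactly what an aggregate function does. Because (id, month) is the primary key, each box holds at most one non-null value per month, so SUM, MAX, and MIN all return that same value.
What happens without an aggregate function?
Without one (and without GROUP BY), the row count does not shrink: the output has as many rows as the input, and the rows still have to be merged somehow.
Roughly: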
1, 100,null,null,null,…
2,null,100,null,null,…
1,null,100,null,null,…
Clearly the id=1 rows have not been merged, which defeats the purpose of the pivot.
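The unmerged effect can be reproduced on the sample data. The sketch below assumes SQLite via Python's `sqlite3` module; `unmerged_rows` is a hypothetical helper name, only three of the twelve month columns are shown to keep it short, and `ORDER BY rowid` (an SQLite-specific column) is used just to keep the rows in insertion order.

```python
import sqlite3

# Sample rows from Example 1: (id, revenue, month)
ROWS = [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
        (1, 7000, "Feb"), (1, 6000, "Mar")]


def unmerged_rows():
    """Apply CASE WHEN per row, with no GROUP BY and no aggregate function."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Department (id INT, revenue INT, month VARCHAR(5))")
    conn.executemany("INSERT INTO Department VALUES (?, ?, ?)", ROWS)
    rows = conn.execute("""
        SELECT id,
               CASE WHEN month = 'Jan' THEN revenue END AS Jan_Revenue,
               CASE WHEN month = 'Feb' THEN revenue END AS Feb_Revenue,
               CASE WHEN month = 'Mar' THEN revenue END AS Mar_Revenue
        FROM Department
        ORDER BY rowid
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in unmerged_rows():
        print(row)
    # Five rows come back, three of them for id=1, each with one non-null month
```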
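Worked example
Approach
Each id is a single record in the result, so GROUP BY id.
Each month is a column, so each column uses CASE [WHEN…THEN…] END to keep only that month's revenue.
An aggregate function such as SUM() is required. Without one, MySQL either rejects the query (ONLY_FULL_GROUP_BY, part of the default sql_mode since 5.7.5) or, with that mode disabled, evaluates the CASE…END on one arbitrary row of each group after GROUP BY,
in practice usually the first row.
For example, given the Department table: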
+------+---------+-------+
| id   | revenue | month |
+------+---------+-------+
| 1    | 8000    | Jan   |
| 2    | 9000    | Jan   |
| 3    | 10000   | Feb   |
| 1    | 7000    | Feb   |
| 1    | 6000    | Mar   |
+------+---------+-------+
GROUP BY ID
+------+---------+-------+
| id   | revenue | month |
+------+---------+-------+
| 1    | 8000    | Jan   |
| 1    | 7000    | Feb   |
| 1    | 6000    | Mar   |
--------------------------
| 2    | 9000    | Jan   |
--------------------------
| 3    | 10000   | Feb   |
+------+---------+-------+
Without an aggregate function, only the first row of each group is used. For example,
SELECT ID, (CASE WHEN MONTH='JAN' THEN REVENUE END) AS JAN_REVENUE,
(CASE WHEN MONTH='FEB' THEN REVENUE END) AS FEB_REVENUE
FROM DEPARTMENT GROUP BY ID
outputs
+------+-------------+-------------+
| ID   | JAN_REVENUE | FEB_REVENUE |
+------+-------------+-------------+
| 1    | 8000        | NULL        |
| 2    | 9000        | NULL        |
| 3    | NULL        | 10000       |
+------+-------------+-------------+
Here the FEB_REVENUE for ID=1 is wrong. For ID=1, (CASE WHEN MONTH='FEB' THEN REVENUE END) evaluates to [NULL, 7000, NULL] across the group, and without an aggregate function only the first value, NULL, is taken.
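The per-group list [NULL, 7000, NULL] can be made visible with a small sketch. It assumes SQLite via Python's `sqlite3` module rather than MySQL, and the helpers `feb_values_by_id` and `aggregate_ignoring_null` are hypothetical names for this illustration. The second helper imitates, in plain Python, how SUM, MAX, and MIN skip NULLs and return NULL for a group with no non-null value.

```python
import sqlite3
from collections import defaultdict

# Sample rows from Example 1: (id, revenue, month)
ROWS = [(1, 8000, "Jan"), (2, 9000, "Jan"), (3, 10000, "Feb"),
        (1, 7000, "Feb"), (1, 6000, "Mar")]


def feb_values_by_id():
    """Collect the per-row CASE result for Feb into one list per department."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Department (id INT, revenue INT, month VARCHAR(5))")
    conn.executemany("INSERT INTO Department VALUES (?, ?, ?)", ROWS)
    rows = conn.execute("""
        SELECT id, CASE WHEN month = 'Feb' THEN revenue END AS Feb_Revenue
        FROM Department
        ORDER BY id, rowid
    """).fetchall()
    conn.close()
    groups = defaultdict(list)
    for dept_id, feb in rows:
        groups[dept_id].append(feb)
    return dict(groups)


def aggregate_ignoring_null(values, fn):
    """Mimic a SQL aggregate: skip NULLs, return NULL for an all-NULL group."""
    non_null = [v for v in values if v is not None]
    return fn(non_null) if non_null else None


if __name__ == "__main__":
    groups = feb_values_by_id()
    print(groups[1])  # the three CASE results for id=1
    for fn in (sum, max, min):
        # Each aggregate reduces the id=1 list to its single non-null value
        print(fn.__name__, aggregate_ignoring_null(groups[1], fn))
```

Taking only the first element of the id=1 list gives NULL, whereas any of the three aggregates recovers 7000, which is why the choice among SUM, MAX, and MIN does not matter here.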