01 某音 Short Video
SQL1 Average completion rate of each video
Description: user-video interaction table tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
1 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:30 | 0 | 1 | 1 | NULL |
2 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:24 | 0 | 0 | 1 | NULL |
3 | 103 | 2001 | 2021-10-01 11:00:00 | 2021-10-01 11:00:34 | 0 | 1 | 0 | 1732526 |
4 | 101 | 2002 | 2021-09-01 10:00:00 | 2021-09-01 10:00:42 | 1 | 0 | 1 | NULL |
5 | 102 | 2002 | 2021-10-01 11:00:00 | 2021-10-01 10:00:30 | 1 | 0 | 1 | NULL |
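(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)
Short-video information table tb_video_info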
id | video_id | author | tag | duration | release_time |
1 | 2001 | 901 | 影视 | 30 | 2021-01-01 07:00:00 |
2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
3 | 2003 | 902 | 旅游 | 90 | 2021-01-01 07:00:00 |
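(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time)
Problem: for every video with play records in 2021, compute its completion rate (rounded to three decimal places) and sort the result by completion rate in descending order.
Note: a video's completion rate is the number of completed plays divided by the total number of plays. For simplicity, a play counts as completed when end_time minus start_time is at least the video's duration.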
SELECT a.video_id,
ROUND( # ROUND(value, decimals): rounds value to the given number of decimal places (decimals is optional)
SUM(
CASE
# A play counts as completed when end_time - start_time >= duration.
WHEN timestampdiff(second, b.start_time, b.end_time) >= a.duration
THEN 1 ELSE 0 END
) / COUNT(b.uid), 3
) AS avg_com_play_rate
FROM tb_video_info a JOIN tb_user_video_log b ON a.video_id=b.video_id
WHERE YEAR(b.start_time) = 2021
GROUP BY a.video_id
ORDER BY avg_com_play_rate DESC
# Group by video ID, keep only plays that started in 2021, and sort by completion rate in descending order.
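As a cross-check of the query's logic, the same calculation can be sketched in plain Python on the sample rows. The function name `completion_rates` and the in-memory layout are illustrative assumptions, not part of the original exercise.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (video_id, start_time, end_time) copied from the sample log table
logs = [
    (2001, "2021-10-01 10:00:00", "2021-10-01 10:00:30"),
    (2001, "2021-10-01 10:00:00", "2021-10-01 10:00:24"),
    (2001, "2021-10-01 11:00:00", "2021-10-01 11:00:34"),
    (2002, "2021-09-01 10:00:00", "2021-09-01 10:00:42"),
    (2002, "2021-10-01 11:00:00", "2021-10-01 10:00:30"),
]
# video_id -> duration in seconds, from tb_video_info
duration = {2001: 30, 2002: 60, 2003: 90}


def completion_rates(logs, duration, year=2021):
    """Return [(video_id, rate)] sorted by rate descending (hypothetical helper)."""
    plays, done = {}, {}
    for vid, start, end in logs:
        s, e = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
        if s.year != year:
            continue
        plays[vid] = plays.get(vid, 0) + 1
        # A play is complete when the watched time reaches the video length.
        done[vid] = done.get(vid, 0) + int((e - s).total_seconds() >= duration[vid])
    rates = {vid: round(done[vid] / plays[vid], 3) for vid in plays}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


print(completion_rates(logs, duration))
```

Video 2003 has no play records, so it does not appear in the result, which mirrors the inner JOIN in the query.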
SQL2 Video categories whose average play progress exceeds 60%
Description: user-video interaction table tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
1 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:30 | 0 | 1 | 1 | NULL |
2 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:21 | 0 | 0 | 1 | NULL |
3 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:20 | 0 | 1 | 0 | 1732526 |
4 | 102 | 2002 | 2021-10-01 11:00:00 | 2021-10-01 11:00:30 | 1 | 0 | 1 | NULL |
5 | 103 | 2002 | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 1 | 0 | 1 | NULL |
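(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)
Short-video information table tb_video_info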
id | video_id | author | tag | duration | release_time |
1 | 2001 | 901 | 影视 | 30 | 2021-01-01 07:00:00 |
2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
3 | 2003 | 902 | 旅游 | 90 | 2021-01-01 07:00:00 |
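(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time)
Problem: compute the average play progress of each video category and output the categories whose progress exceeds 60%.
Note:
- Play progress = play time ÷ video length * 100%. When the play time exceeds the video length, the progress is recorded as 100%.
- Round the result to two decimal places and sort by play progress in descending order.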
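SELECT tag, CONCAT( # CONCAT joins several strings into one
ROUND( # round to 2 decimal places
AVG( # average play progress of the category
# IF(expr, v1, v2) returns v1 when expr is true, otherwise v2.
# When play time >= duration the progress is capped at 1; otherwise it is play time / duration.
IF(timestampdiff(second, start_time, end_time) >= duration, 1, timestampdiff(second, start_time, end_time) / duration)
) * 100, 2
), '%'
) avg_play_progress
FROM tb_user_video_log a JOIN tb_video_info b ON a.video_id = b.video_id
GROUP BY b.tag HAVING # HAVING is needed because WHERE cannot be used with aggregate functions.
# HAVING filters the groups produced by GROUP BY.
# Here the condition is: average play progress > 60%.
AVG(
IF(timestampdiff(second, start_time, end_time) >= duration, 1, timestampdiff(second, start_time, end_time) / duration)
) > 0.6
# Sort on the numeric average: avg_play_progress is a string, so sorting on it would rank '100.00%' below '75.00%'.
ORDER BY AVG(
IF(timestampdiff(second, start_time, end_time) >= duration, 1, timestampdiff(second, start_time, end_time) / duration)
) DESC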
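The capped-progress rule can be checked by hand with a short Python sketch. The watched seconds below are derived from the sample rows (for example 11:00:50 to 11:01:20 is 30 seconds); `play_progress` and `avg_progress_by_tag` are hypothetical helper names.

```python
def play_progress(watched_seconds, duration):
    """Progress of one play, capped at 1.0 (i.e. 100%)."""
    return min(watched_seconds / duration, 1.0)


# (tag, seconds watched, video duration) derived from the sample tables
plays = [
    ("影视", 30, 30),
    ("影视", 21, 30),
    ("影视", 30, 30),
    ("美食", 30, 60),
    ("美食", 60, 60),
]


def avg_progress_by_tag(plays, threshold=0.6):
    by_tag = {}
    for tag, watched, duration in plays:
        by_tag.setdefault(tag, []).append(play_progress(watched, duration))
    avgs = {tag: sum(vals) / len(vals) for tag, vals in by_tag.items()}
    # Filter and sort on the numeric average, then format as a percentage string.
    kept = sorted(
        ((tag, avg) for tag, avg in avgs.items() if avg > threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    return [(tag, f"{avg * 100:.2f}%") for tag, avg in kept]


print(avg_progress_by_tag(plays))
```

Sorting happens before the value is turned into a string, which avoids comparing percentage strings lexicographically.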
SQL3 Retweet count and retweet rate of each video category over the last month
Description: user-video interaction table tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
1 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 0 | 1 | 1 | NULL |
2 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
3 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 0 | 1 | 0 | 1732526 |
4 | 102 | 2002 | 2021-09-10 11:00:00 | 2021-09-10 11:00:30 | 1 | 0 | 1 | NULL |
5 | 103 | 2002 | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 1 | 0 | 0 | NULL |
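(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)
Short-video information table tb_video_info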
id | video_id | author | tag | duration | release_time |
1 | 2001 | 901 | 影视 | 30 | 2021-01-01 07:00:00 |
2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
3 | 2003 | 902 | 旅游 | 90 | 2020-01-01 07:00:00 |
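(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time)
Problem: for the most recent month with user interactions (the last 30 days including the current day; for example, the last 30 days as of October 31 cover October 2 through October 31), compute the retweet count and retweet rate of each video category (rounded to 3 decimal places).
Note: retweet rate = retweet count ÷ play count. Sort the result by retweet rate in descending order.
# Compute the category tag, retweet count, and retweet rate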
SELECT tag, SUM(if_retweet) AS retweet_cnt, ROUND(SUM(if_retweet) / COUNT(*), 3) AS retweet_rate
FROM (
# Select the log record ID, retweet flag, start time, end time, and category tag
SELECT l.id, if_retweet, start_time, end_time, tag
FROM tb_user_video_log AS l LEFT JOIN tb_video_info AS i ON l.video_id = i.video_id
) AS t
# DATEDIFF(d1, d2) returns d1 - d2 in days; <= 29 keeps the 30 days ending on the latest start_time
WHERE DATEDIFF((SELECT MAX(start_time) FROM tb_user_video_log), start_time) <= 29
GROUP BY tag ORDER BY retweet_rate DESC
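The 30-day window and the rate can be reproduced on the sample rows with a small Python sketch. It assumes each log row has already been joined to its category tag; `retweet_stats` is a hypothetical name.

```python
from datetime import date

# (tag, start date, if_retweet) from the sample rows, already joined to tb_video_info
rows = [
    ("影视", date(2021, 10, 1), 1),
    ("影视", date(2021, 10, 1), 1),
    ("影视", date(2021, 10, 1), 0),
    ("美食", date(2021, 9, 10), 1),
    ("美食", date(2021, 10, 1), 0),
]


def retweet_stats(rows, window_days=30):
    latest = max(day for _, day, _ in rows)
    stats = {}
    for tag, day, retweeted in rows:
        # Keep rows within the closed interval [latest - 29 days, latest].
        if (latest - day).days <= window_days - 1:
            cnt, plays = stats.get(tag, (0, 0))
            stats[tag] = (cnt + retweeted, plays + 1)
    result = [(tag, cnt, round(cnt / plays, 3)) for tag, (cnt, plays) in stats.items()]
    return sorted(result, key=lambda r: r[2], reverse=True)


print(retweet_stats(rows))
```

The 2021-09-10 row is 21 days before the latest date, so it falls inside the window.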
SQL4 Monthly fan growth rate of each creator and total fans to date
Description: user-video interaction table tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
1 | 101 | 2001 | 2021-09-01 10:00:00 | 2021-09-01 10:00:20 | 0 | 1 | 1 | NULL |
2 | 105 | 2002 | 2021-09-10 11:00:00 | 2021-09-10 11:00:30 | 1 | 0 | 1 | NULL |
3 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 1 | 1 | 1 | NULL |
4 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
5 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 1 | 1 | 0 | 1732526 |
6 | 106 | 2002 | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 2 | 0 | 0 | NULL |
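(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)
Short-video information table tb_video_info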
id | video_id | author | tag | duration | release_time |
1 | 2001 | 901 | 影视 | 30 | 2021-01-01 07:00:00 |
2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
3 | 2003 | 902 | 旅游 | 90 | 2020-01-01 07:00:00 |
4 | 2004 | 902 | 美女 | 90 | 2020-01-01 08:00:00 |
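(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time)
Problem: compute each creator's monthly fan growth rate in 2021 and the total number of fans as of each month.
Note:
- Fan growth rate = (fans gained - fans lost) / plays. Sort the result by creator ID and total fans in ascending order.
- if_follow = 1 means the user followed the creator while watching the video, 0 means the follow status did not change during this interaction, and 2 means the user unfollowed while watching.
# Select the creator and the month in which the video was watched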
SELECT b.author AS author, DATE_FORMAT(a.start_time, '%Y-%m') AS month,
ROUND(( # fan growth rate
COUNT(CASE WHEN a.if_follow = 1 THEN 1 END ) # plays during which the viewer followed the creator
-
COUNT(CASE WHEN a.if_follow = 2 THEN 1 END) # plays during which the viewer unfollowed
) / COUNT(1), 3) AS fans_growth_rate,
SUM( # running total of net new fans
SUM(CASE WHEN a.if_follow = 1 THEN 1 WHEN a.if_follow = 2 THEN -1 ELSE 0 END)
)
# OVER (PARTITION BY ... ORDER BY ...): accumulate the monthly net gain per creator in month order
OVER (PARTITION BY b.author ORDER BY DATE_FORMAT(a.start_time, '%Y-%m')) fans_total
FROM tb_user_video_log a LEFT JOIN tb_video_info b ON a.video_id = b.video_id
WHERE YEAR(a.start_time) = 2021 AND YEAR(a.end_time) = 2021
GROUP BY b.author, DATE_FORMAT(a.start_time, '%Y-%m')
ORDER BY author, fans_total
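The nested SUM(SUM(...)) OVER (...) is a per-creator running total of the monthly net gain. A minimal Python sketch of the same two-step idea, using the sample rows already joined to their creator (`fans_report` is a hypothetical name, and the report is emitted in creator and month order):

```python
# (author, month, if_follow) derived from the sample rows
rows = [
    (901, "2021-09", 0),
    (901, "2021-09", 1),
    (901, "2021-10", 1),
    (901, "2021-10", 0),
    (901, "2021-10", 1),
    (901, "2021-10", 2),
]


def fans_report(rows):
    # Step 1: aggregate per (author, month): net fan change and number of plays.
    monthly = {}
    for author, month, flag in rows:
        net, plays = monthly.get((author, month), (0, 0))
        delta = 1 if flag == 1 else -1 if flag == 2 else 0
        monthly[(author, month)] = (net + delta, plays + 1)
    # Step 2: walk the months in order and keep a running total per author.
    report, running = [], {}
    for author, month in sorted(monthly):
        net, plays = monthly[(author, month)]
        running[author] = running.get(author, 0) + net
        report.append((author, month, round(net / plays, 3), running[author]))
    return report


print(fans_report(rows))
```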
SQL5 Likes and retweets of each video category during the National Day holiday
Description: user-video interaction table tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
1 | 101 | 2001 | 2021-09-24 10:00:00 | 2021-09-24 10:00:20 | 1 | 1 | 0 | NULL |
2 | 105 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 0 | 0 | 1 | NULL |
3 | 102 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1 | 1 | 1 | NULL |
4 | 101 | 2002 | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 1 | 0 | 1 | NULL |
5 | 101 | 2002 | 2021-09-27 11:00:00 | 2021-09-27 11:00:30 | 1 | 1 | 0 | NULL |
6 | 102 | 2002 | 2021-09-28 11:00:00 | 2021-09-28 11:00:30 | 1 | 0 | 1 | NULL |
7 | 103 | 2002 | 2021-09-29 11:00:00 | 2021-10-02 11:00:30 | 1 | 0 | 1 | NULL |
8 | 102 | 2002 | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1 | 1 | 1 | NULL |
9 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 1 | 1 | 0 | NULL |
10 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
11 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 1 | 1 | 0 | 1732526 |
12 | 106 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 2 | 0 | 1 | NULL |
13 | 107 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 0 | 1 | NULL |
14 | 108 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 1 | 1 | NULL |
15 | 109 | 2002 | 2021-10-03 10:59:05 | 2021-10-03 11:00:05 | 0 | 1 | 0 | NULL |
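(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)
Short-video information table tb_video_info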
id | video_id | author | tag | duration | release_time |
1 | 2001 | 901 | 影视 | 30 | 2020-01-01 07:00:00 |
2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
3 | 2003 | 902 | 旅游 | 90 | 2020-01-01 07:00:00 |
4 | 2004 | 902 | 美女 | 90 | 2020-01-01 08:00:00 |
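(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time)
Problem: for each of the first 3 days of the 2021 National Day holiday, compute for every video category the total number of likes over the last week and the largest single-day retweet count within that week. Sort the result by video category in descending order and date in ascending order. Assume the database holds enough data that every category has play records on each of the first 3 holiday days and on each day of the preceding week.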
SELECT l.tag, l.day, SUM(i.likes), MAX(i.retweet)
FROM (
SELECT i.tag, LEFT(l.start_time, 10) day, SUM(l.if_like) AS likes, SUM(l.if_retweet) AS retweet
FROM tb_user_video_log l LEFT JOIN tb_video_info i ON l.video_id = i.video_id
GROUP BY i.tag, day) l LEFT JOIN (
SELECT i.tag, LEFT(l.start_time, 10) day, SUM(l.if_like) AS likes, SUM(l.if_retweet) AS retweet
FROM tb_user_video_log l LEFT JOIN tb_video_info i ON l.video_id = i.video_id
GROUP BY i.tag, day) i ON l.tag = i.tag
WHERE TIMESTAMPDIFF(day, i.day, l.day) < 7 AND TIMESTAMPDIFF(day, i.day, l.day) >= 0 AND l.day
IN ("2021-10-01", "2021-10-02", "2021-10-03")
GROUP BY l.tag, l.day
ORDER BY l.tag DESC, l.day
SQL6 Top 3 hottest videos among those released in the last month
Description: user-video interaction table tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
1 | 101 | 2001 | 2021-09-24 10:00:00 | 2021-09-24 10:00:30 | 1 | 1 | 1 | NULL |
2 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:31 | 1 | 1 | 0 | NULL |
3 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:35 | 0 | 0 | 1 | NULL |
4 | 103 | 2001 | 2021-10-03 11:00:50 | 2021-10-03 10:00:35 | 1 | 1 | 0 | 1732526 |
5 | 106 | 2002 | 2021-10-02 11:00:05 | 2021-10-02 11:01:04 | 2 | 0 | 1 | NULL |
6 | 107 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:06 | 1 | 0 | 0 | NULL |
7 | 108 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 1 | 1 | NULL |
8 | 109 | 2002 | 2021-10-03 10:59:05 | 2021-10-03 11:00:01 | 0 | 1 | 0 | NULL |
9 | 105 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1 | 0 | 1 | NULL |
10 | 101 | 2003 | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 1 | 0 | 0 | NULL |
11 | 101 | 2003 | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1 | 1 | 0 | NULL |
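(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)
Short-video information table tb_video_info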
id | video_id | author | tag | duration | release_time |
1 | 2001 | 901 | 旅游 | 30 | 2021-09-05 07:00:00 |
2 | 2002 | 901 | 旅游 | 60 | 2021-09-05 07:00:00 |
3 | 2003 | 902 | 影视 | 90 | 2021-09-05 07:00:00 |
4 | 2004 | 902 | 影视 | 90 | 2021-09-05 08:00:00 |
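(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time)
Problem: find the top 3 hottest videos among those released in the last month.
Note:
- Hotness = (a * completion rate + b * likes + c * comments + d * retweets) * freshness;
- Freshness = 1 / (number of most recent days without plays + 1);
- The parameters a, b, c, d are currently set to 100, 5, 3, 2.
- The most recent play date is based on end_time; call it T. The last month is then the closed interval [T-29, T].
- Hotness is rounded to an integer in the result, and rows are sorted by hotness in descending order.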
SELECT t1.video_id, round(
((SUM(CASE WHEN timestampdiff(second, start_time, end_time)>=duration THEN 1 ELSE 0 END))
/ count(*) * 100 + SUM(if_like) * 5 + COUNT(comment_id) * 3 + SUM(if_retweet) * 2) * 1
/ (DATEDIFF((SELECT date(MAX(end_time))
FROM tb_user_video_log), date(MAX(end_time))) + 1)
) AS hot_index
FROM tb_user_video_log t1, tb_video_info t2
WHERE t1.video_id = t2.video_id
AND DATEDIFF(DATE((SELECT MAX(end_time) FROM tb_user_video_log)), DATE(release_time)) <= 29
GROUP BY t1.video_id ORDER BY hot_index DESC LIMIT 3
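The hotness formula itself is easy to check in isolation. The sketch below is a hypothetical helper (`hot_index` is not part of the exercise) and the inputs in the usage lines are made-up values, not results from the sample tables. Rounding uses floor(x + 0.5) so that ties round up for the non-negative scores involved, which is an assumption about the intended behavior.

```python
import math


def hot_index(completion_rate, likes, comments, retweets, idle_days,
              a=100, b=5, c=3, d=2):
    """Hotness = (a*completion + b*likes + c*comments + d*retweets) * freshness."""
    # idle_days: days between the overall latest play date and this video's latest play
    freshness = 1 / (idle_days + 1)
    score = (a * completion_rate + b * likes + c * comments + d * retweets) * freshness
    return math.floor(score + 0.5)


# Made-up inputs: 75% completion, 3 likes, 1 comment, 2 retweets
print(hot_index(0.75, 3, 1, 2, idle_days=0))
print(hot_index(0.75, 3, 1, 2, idle_days=1))
```

With the same engagement numbers, one idle day halves the score, which is the effect the freshness term is meant to have.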
02 User Growth Scenario (某度 News Feed)
SQL7 Average article viewing time per user for each day of November 2021
Description: user behavior log table tb_user_log
id | uid | artical_id | in_time | out_time | sign_in |
1 | 101 | 9001 | 2021-11-01 10:00:00 | 2021-11-01 10:00:31 | 0 |
2 | 102 | 9001 | 2021-11-01 10:00:00 | 2021-11-01 10:00:24 | 0 |
3 | 102 | 9002 | 2021-11-01 11:00:00 | 2021-11-01 11:00:11 | 0 |
4 | 101 | 9001 | 2021-11-02 10:00:00 | 2021-11-02 10:00:50 | 0 |
5 | 102 | 9002 | 2021-11-02 11:00:01 | 2021-11-02 11:00:24 | 0 |
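(uid: user ID, artical_id: article ID, in_time: entry time, out_time: exit time, sign_in: whether the user signed in)
Scenario logic: artical_id is the ID of the article the user viewed; artical_id = 0 means the user was on a non-article page (for example a list page or campaign page inside the app).
Problem: compute the average article viewing time per user (in seconds) for each day of November 2021, rounded to 1 decimal place and sorted from shortest to longest.
SELECT date_format(in_time, "%Y-%m-%d") dt,
ROUND(SUM(timestampdiff(second, in_time, out_time)) / COUNT(DISTINCT uid), 1) avg_view_len_sec
FROM tb_user_log WHERE date_format(in_time, "%Y-%m") = "2021-11" AND artical_id != 0
GROUP BY dt ORDER BY avg_view_len_sec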
SQL8 Maximum number of concurrent viewers of each article
Description: user behavior log table tb_user_log
id | uid | artical_id | in_time | out_time | sign_in |
1 | 101 | 9001 | 2021-11-01 10:00:00 | 2021-11-01 10:00:11 | 0 |
2 | 102 | 9001 | 2021-11-01 10:00:09 | 2021-11-01 10:00:38 | 0 |
3 | 103 | 9001 | 2021-11-01 10:00:28 | 2021-11-01 10:00:58 | 0 |
4 | 104 | 9002 | 2021-11-01 11:00:45 | 2021-11-01 11:01:11 | 0 |
5 | 105 | 9001 | 2021-11-01 10:00:51 | 2021-11-01 10:00:59 | 0 |
6 | 106 | 9002 | 2021-11-01 11:00:55 | 2021-11-01 11:01:24 | 0 |
7 | 107 | 9001 | 2021-11-01 10:00:01 | 2021-11-01 10:01:50 | 0 |
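(uid: user ID, artical_id: article ID, in_time: entry time, out_time: exit time, sign_in: whether the user signed in)
Scenario logic: artical_id is the ID of the article the user viewed; artical_id = 0 means the user was on a non-article page (for example a list page or campaign page inside the app).
Problem: for each article, find the maximum number of concurrent viewers at any moment. If a user enters and another leaves at the same moment, count the increase first and then the decrease. Sort by the maximum in descending order.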
WITH t AS(
SELECT artical_id, in_time dt, 1 diff # a reader enters the article: viewers +1
FROM tb_user_log WHERE artical_id != 0 UNION ALL
SELECT artical_id, out_time dt, -1 diff # a reader leaves the article: viewers -1
FROM tb_user_log WHERE artical_id != 0
)
SELECT artical_id, MAX(cnt)ca
FROM(
SELECT artical_id, dt, SUM(diff) OVER(PARTITION BY artical_id ORDER BY dt, diff DESC) cnt FROM t
) t1 GROUP BY artical_id ORDER BY ca DESC
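The query is a sweep over +1/-1 events ordered by time, with entries processed before exits at equal timestamps. A minimal Python sketch of that sweep on the sample intervals (times kept as same-day "HH:MM:SS" strings, which sort correctly as text; `max_concurrent` is a hypothetical name):

```python
def max_concurrent(intervals):
    """Return the peak number of overlapping (start, end) intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # viewer enters
        events.append((end, -1))    # viewer leaves
    # Sort by time; at equal times handle +1 before -1 (same as ORDER BY dt, diff DESC).
    events.sort(key=lambda event: (event[0], -event[1]))
    current = best = 0
    for _, diff in events:
        current += diff
        best = max(best, current)
    return best


# Sample viewing intervals per article, all on 2021-11-01
article_9001 = [
    ("10:00:00", "10:00:11"),
    ("10:00:09", "10:00:38"),
    ("10:00:28", "10:00:58"),
    ("10:00:51", "10:00:59"),
    ("10:00:01", "10:01:50"),
]
article_9002 = [("11:00:45", "11:01:11"), ("11:00:55", "11:01:24")]

print(max_concurrent(article_9001), max_concurrent(article_9002))
```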
SQL9 Next-day retention rate of new users for each day of November 2021
Description: user behavior log table tb_user_log
id | uid | artical_id | in_time | out_time | sign_in |
1 | 101 | 0 | 2021-11-01 10:00:00 | 2021-11-01 10:00:42 | 1 |
2 | 102 | 9001 | 2021-11-01 10:00:00 | 2021-11-01 10:00:09 | 0 |
3 | 103 | 9001 | 2021-11-01 10:00:01 | 2021-11-01 10:01:50 | 0 |
4 | 101 | 9002 | 2021-11-02 10:00:09 | 2021-11-02 10:00:28 | 0 |
5 | 103 | 9002 | 2021-11-02 10:00:51 | 2021-11-02 10:00:59 | 0 |
6 | 104 | 9001 | 2021-11-02 11:00:28 | 2021-11-02 11:01:24 | 0 |
7 | 101 | 9003 | 2021-11-03 11:00:55 | 2021-11-03 11:01:24 | 0 |
8 | 104 | 9003 | 2021-11-03 11:00:45 | 2021-11-03 11:00:55 | 0 |
9 | 105 | 9003 | 2021-11-03 11:00:53 | 2021-11-03 11:00:59 | 0 |
10 | 101 | 9002 | 2021-11-04 11:00:55 | 2021-11-04 11:00:59 | 0 |
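(uid: user ID, artical_id: article ID, in_time: entry time, out_time: exit time, sign_in: whether the user signed in)
Problem: compute the next-day retention rate of new users for each day of November 2021 (rounded to 2 decimal places).
Note:
- The next-day retention rate is the share of users who first appeared on a given day and were active again on the following day.
- If in_time and out_time fall on different days, the user counts as active on both days. Sort the result by date in ascending order.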
WITH 11_new_user AS (
# New users of November 2021
SELECT
uid 11_new_user,
first_log
FROM(
# First login date of every user
SELECT
uid, DATE(MIN(in_time)) first_log
FROM tb_user_log
GROUP BY uid
) t1
WHERE DATE_FORMAT(first_log,'%Y%m') = '202111'
), all_user AS (
# Every date on which each user was active (entry date or exit date)
SELECT
DISTINCT uid, DATE(in_time) log_time
FROM tb_user_log
UNION SELECT DISTINCT uid, DATE(out_time)
FROM tb_user_log
)
# Join the new users to all activity dates
SELECT n.first_log dt, round(COUNT(DISTINCT IF(datediff(log_time, first_log) = 1, uid, null)) / COUNT(DISTINCT 11_new_user), 2) uv_left_rate
FROM 11_new_user n, all_user a WHERE uid = 11_new_user GROUP BY n.first_log ORDER BY n.first_log
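The retention logic can be verified on the sample data with a small Python sketch. It assumes the activity dates of each user have already been collected into a set (entry and exit dates combined); `next_day_retention` is a hypothetical name.

```python
from datetime import date, timedelta


def nov(day):
    return date(2021, 11, day)


# uid -> set of active dates, derived from the sample log
active_days = {
    101: {nov(1), nov(2), nov(3), nov(4)},
    102: {nov(1)},
    103: {nov(1), nov(2)},
    104: {nov(2), nov(3)},
    105: {nov(3)},
}


def next_day_retention(active_days, year=2021, month=11):
    cohorts = {}  # first active date -> (new users, users retained next day)
    for days in active_days.values():
        first = min(days)
        if (first.year, first.month) != (year, month):
            continue  # only users who first appeared in the target month
        new, kept = cohorts.get(first, (0, 0))
        retained = (first + timedelta(days=1)) in days
        cohorts[first] = (new + 1, kept + int(retained))
    return {str(day): round(kept / new, 2) for day, (new, kept) in sorted(cohorts.items())}


print(next_day_retention(active_days))
```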
SQL10 User grades based on activity interval
Description: user behavior log table tb_user_log
id | uid | artical_id | in_time | out_time | sign_in |
1 | 109 | 9001 | 2021-08-31 10:00:00 | 2021-08-31 10:00:09 | 0 |
2 | 109 | 9002 | 2021-11-04 11:00:55 | 2021-11-04 11:00:59 | 0 |
3 | 108 | 9001 | 2021-09-01 10:00:01 | 2021-09-01 10:01:50 | 0 |
4 | 108 | 9001 | 2021-11-03 10:00:01 | 2021-11-03 10:01:50 | 0 |
5 | 104 | 9001 | 2021-11-02 10:00:28 | 2021-11-02 10:00:50 | 0 |
6 | 104 | 9003 | 2021-09-03 11:00:45 | 2021-09-03 11:00:55 | 0 |
7 | 105 | 9003 | 2021-11-03 11:00:53 | 2021-11-03 11:00:59 | 0 |
8 | 102 | 9001 | 2021-10-30 10:00:00 | 2021-10-30 10:00:09 | 0 |
9 | 103 | 9001 | 2021-10-21 10:00:00 | 2021-10-21 10:00:09 | 0 |
10 | 101 | 0 | 2021-10-01 10:00:00 | 2021-10-01 10:00:42 | 1 |
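(uid: user ID, artical_id: article ID, in_time: entry time, out_time: exit time, sign_in: whether the user signed in)
Problem: grade users by activity interval and report the share of users in each grade, rounded to two decimal places and sorted by share in descending order.
Note:
- The grading is simplified to: loyal users (忠实用户: active in the last 7 days and not new), new users (新晋用户: first seen in the last 7 days), sleeping users (沉睡用户: not active in the last 7 days but active before that), and churned users (流失用户: not active in the last 30 days but active before that).
- Assume today is the latest date in the data.
- The last 7 days include today T, i.e. the closed interval [T-6, T].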
SELECT user_grade,
round(COUNT(uid) / (SELECT COUNT(DISTINCT uid) FROM tb_user_log), 2) ratio
FROM (SELECT uid, (
CASE
WHEN datediff(td, last_active) <= 6 AND datediff(td, first_reg) > 6 THEN '忠实用户'
WHEN datediff(td, last_active) <= 6 AND datediff(td, first_reg) <= 6 THEN '新晋用户'
WHEN datediff(td, last_active) > 6 AND datediff(td, last_active) <= 29 THEN '沉睡用户'
WHEN datediff(td, last_active) > 29 THEN '流失用户' ELSE '未分类' END
) user_grade FROM(SELECT uid, MAX(out_time) last_active, MIN(in_time) first_reg, (
SELECT MAX(out_time) FROM tb_user_log) td FROM tb_user_log GROUP BY uid) a
) b GROUP BY user_grade ORDER BY ratio DESC
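The four CASE branches reduce to two numbers per user: days since last activity and days since first activity. A Python sketch on the sample data, with English grade labels standing in for the Chinese output values (`grade` and `grade_ratios` are hypothetical names):

```python
from collections import Counter
from datetime import date

# uid -> (first active date, last active date), derived from the sample log
users = {
    109: (date(2021, 8, 31), date(2021, 11, 4)),
    108: (date(2021, 9, 1), date(2021, 11, 3)),
    104: (date(2021, 9, 3), date(2021, 11, 2)),
    105: (date(2021, 11, 3), date(2021, 11, 3)),
    102: (date(2021, 10, 30), date(2021, 10, 30)),
    103: (date(2021, 10, 21), date(2021, 10, 21)),
    101: (date(2021, 10, 1), date(2021, 10, 1)),
}


def grade(first, last, today):
    idle = (today - last).days
    if idle <= 6:  # active within [T-6, T]
        return "loyal" if (today - first).days > 6 else "new"
    return "sleeping" if idle <= 29 else "churned"


def grade_ratios(users):
    today = max(last for _, last in users.values())
    counts = Counter(grade(first, last, today) for first, last in users.values())
    return {name: round(n / len(users), 2) for name, n in counts.items()}


print(grade_ratios(users))
```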
SQL11 Daily active users and share of new users for each day
Description: user behavior log table tb_user_log
id | uid | artical_id | in_time | out_time | sign_in |
1 | 101 | 9001 | 2021-10-31 10:00:00 | 2021-10-31 10:00:09 | 0 |
2 | 102 | 9001 | 2021-10-31 10:00:00 | 2021-10-31 10:00:09 | 0 |
3 | 101 | 0 | 2021-11-01 10:00:00 | 2021-11-01 10:00:42 | 1 |
4 | 102 | 9001 | 2021-11-01 10:00:00 | 2021-11-01 10:00:09 | 0 |
5 | 108 | 9001 | 2021-11-01 10:00:01 | 2021-11-01 10:00:50 | 0 |
6 | 108 | 9001 | 2021-11-02 10:00:01 | 2021-11-02 10:00:50 | 0 |
7 | 104 | 9001 | 2021-11-02 10:00:28 | 2021-11-02 10:00:50 | 0 |
8 | 106 | 9001 | 2021-11-02 10:00:28 | 2021-11-02 10:00:50 | 0 |
9 | 108 | 9001 | 2021-11-03 10:00:01 | 2021-11-03 10:00:50 | 0 |
10 | 109 | 9002 | 2021-11-03 11:00:55 | 2021-11-03 11:00:59 | 0 |
11 | 104 | 9003 | 2021-11-03 11:00:45 | 2021-11-03 11:00:55 | 0 |
12 | 105 | 9003 | 2021-11-03 11:00:53 | 2021-11-03 11:00:59 | 0 |
13 | 106 | 9003 | 2021-11-03 11:00:45 | 2021-11-03 11:00:55 | 0 |
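(uid: user ID, artical_id: article ID, in_time: entry time, out_time: exit time, sign_in: whether the user signed in)
Problem: compute the number of daily active users and the share of new users for each day.
Note:
- Share of new users = new users on that day ÷ active users on that day (DAU).
- If in_time and out_time fall on different days, the user counts as active on both days.
- Round the share of new users to 2 decimal places and sort the result by date in ascending order.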
SELECT dt, COUNT(*) AS dau, round(SUM(new) / COUNT(*), 2) AS uv_new_ratio
FROM (SELECT uid, dt, CASE WHEN dt = first_dt THEN 1 ELSE 0 END AS new
FROM (SELECT uid, date(in_time) AS dt
FROM tb_user_log UNION
SELECT uid, date(out_time) AS dt
FROM tb_user_log) t1
LEFT JOIN (SELECT uid, MIN(date(in_time)) AS first_dt
FROM tb_user_log GROUP BY uid) t2 USING(uid)
) t GROUP BY dt ORDER BY dt
SQL12 Coins earned from consecutive sign-ins
Description: user behavior log table tb_user_log
id | uid | artical_id | in_time | out_time | sign_in |
1 | 101 | 0 | 2021-07-07 10:00:00 | 2021-07-07 10:00:09 | 1 |
2 | 101 | 0 | 2021-07-08 10:00:00 | 2021-07-08 10:00:09 | 1 |
3 | 101 | 0 | 2021-07-09 10:00:00 | 2021-07-09 10:00:42 | 1 |
4 | 101 | 0 | 2021-07-10 10:00:00 | 2021-07-10 10:00:09 | 1 |
5 | 101 | 0 | 2021-07-11 23:59:55 | 2021-07-11 23:59:59 | 1 |
6 | 101 | 0 | 2021-07-12 10:00:28 | 2021-07-12 10:00:50 | 1 |
7 | 101 | 0 | 2021-07-13 10:00:28 | 2021-07-13 10:00:50 | 1 |
8 | 102 | 0 | 2021-10-01 10:00:28 | 2021-10-01 10:00:50 | 1 |
9 | 102 | 0 | 2021-10-02 10:00:01 | 2021-10-02 10:01:50 | 1 |
10 | 102 | 0 | 2021-10-03 10:00:55 | 2021-10-03 11:00:59 | 1 |
11 | 102 | 0 | 2021-10-04 10:00:45 | 2021-10-04 11:00:55 | 0 |
12 | 102 | 0 | 2021-10-05 10:00:53 | 2021-10-05 11:00:59 | 1 |
13 | 102 | 0 | 2021-10-06 10:00:45 | 2021-10-06 11:00:55 | 1 |
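(uid: user ID, artical_id: article ID, in_time: entry time, out_time: exit time, sign_in: whether the user signed in)
Scenario logic:
- artical_id is the ID of the article the user viewed; the special value artical_id = 0 means the user was on a non-article page (for example a list page or campaign page inside the app). Note: sign_in is only meaningful when artical_id is 0.
- Starting at 00:00 on July 7, 2021, a user earns 1 coin for each daily sign-in and begins accumulating a sign-in streak; the 3rd and 7th consecutive days earn an extra 2 and 6 coins respectively.
- The streak restarts after every 7 consecutive days (the 8th consecutive sign-in counts as day 1 of a new round and earns 1 coin).
Problem: compute the coins each user earned in each month since July 2021 (the campaign ends at the end of October; sign-ins from November 1 onward earn no coins). Sort by month and user ID in ascending order.
Note: if a sign-in record's in_time and out_time fall on different days, it counts as a sign-in only on the in_time date.
# Determine the start date of each sign-in streak for every user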
SELECT uid,date_format(dt, '%Y%m') AS month, SUM(
CASE WHEN stage_index = 2 THEN 3 WHEN stage_index = 6 THEN 7 ELSE 1 END
) AS coin
FROM (
SELECT uid, dt, (row_number() over(PARTITION BY uid, init_date ORDER BY dt) - 1) % 7 AS stage_index
FROM (
SELECT uid, dt, rn, subdate(dt, rn) AS init_date
FROM (
# Number each user's qualifying sign-in dates in order
SELECT uid,date(in_time) AS dt, row_number() over(PARTITION BY uid ORDER BY date(in_time)) AS rn
FROM tb_user_log
WHERE date(in_time) >= '2021-07-07'
AND date(in_time) < '2021-11-01'
AND artical_id = 0 AND sign_in = 1
) raw_t
) init_date_t
) a GROUP BY uid, month ORDER BY month, uid
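The query finds streaks with the "date minus row number" trick; the coin rule itself can be checked with a direct streak counter. The Python sketch below handles one user's sign-in dates and is only an illustration (`coins_by_month` is a hypothetical name); it assumes the dates passed in already satisfy the campaign window and the sign_in = 1 filter.

```python
from datetime import date, timedelta


def coins_by_month(sign_days):
    """Coins per 'YYYYMM' for one user's sign-in dates."""
    totals, streak, prev = {}, 0, None
    for day in sorted(set(sign_days)):
        consecutive = prev is not None and day - prev == timedelta(days=1)
        # Continue the streak (wrapping after day 7) or restart at day 1.
        streak = streak % 7 + 1 if consecutive else 1
        coins = 1 + (2 if streak == 3 else 6 if streak == 7 else 0)
        month = day.strftime("%Y%m")
        totals[month] = totals.get(month, 0) + coins
        prev = day
    return totals


# Sample users: 101 signs in July 7-13; 102 signs in Oct 1-3 and Oct 5-6
user_101 = [date(2021, 7, d) for d in range(7, 14)]
user_102 = [date(2021, 10, d) for d in (1, 2, 3, 5, 6)]

print(coins_by_month(user_101), coins_by_month(user_102))
```

User 101 collects 1+1+3+1+1+1+7 coins over a full 7-day round; user 102's streak breaks on October 4, so October 5 starts again at day 1.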
03 E-commerce Scenario (某东 Mall)
SQL13 Monthly GMV of the mall in 2021
Description: order table tb_order_overall
id | order_id | uid | event_time | total_amount | total_cnt | status |
1 | 301001 | 101 | 2021-10-01 10:00:00 | 15900 | 2 | 1 |
2 | 301002 | 101 | 2021-10-01 11:00:00 | 15900 | 2 | 1 |
3 | 301003 | 102 | 2021-10-02 10:00:00 | 34500 | 8 | 0 |
4 | 301004 | 103 | 2021-10-12 10:00:00 | 43500 | 9 | 1 |
5 | 301005 | 105 | 2021-11-01 10:00:00 | 31900 | 7 | 1 |
6 | 301006 | 102 | 2021-11-02 10:00:00 | 24500 | 6 | 1 |
7 | 301007 | 102 | 2021-11-03 10:00:00 | -24500 | 6 | 2 |
8 | 301008 | 104 | 2021-11-04 10:00:00 | 55500 | 12 | 0 |
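(order_id: order number, uid: user ID, event_time: order time, total_amount: total order amount, total_cnt: total number of items in the order, status: order status)
Scenario logic:
- When a user checks out several cart items together, one order is created in the order table (unpaid at this point; status = 0 means awaiting payment);
- When the user completes payment, the status of that order is updated to 1 (paid);
- If the user returns the goods for a refund, a record with a negative total amount is created in the order table (the amount refunded; the order number is the refund number, and status = 2 means refunded).
Problem: compute the mall's monthly GMV in 2021 and output the months whose GMV exceeds 100,000, with values rounded to integers.
Note: GMV is the sum of paid and unpaid orders. Sort by GMV in ascending order.
SELECT date_format(event_time, "%Y-%m") month, ROUND(SUM(total_amount), 0) GMV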
FROM tb_order_overall
WHERE status IN (1, 0) AND year(event_time) = 2021
GROUP BY month HAVING GMV > 100000 ORDER BY GMV
SQL14 Metrics of each product with a refund rate no greater than 0.5 in October 2021
Description: user behavior table for displayed products tb_user_event
id | uid | product_id | event_time | if_click | if_cart | if_payment | if_refund |
1 | 101 | 8001 | 2021-10-01 10:00:00 | 0 | 0 | 0 | 0 |
2 | 102 | 8001 | 2021-10-01 10:00:00 | 1 | 0 | 0 | 0 |
3 | 103 | 8001 | 2021-10-01 10:00:00 | 1 | 1 | 0 | 0 |
4 | 104 | 8001 | 2021-10-02 10:00:00 | 1 | 1 | 1 | 0 |
5 | 105 | 8001 | 2021-10-02 10:00:00 | 1 | 1 | 1 | 0 |
6 | 101 | 8002 | 2021-10-03 10:00:00 | 1 | 1 | 1 | 0 |
7 | 109 | 8001 | 2021-10-04 10:00:00 | 1 | 1 | 1 | 1 |
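(uid: user ID, product_id: product ID, event_time: event time, if_click: whether clicked, if_cart: whether added to cart, if_payment: whether paid, if_refund: whether returned and refunded)
Problem: for October 2021, report the metrics of every product that has impression records and a refund rate no greater than 0.5.
Note:
- Click-through rate = clicks ÷ impressions;
- Add-to-cart rate = cart additions ÷ clicks;
- Order conversion rate = payments ÷ cart additions; refund rate = refunds ÷ payments;
- When a denominator is 0 the metric is recorded as 0. Round each metric to 3 decimal places and sort by product ID in ascending order.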
SELECT product_id, round(click_cnt/show_cnt, 3) AS ctr,
round(IF(click_cnt>0, cart_cnt/click_cnt, 0), 3) AS cart_rate,
round(IF(cart_cnt>0, payment_cnt/cart_cnt, 0), 3) AS payment_rate,
round(IF(payment_cnt>0, refund_cnt/payment_cnt, 0), 3) AS refund_rate
FROM (
SELECT product_id, COUNT(1) AS show_cnt, SUM(if_click) AS click_cnt, SUM(if_cart) AS cart_cnt, SUM(if_payment) AS payment_cnt, SUM(if_refund) AS refund_cnt
FROM tb_user_event WHERE DATE_FORMAT(event_time, '%Y%m') = '202110' GROUP BY product_id
) AS t_product_index_cnt WHERE payment_cnt = 0 OR refund_cnt/payment_cnt <= 0.5 ORDER BY product_id
SQL15 Gross margin of each product in a shop and of the shop overall
Description: product information table tb_product_info
id | product_id | shop_id | tag | in_price | quantity | release_time |
1 | 8001 | 901 | 家电 | 6000 | 100 | 2020-01-01 10:00:00 |
2 | 8002 | 902 | 家电 | 12000 | 50 | 2020-01-01 10:00:00 |
3 | 8003 | 901 | 3C数码 | 12000 | 50 | 2020-01-01 10:00:00 |
(product_id: product ID, shop_id: shop ID, tag: product category tag, in_price: purchase price, quantity: purchase quantity, release_time: listing time)
Order table tb_order_overall
id | order_id | uid | event_time | total_amount | total_cnt | status |
1 | 301001 | 101 | 2021-10-01 10:00:00 | 30000 | 3 | 1 |
2 | 301002 | 102 | 2021-10-01 11:00:00 | 23900 | 2 | 1 |
3 | 301003 | 103 | 2021-10-02 10:00:00 | 31000 | 2 | 1 |
(order_id: order number, uid: user ID, event_time: order time, total_amount: total order amount, total_cnt: total number of items in the order, status: order status)
Order detail table tb_order_detail
id | order_id | product_id | price | cnt |
1 | 301001 | 8001 | 8500 | 2 |
2 | 301001 | 8002 | 15000 | 1 |
3 | 301002 | 8001 | 8500 | 1 |
4 | 301002 | 8002 | 16000 | 1 |
5 | 301003 | 8002 | 14000 | 1 |
6 | 301003 | 8003 | 18000 | 1 |
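(order_id: order number, product_id: product ID, price: unit price, cnt: quantity ordered)
Scenario logic:
- When a user checks out several cart items together, one order is created in the order table (unpaid at this point; status = 0 means awaiting payment), and one row per product in the order is created in the order detail table;
- When the user completes payment, the status of that order is updated to 1 (paid);
- If the user returns the goods for a refund, a record with a negative total amount is created in the order table (the amount refunded; the order number is the refund number, and status = 2 means refunded).
Problem: for shop 901, from October 2021 onward, report the products whose gross margin exceeds 24.9% together with the shop's overall gross margin.
Note: product gross margin = (1 - purchase price / average unit selling price) * 100%;
shop gross margin = (1 - total purchase cost / total sales revenue) * 100%.
Output the shop's gross margin first, then each product's gross margin in ascending order of product ID, all rounded to 1 decimal place.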
SELECT '店铺汇总' AS product_id, concat(round((1 - (SUM(in_price*cnt) / SUM(price * cnt))) * 100, 1), "%")
FROM tb_order_detail AS o LEFT JOIN tb_order_overall AS oo ON o.order_id = oo.order_id
LEFT JOIN tb_product_info AS p ON o.product_id = p.product_id
WHERE TIMESTAMPDIFF(day, "2021-09-30", event_time) > 0
AND p.shop_id = 901 GROUP BY p.shop_id UNION ALL
SELECT product_id, concat(round(rate * 100, 1), "%")
FROM (SELECT o.product_id AS product_id, (1 - MAX(in_price) * SUM(cnt) / SUM(price * cnt)) AS rate # average unit price = SUM(price * cnt) / SUM(cnt)
FROM tb_order_detail AS o LEFT JOIN tb_order_overall AS oo ON o.order_id = oo.order_id
LEFT JOIN tb_product_info AS p ON o.product_id = p.product_id
WHERE TIMESTAMPDIFF(day, "2021-09-30", event_time) > 0 AND p.shop_id = 901
GROUP BY o.product_id ORDER BY product_id) AS t WHERE rate > 0.249
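The two margin formulas can be checked by hand on shop 901's sample sales. In the Python sketch below the average unit selling price is weighted by quantity (revenue divided by units sold), which is one reading of the problem's definition; `margins` is a hypothetical name.

```python
# Purchase price per product of shop 901, from tb_product_info
in_price = {8001: 6000, 8003: 12000}
# (product_id, unit price, quantity) rows of shop 901, from tb_order_detail
sales = [(8001, 8500, 2), (8001, 8500, 1), (8003, 18000, 1)]


def margins(in_price, sales):
    revenue, units = {}, {}
    for pid, price, cnt in sales:
        revenue[pid] = revenue.get(pid, 0) + price * cnt
        units[pid] = units.get(pid, 0) + cnt
    # Product margin: 1 - in_price / (revenue / units), in percent.
    product = {
        pid: round((1 - in_price[pid] * units[pid] / revenue[pid]) * 100, 1)
        for pid in sorted(revenue)
    }
    # Shop margin: 1 - total purchase cost / total revenue, in percent.
    total_cost = sum(in_price[pid] * units[pid] for pid in units)
    shop = round((1 - total_cost / sum(revenue.values())) * 100, 1)
    return shop, product


print(margins(in_price, sales))
```

Both products clear the 24.9% threshold, and the shop-level figure sits between the two product margins, as expected for a revenue-weighted ratio.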
SQL16 Top 3 snack products by repurchase rate
Description: product information table tb_product_info
id | product_id | shop_id | tag | in_price | quantity | release_time |
1 | 8001 | 901 | 零食 | 60 | 1000 | 2020-01-01 10:00:00 |
2 | 8002 | 901 | 零食 | 140 | 500 | 2020-01-01 10:00:00 |
3 | 8003 | 901 | 零食 | 160 | 500 | 2020-01-01 10:00:00 |
(product_id: product ID, shop_id: shop ID, tag: product category tag, in_price: purchase price, quantity: purchase quantity, release_time: listing time)
Order table tb_order_overall
id | order_id | uid | event_time | total_amount | total_cnt | status |
1 | 301001 | 101 | 2021-09-30 10:00:00 | 140 | 1 | 1 |
2 | 301002 | 102 | 2021-10-01 11:00:00 | 235 | 2 | 1 |
3 | 301011 | 102 | 2021-10-31 11:00:00 | 250 | 2 | 1 |
4 | 301003 | 101 | 2021-10-02 10:00:00 | 300 | 2 | 1 |
5 | 301013 | 105 | 2021-10-02 10:00:00 | 300 | 2 | 1 |
6 | 301005 | 104 | 2021-10-03 10:00:00 | 170 | 1 | 1 |
(order_id: order number, uid: user ID, event_time: order time, total_amount: total order amount, total_cnt: total number of items in the order, status: order status)
Order detail table tb_order_detail
id | order_id | product_id | price | cnt |
1 | 301001 | 8002 | 150 | 1 |
2 | 301011 | 8003 | 200 | 1 |
3 | 301011 | 8001 | 80 | 1 |
4 | 301002 | 8001 | 85 | 1 |
5 | 301002 | 8003 | 180 | 1 |
6 | 301003 | 8002 | 140 | 1 |
7 | 301003 | 8003 | 180 | 1 |
8 | 301013 | 8002 | 140 | 2 |
9 | 301005 | 8003 | 180 | 1 |
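(order_id: order number, product_id: product ID, price: unit price, cnt: quantity ordered)
Scenario logic:
- When a user checks out several cart items together, one order is created in the order table (unpaid at this point; status = 0 means awaiting payment), and one row per product in the order is created in the order detail table;
- When the user completes payment, the status of that order is updated to 1 (paid);
- If the user returns the goods for a refund, a record with a negative total amount is created in the order table (the amount refunded; the order number is the refund number, and status = 2 means refunded).
Problem: find the 3 snack (零食) products with the highest repurchase rate.
Note: the repurchase rate is the share of users who buy a product again within a period; the higher it is, the more loyal consumers are to the brand. It is also called the return rate.
Here we define: repurchase rate of a product = number of users who bought it at least twice in the last 90 days ÷ total number of users who bought it.
The last 90 days include the latest date in the data (treated as today). Round the repurchase rate to 3 decimal places and sort by repurchase rate in descending order, then product ID in ascending order.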
SELECT product_id, round(SUM(IF(cnt >= 2, 1, 0)) / COUNT(*), 3) repurchase_rate FROM
(
SELECT a.product_id, b.uid, COUNT(uid) cnt FROM tb_order_detail AS a
LEFT JOIN tb_order_overall AS b ON a.order_id = b.order_id
LEFT JOIN tb_product_info AS c ON c.product_id = a.product_id
WHERE datediff((SELECT MAX(event_time) FROM tb_order_overall), event_time) < 90 AND tag='零食'
GROUP BY a.product_id, b.uid
) AS d GROUP BY product_id ORDER BY repurchase_rate DESC, product_id LIMIT 3
SQL17 Average order value and acquisition cost of new users in October
Description: product information table tb_product_info
id | product_id | shop_id | tag | in_price | quantity | release_time |
1 | 8001 | 901 | 日用 | 60 | 1000 | 2020-01-01 10:00:00 |
2 | 8002 | 901 | 零食 | 140 | 500 | 2020-01-01 10:00:00 |
3 | 8003 | 901 | 零食 | 160 | 500 | 2020-01-01 10:00:00 |
4 | 8004 | 902 | 零食 | 130 | 500 | 2020-01-01 10:00:00 |
(product_id: product ID, shop_id: shop ID, tag: product category tag, in_price: purchase price, quantity: purchase quantity, release_time: listing time)
Order table tb_order_overall
id | order_id | uid | event_time | total_amount | total_cnt | status |
1 | 301002 | 102 | 2021-10-01 11:00:00 | 235 | 2 | 1 |
2 | 301003 | 101 | 2021-10-02 10:00:00 | 300 | 2 | 1 |
3 | 301005 | 104 | 2021-10-03 10:00:00 | 160 | 1 | 1 |
(order_id: order number, uid: user ID, event_time: order time, total_amount: total order amount, total_cnt: total number of items in the order, status: order status)
Order detail table tb_order_detail
id | order_id | product_id | price | cnt |
1 | 301002 | 8001 | 85 | 1 |
2 | 301002 | 8003 | 180 | 1 |
3 | 301003 | 8004 | 140 | 1 |
4 | 301003 | 8003 | 180 | 1 |
5 | 301005 | 8003 | 180 | 1 |
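(order_id: order number, product_id: product ID, price: unit price, cnt: quantity ordered)
Problem: compute the average first-order amount (average order value) and the average acquisition cost of all new users of the mall in October 2021, rounded to one decimal place.
Note: order discount = {sum of unit price × quantity over the order's items in the order detail table} - {total order amount in the order table}.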
SELECT
ROUND(SUM(total_amount) / COUNT(uid), 1) AS avg_amount,
ROUND(SUM(firstly_amount - total_amount) / COUNT(uid), 1) AS avg_cost
FROM
(SELECT uid, order_id, total_amount
FROM
(SELECT *, RANK()OVER(PARTITION BY uid ORDER BY event_time) AS order_rank
FROM tb_order_overall) AS t1
WHERE order_rank = 1 AND DATE_FORMAT(event_time, '%Y%m') = '202110') AS t2
JOIN
(SELECT order_id, SUM(price * cnt) AS firstly_amount
FROM tb_order_detail GROUP BY order_id) AS t3 ON t2.order_id = t3.order_id
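The two averages can be confirmed on the sample data, where every user's first order falls in October 2021. The Python sketch below takes each first order as its paid total plus its detail lines; `new_user_metrics` is a hypothetical name.

```python
# One entry per new user's first order: (total_amount, [(unit price, quantity), ...])
first_orders = [
    (235, [(85, 1), (180, 1)]),   # order 301002, user 102
    (300, [(140, 1), (180, 1)]),  # order 301003, user 101
    (160, [(180, 1)]),            # order 301005, user 104
]


def new_user_metrics(first_orders):
    users = len(first_orders)
    paid = sum(total for total, _ in first_orders)
    # List-price value of the same orders, from the detail lines.
    listed = sum(price * cnt for _, items in first_orders for price, cnt in items)
    avg_amount = round(paid / users, 1)
    avg_cost = round((listed - paid) / users, 1)  # discount absorbed per new user
    return avg_amount, avg_cost


print(new_user_metrics(first_orders))
```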
SQL18 7-day sell-through rate and unsold rate of shop 901 during National Day
Description: product information table tb_product_info
id | product_id | shop_id | tag | in_price | quantity | release_time |
1 | 8001 | 901 | 日用 | 60 | 1000 | 2020-01-01 10:00:00 |
2 | 8002 | 901 | 零食 | 140 | 500 | 2020-01-01 10:00:00 |
3 | 8003 | 901 | 零食 | 160 | 500 | 2020-01-01 10:00:00 |
(product_id: product ID, shop_id: shop ID, tag: product category tag, in_price: purchase price, quantity: purchase quantity, release_time: listing time)
Order table tb_order_overall
id | order_id | uid | event_time | total_amount | total_cnt | status |
1 | 301004 | 102 | 2021-09-30 10:00:00 | 170 | 1 | 1 |
2 | 301005 | 104 | 2021-10-01 10:00:00 | 160 | 1 | 1 |
3 | 301003 | 101 | 2021-10-02 10:00:00 | 300 | 2 | 1 |
4 | 301002 | 102 | 2021-10-03 11:00:00 | 235 | 2 | 1 |
(order_id: order number, uid: user ID, event_time: order time, total_amount: total order amount, total_cnt: total number of items in the order, status: order status)
Order detail table tb_order_detail
id | order_id | product_id | price | cnt |
1 | 301004 | 8002 | 180 | 1 |
2 | 301005 | 8002 | 170 | 1 |
3 | 301002 | 8001 | 85 | 1 |
4 | 301002 | 8003 | 180 | 1 |
5 | 301003 | 8002 | 150 | 1 |
6 | 301003 | 8003 | 180 | 1 |
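(order_id: order number, product_id: product ID, price: unit price, cnt: quantity ordered)
Problem: compute shop 901's 7-day sell-through rate and unsold rate for each of the first 3 days of the 2021 National Day holiday, rounded to 3 decimal places and sorted by date in ascending order.
Note:
- The sell-through rate is the share of the shop's currently listed products that had sales during a period (products with sales / listed products).
- The unsold rate is the share of the shop's currently listed products that had no sales during a period (products without sales / listed products).
- A day is output as long as any shop sold any product on that day, even if shop 901's sell-through rate for that day is 0.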
WITH t AS (
SELECT tod.order_id, tod.product_id, date(event_time) dt, shop_id
FROM tb_order_detail tod
LEFT JOIN tb_order_overall too ON tod.order_id = too.order_id
LEFT JOIN tb_product_info tpi ON tod.product_id = tpi.product_id
WHERE shop_id = 901
)
SELECT dt,sale_rate,unsale_rate
FROM (
SELECT '2021-10-01' dt, round(SUM(inde_1) / COUNT(*), 3) sale_rate, round(1 - SUM(inde_1) / COUNT(*), 3) unsale_rate
FROM (SELECT MAX(IF(dt BETWEEN '2021-09-25' AND '2021-10-01', 1, 0)) inde_1 FROM t GROUP BY product_id
) t1 UNION ALL
SELECT '2021-10-02' dt, round(SUM(inde_2) / COUNT(*), 3) sale_rate, round(1 - SUM(inde_2) / COUNT(*), 3) unsale_rate
FROM (
SELECT MAX(IF(dt BETWEEN '2021-09-26' AND '2021-10-02', 1, 0)) inde_2 FROM t GROUP BY product_id
) t2 UNION ALL SELECT '2021-10-03' dt, round(SUM(inde_3) / count(*), 3) sale_rate, round(1 - SUM(inde_3) / COUNT(*), 3) unsale_rate
FROM (SELECT MAX(IF(dt BETWEEN '2021-09-27' AND '2021-10-03', 1, 0)) inde_3 FROM t GROUP BY product_id
) t3) tt WHERE tt.dt IN (SELECT DISTINCT date(event_time) FROM tb_order_overall)
ORDER BY dt
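04 Ride-Hailing Scenario (某滴 Ride Hailing)
SQL19 Statistics for drivers who accepted 3 or more orders in Beijing during National Day 2021
Description: ride request table tb_get_car_record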
id | uid | city | event_time | end_time | order_id |
1 | 101 | 北京 | 2021-10-01 07:00:00 | 2021-10-01 07:02:00 | NULL |
2 | 102 | 北京 | 2021-10-01 09:00:30 | 2021-10-01 09:01:00 | 9001 |
3 | 101 | 北京 | 2021-10-01 08:28:10 | 2021-10-01 08:30:00 | 9002 |
4 | 103 | 北京 | 2021-10-02 07:59:00 | 2021-10-02 08:01:00 | 9003 |
5 | 104 | 北京 | 2021-10-03 07:59:20 | 2021-10-03 08:01:00 | 9004 |
6 | 105 | 北京 | 2021-10-01 08:00:00 | 2021-10-01 08:02:10 | 9005 |
7 | 106 | 北京 | 2021-10-01 17:58:00 | 2021-10-01 18:01:00 | 9006 |
8 | 107 | 北京 | 2021-10-02 11:00:00 | 2021-10-02 11:01:00 | 9007 |
9 | 108 | 北京 | 2021-10-02 21:00:00 | 2021-10-02 21:01:00 | 9008 |
(uid: user ID, city: city, event_time: time of the ride request, end_time: time the request ended, order_id: order number)
Ride order table tb_get_car_order
id | order_id | uid | driver_id | order_time | start_time | finish_time | mileage | fare | grade |
1 | 9002 | 101 | 201 | 2021-10-01 08:30:00 | NULL | 2021-10-01 08:31:00 | NULL | NULL | NULL |
2 | 9001 | 102 | 202 | 2021-10-01 09:01:00 | 2021-10-01 09:06:00 | 2021-10-01 09:31:00 | 10 | 41.5 | 5 |
3 | 9003 | 103 | 202 | 2021-10-02 08:01:00 | 2021-10-02 08:15:00 | 2021-10-02 08:31:00 | 11 | 41.5 | 4 |
4 | 9004 | 104 | 202 | 2021-10-03 08:01:00 | 2021-10-03 08:13:00 | 2021-10-03 08:31:00 | 7.5 | 22 | 4 |
5 | 9005 | 105 | 203 | 2021-10-01 08:02:10 | 2021-10-01 08:18:00 | 2021-10-01 08:31:00 | 15 | 44 | 5 |
6 | 9006 | 106 | 203 | 2021-10-01 18:01:00 | 2021-10-01 18:09:00 | 2021-10-01 18:31:00 | 8 | 25 | 5 |
7 | 9007 | 107 | 203 | 2021-10-02 11:01:00 | 2021-10-02 11:07:00 | 2021-10-02 11:31:00 | 9.9 | 30 | 5 |
8 | 9008 | 108 | 203 | 2021-10-02 21:01:00 | 2021-10-02 21:10:00 | 2021-10-02 21:31:00 | 13.2 | 38 | 4 |
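(order_id: order number, uid: user ID, driver_id: driver ID, order_time: time the driver accepted, start_time: pickup time, when billing starts, finish_time: order completion time, mileage: distance travelled, fare: fare, grade: rating)
Scenario logic:
- When a user submits a ride request, a row is created in the ride-request table with order_id set to NULL.
- When a driver accepts, a row is created in the order table: order_time and the columns to its left are filled in, while start_time and the columns to its right are all NULL. The order_id and the acceptance time (stored as end_time) are written to the ride-request table. If no driver ever accepts, the request times out, or the user cancels midway, only end_time is recorded.
- If the passenger or the driver cancels before pickup, finish_time of that order is set to the cancellation time and the other columns are set to NULL.
- When the driver picks up the passenger, start_time of that order is filled in.
- When the order completes, finish_time, mileage and fare are filled in; grade stays NULL until the user gives the driver a 1 to 5 star rating.
Problem: Among drivers who accepted at least 3 orders in Beijing during the 7 days of the 2021 National Day holiday, compute the average number of orders accepted and the average part-time income (ignore platform commission; simply total the fares of completed orders). Round to 3 decimal places.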
SELECT "北京" as city, ROUND(AVG(order_num), 3) AS avg_order_num, ROUND(AVG(income), 3) AS avg_income
FROM (
SELECT driver_id, COUNT(order_id) AS order_num, SUM(fare) AS income
FROM tb_get_car_order
JOIN tb_get_car_record USING(order_id)
WHERE city = "北京" AND DATE_FORMAT(order_time, "%Y%m%d") BETWEEN '20211001' AND '20211007'
GROUP BY driver_id
HAVING COUNT(order_id) >= 3
) AS t_driver_info
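As a cross-check on the query above, the following minimal Python sketch recomputes the two averages from the sample rows. The tuple layout and the helper name `driver_stats` are illustrative choices, not part of the original problem.

```python
from collections import defaultdict

# (order_id, driver_id, order_time, fare) copied from the sample tb_get_car_order;
# fare is None for the cancelled order 9002.
orders = [
    (9002, 201, "2021-10-01 08:30:00", None),
    (9001, 202, "2021-10-01 09:01:00", 41.5),
    (9003, 202, "2021-10-02 08:01:00", 41.5),
    (9004, 202, "2021-10-03 08:01:00", 22),
    (9005, 203, "2021-10-01 08:02:10", 44),
    (9006, 203, "2021-10-01 18:01:00", 25),
    (9007, 203, "2021-10-02 11:01:00", 30),
    (9008, 203, "2021-10-02 21:01:00", 38),
]
# order_id -> city, from the sample tb_get_car_record (every sample order is in 北京).
city_of = {order_id: "北京" for order_id in range(9001, 9009)}


def driver_stats(orders, city_of, city, first_day, last_day):
    """Average order count and income over drivers with at least 3 orders."""
    per_driver = defaultdict(lambda: [0, 0.0])  # driver_id -> [orders, income]
    for order_id, driver_id, order_time, fare in orders:
        day = order_time[:10]  # ISO dates compare correctly as strings
        if city_of.get(order_id) == city and first_day <= day <= last_day:
            per_driver[driver_id][0] += 1
            per_driver[driver_id][1] += fare or 0  # SUM() skips NULL fares
    kept = [v for v in per_driver.values() if v[0] >= 3]  # HAVING COUNT(order_id) >= 3
    avg_orders = sum(v[0] for v in kept) / len(kept)
    avg_income = sum(v[1] for v in kept) / len(kept)
    return round(avg_orders, 3), round(avg_income, 3)


stats = driver_stats(orders, city_of, "北京", "2021-10-01", "2021-10-07")
print(stats)  # (3.5, 121.0)
```

Driver 201 has a single (cancelled) order and is dropped; drivers 202 and 203 contribute 3 and 4 orders with incomes of 105 and 137.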
SQL20 Average rating of drivers with cancelled orders
Description: user ride-request table tb_get_car_record
id | uid | city | event_time | end_time | order_id |
1 | 101 | 北京 | 2021-10-01 07:00:00 | 2021-10-01 07:02:00 | NULL |
2 | 102 | 北京 | 2021-10-01 09:00:30 | 2021-10-01 09:01:00 | 9001 |
3 | 101 | 北京 | 2021-10-01 08:28:10 | 2021-10-01 08:30:00 | 9002 |
4 | 103 | 北京 | 2021-10-02 07:59:00 | 2021-10-02 08:01:00 | 9003 |
5 | 104 | 北京 | 2021-10-03 07:59:20 | 2021-10-03 08:01:00 | 9004 |
6 | 105 | 北京 | 2021-10-01 08:00:00 | 2021-10-01 08:02:10 | 9005 |
7 | 106 | 北京 | 2021-10-01 17:58:00 | 2021-10-01 18:01:00 | 9006 |
8 | 107 | 北京 | 2021-10-02 11:00:00 | 2021-10-02 11:01:00 | 9007 |
9 | 108 | 北京 | 2021-10-02 21:00:00 | 2021-10-02 21:01:00 | 9008 |
10 | 109 | 北京 | 2021-10-08 18:00:00 | 2021-10-08 18:01:00 | 9009 |
(uid: user ID, city: city, event_time: time of the ride request, end_time: time the request ended, order_id: order number)
Ride order table tb_get_car_order
id | order_id | uid | driver_id | order_time | start_time | finish_time | mileage | fare | grade |
1 | 9002 | 101 | 202 | 2021-10-01 08:30:00 | null | 2021-10-01 08:31:00 | null | null | null |
2 | 9001 | 102 | 202 | 2021-10-01 09:01:00 | 2021-10-01 09:06:00 | 2021-10-01 09:31:00 | 10.0 | 41.5 | 5 |
3 | 9003 | 103 | 202 | 2021-10-02 08:01:00 | 2021-10-02 08:15:00 | 2021-10-02 08:31:00 | 11.0 | 41.5 | 4 |
4 | 9004 | 104 | 202 | 2021-10-03 08:01:00 | 2021-10-03 08:13:00 | 2021-10-03 08:31:00 | 7.5 | 22 | 4 |
5 | 9005 | 105 | 203 | 2021-10-01 08:02:10 | null | 2021-10-01 08:31:00 | null | null | null |
6 | 9006 | 106 | 203 | 2021-10-01 18:01:00 | 2021-10-01 18:09:00 | 2021-10-01 18:31:00 | 8.0 | 25.5 | 5 |
7 | 9007 | 107 | 203 | 2021-10-02 11:01:00 | 2021-10-02 11:07:00 | 2021-10-02 11:31:00 | 9.9 | 30 | 5 |
8 | 9008 | 108 | 203 | 2021-10-02 21:01:00 | 2021-10-02 21:10:00 | 2021-10-02 21:31:00 | 13.2 | 38 | 4 |
9 | 9009 | 109 | 203 | 2021-10-08 18:01:00 | 2021-10-08 18:11:50 | 2021-10-08 18:51:00 | 13 | 40 | 5 |
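(order_id: order number, uid: user ID, driver_id: driver ID, order_time: time the driver accepted, start_time: pickup time, when billing starts, finish_time: order completion time, mileage: distance travelled, fare: fare, grade: rating)
Scenario logic:
- When a user submits a ride request, a row is created in the ride-request table with order_id set to NULL.
- When a driver accepts, a row is created in the order table: order_time and the columns to its left are filled in, while start_time and the columns to its right are all NULL. The order_id and the acceptance time (stored as end_time) are written to the ride-request table. If no driver ever accepts, the request times out, or the user cancels midway, only end_time is recorded.
- If the passenger or the driver cancels before pickup, finish_time of that order is set to the cancellation time and the other columns are set to NULL.
- When the driver picks up the passenger, start_time of that order is filled in.
- When the order completes, finish_time, mileage and fare are filled in; grade stays NULL until the user gives the driver a 1 to 5 star rating.
Problem: Find the drivers who had a cancelled order in October 2021. For each of them compute the average rating over all of their completed, rated orders, and also the overall average rating, to 1 decimal place. Output the drivers by driver_id ascending first, then the overall row.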
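# '总体' labels the overall row produced by WITH ROLLUP
SELECT COALESCE(driver_id, '总体') AS driver_id, ROUND(AVG(grade), 1) AS avg_grade
FROM tb_get_car_order
WHERE driver_id IN (
    # drivers with an order cancelled before pickup in October 2021
    SELECT driver_id FROM tb_get_car_order
    WHERE start_time IS NULL AND DATE_FORMAT(order_time, '%Y-%m') = '2021-10'
) AND grade IS NOT NULL # only completed orders that were rated
GROUP BY driver_id WITH ROLLUP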
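The same result can be recomputed from the sample rows in plain Python. This is only a sketch for checking the logic; the tuple layout and variable names are assumptions made for the illustration.

```python
# (driver_id, order_time, start_time, grade) from the sample tb_get_car_order;
# start_time is None for orders cancelled before pickup.
orders = [
    (202, "2021-10-01 08:30:00", None, None),
    (202, "2021-10-01 09:01:00", "2021-10-01 09:06:00", 5),
    (202, "2021-10-02 08:01:00", "2021-10-02 08:15:00", 4),
    (202, "2021-10-03 08:01:00", "2021-10-03 08:13:00", 4),
    (203, "2021-10-01 08:02:10", None, None),
    (203, "2021-10-01 18:01:00", "2021-10-01 18:09:00", 5),
    (203, "2021-10-02 11:01:00", "2021-10-02 11:07:00", 5),
    (203, "2021-10-02 21:01:00", "2021-10-02 21:10:00", 4),
    (203, "2021-10-08 18:01:00", "2021-10-08 18:11:50", 5),
]

# Step 1: drivers with at least one order cancelled in October 2021.
cancellers = {
    driver for driver, order_time, start_time, _ in orders
    if start_time is None and order_time.startswith("2021-10")
}

# Step 2: collect the ratings of those drivers' rated orders.
grades = {}
for driver, _, _, grade in orders:
    if driver in cancellers and grade is not None:
        grades.setdefault(driver, []).append(grade)

# Step 3: per-driver averages in driver_id order, then the overall row.
rows = [(str(d), round(sum(g) / len(g), 1)) for d, g in sorted(grades.items())]
all_grades = [x for g in grades.values() for x in g]
rows.append(("总体", round(sum(all_grades) / len(all_grades), 1)))
print(rows)  # [('202', 4.3), ('203', 4.8), ('总体', 4.6)]
```

Note that the overall row averages all 7 ratings (32 / 7), not the two per-driver averages.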
SQL21 The highest-rated driver in each city
Description: user ride-request table tb_get_car_record
id | uid | city | event_time | end_time | order_id |
1 | 101 | 北京 | 2021-10-01 07:00:00 | 2021-10-01 07:02:00 | NULL |
2 | 102 | 北京 | 2021-10-01 09:00:30 | 2021-10-01 09:01:00 | 9001 |
3 | 101 | 北京 | 2021-10-01 08:28:10 | 2021-10-01 08:30:00 | 9002 |
4 | 103 | 北京 | 2021-10-02 07:59:00 | 2021-10-02 08:01:00 | 9003 |
5 | 104 | 北京 | 2021-10-03 07:59:20 | 2021-10-03 08:01:00 | 9004 |
6 | 105 | 北京 | 2021-10-01 08:00:00 | 2021-10-01 08:02:10 | 9005 |
7 | 106 | 北京 | 2021-10-01 17:58:00 | 2021-10-01 18:01:00 | 9006 |
8 | 107 | 北京 | 2021-10-02 11:00:00 | 2021-10-02 11:01:00 | 9007 |
9 | 108 | 北京 | 2021-10-02 21:00:00 | 2021-10-02 21:01:00 | 9008 |
10 | 109 | 北京 | 2021-10-08 18:00:00 | 2021-10-08 18:01:00 | 9009 |
(uid: user ID, city: city, event_time: time of the ride request, end_time: time the request ended, order_id: order number)
Ride order table tb_get_car_order
id | order_id | uid | driver_id | order_time | start_time | finish_time | mileage | fare | grade |
1 | 9002 | 101 | 202 | 2021-10-01 08:30:00 | NULL | 2021-10-01 08:31:00 | NULL | NULL | NULL |
2 | 9001 | 102 | 202 | 2021-10-01 09:01:00 | 2021-10-01 09:06:00 | 2021-10-01 09:31:00 | 10 | 41.5 | 5 |
3 | 9003 | 103 | 202 | 2021-10-02 08:01:00 | 2021-10-02 08:15:00 | 2021-10-02 08:31:00 | 11 | 41.5 | 4 |
4 | 9004 | 104 | 202 | 2021-10-03 08:01:00 | 2021-10-03 08:13:00 | 2021-10-03 08:31:00 | 7.5 | 22 | 4 |
5 | 9005 | 105 | 203 | 2021-10-01 08:02:10 | NULL | 2021-10-01 08:31:00 | NULL | NULL | NULL |
6 | 9006 | 106 | 203 | 2021-10-01 18:01:00 | 2021-10-01 18:09:00 | 2021-10-01 18:31:00 | 8 | 25.5 | 5 |
7 | 9007 | 107 | 203 | 2021-10-02 11:01:00 | 2021-10-02 11:07:00 | 2021-10-02 11:31:00 | 9.9 | 30 | 5 |
8 | 9008 | 108 | 203 | 2021-10-02 21:01:00 | 2021-10-02 21:10:00 | 2021-10-02 21:31:00 | 13.2 | 38 | 4 |
9 | 9009 | 109 | 203 | 2021-10-08 18:01:00 | 2021-10-08 18:11:50 | 2021-10-08 18:51:00 | 13 | 40 | 5 |
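(order_id: order number, uid: user ID, driver_id: driver ID, order_time: time the driver accepted, start_time: pickup time, when billing starts, finish_time: order completion time, mileage: distance travelled, fare: fare, grade: rating)
Scenario logic:
- When a user submits a ride request, a row is created in the ride-request table with order_id set to NULL.
- When a driver accepts, a row is created in the order table: order_time and the columns to its left are filled in, while start_time and the columns to its right are all NULL. The order_id and the acceptance time (stored as end_time) are written to the ride-request table. If no driver ever accepts, the request times out, or the user cancels midway, only end_time is recorded.
- If the passenger or the driver cancels before pickup, finish_time of that order is set to the cancellation time and the other columns are set to NULL.
- When the driver picks up the passenger, start_time of that order is filled in.
- When the order completes, finish_time, mileage and fare are filled in; grade stays NULL until the user gives the driver a 1 to 5 star rating.
Problem: For each city, report the average rating, the average daily number of orders accepted and the average daily mileage of its highest-rated driver.
Notes: If several drivers tie for the highest rating, output all of them.
Round the average rating and the average daily orders to 1 decimal place,
and the average daily mileage to 3 decimal places; sort by average daily orders ascending.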
WITH c AS (
    # (city, driver) pairs whose average rating ranks first in the city; RANK() keeps ties
    SELECT city, driver_id
    FROM (
        SELECT *, RANK() OVER (PARTITION BY city ORDER BY avg_grade DESC) AS r
        FROM (
            SELECT city, driver_id, AVG(grade) AS avg_grade
            FROM tb_get_car_order tbgco
            JOIN tb_get_car_record tbgcr USING(order_id)
            GROUP BY city, driver_id) AS a) AS b
    WHERE r = 1)
SELECT city, tbgco.driver_id,
    ROUND(AVG(grade), 1) AS avg_grade,
    ROUND(COUNT(order_time) / COUNT(DISTINCT DATE(order_time)), 1) AS avg_order_num,
    ROUND(SUM(mileage) / COUNT(DISTINCT DATE(order_time)), 3) AS avg_mileage
FROM tb_get_car_order tbgco
JOIN tb_get_car_record tbgcr USING(order_id)
# match on the (city, driver) pair so a driver is only reported for the city where they rank first
WHERE (city, tbgco.driver_id) IN (SELECT city, driver_id FROM c)
GROUP BY city, tbgco.driver_id
ORDER BY avg_order_num
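To see what the query should return on the sample data, here is a small Python sketch of the same steps (group by city and driver, keep the top average rating per city, then report the three metrics). The tuple layout and the helper name `summarize` are illustrative assumptions, and the sketch assumes every driver has at least one rated order.

```python
from collections import defaultdict

# (city, driver_id, order_time, mileage, grade) after joining the two sample tables;
# mileage and grade are None for cancelled orders.
rows = [
    ("北京", 202, "2021-10-01 08:30:00", None, None),
    ("北京", 202, "2021-10-01 09:01:00", 10, 5),
    ("北京", 202, "2021-10-02 08:01:00", 11, 4),
    ("北京", 202, "2021-10-03 08:01:00", 7.5, 4),
    ("北京", 203, "2021-10-01 08:02:10", None, None),
    ("北京", 203, "2021-10-01 18:01:00", 8, 5),
    ("北京", 203, "2021-10-02 11:01:00", 9.9, 5),
    ("北京", 203, "2021-10-02 21:01:00", 13.2, 4),
    ("北京", 203, "2021-10-08 18:01:00", 13, 5),
]

groups = defaultdict(list)
for city, driver, order_time, mileage, grade in rows:
    groups[(city, driver)].append((order_time[:10], mileage, grade))


def summarize(items):
    """Return (average grade, orders per active day, mileage per active day)."""
    grades = [g for _, _, g in items if g is not None]  # AVG() skips NULL
    days = {d for d, _, _ in items}                     # COUNT(DISTINCT DATE(order_time))
    miles = sum(m for _, m, _ in items if m is not None)
    return sum(grades) / len(grades), len(items) / len(days), miles / len(days)


stats = {key: summarize(items) for key, items in groups.items()}

# Highest unrounded average grade per city; ties all survive the filter below.
best = defaultdict(float)
for (city, _), (avg_grade, _, _) in stats.items():
    best[city] = max(best[city], avg_grade)

result = sorted(
    ((city, driver, round(g, 1), round(n, 1), round(m, 3))
     for (city, driver), (g, n, m) in stats.items() if g == best[city]),
    key=lambda r: r[3],
)
print(result)  # [('北京', 203, 4.8, 1.7, 14.7)]
```

Driver 203 wins with an average of 4.75 against 4.33 for driver 202; the 5 orders and 44.1 km are spread over 3 distinct days.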
SQL22 Trailing 7-day average daily cancelled orders during the National Day holiday
Description: user ride-request table tb_get_car_record
id | uid | city | event_time | end_time | order_id |
1 | 101 | 北京 | 2021-09-25 08:28:10 | 2021-09-25 08:30:00 | 9011 |
2 | 102 | 北京 | 2021-09-25 09:00:30 | 2021-09-25 09:01:00 | 9012 |
3 | 103 | 北京 | 2021-09-26 07:59:00 | 2021-09-26 08:01:00 | 9013 |
4 | 104 | 北京 | 2021-09-26 07:59:00 | 2021-09-26 08:01:00 | 9023 |
5 | 104 | 北京 | 2021-09-27 07:59:20 | 2021-09-27 08:01:00 | 9014 |
6 | 105 | 北京 | 2021-09-28 08:00:00 | 2021-09-28 08:02:10 | 9015 |
7 | 106 | 北京 | 2021-09-29 17:58:00 | 2021-09-29 18:01:00 | 9016 |
8 | 107 | 北京 | 2021-09-30 11:00:00 | 2021-09-30 11:01:00 | 9017 |
9 | 108 | 北京 | 2021-09-30 21:00:00 | 2021-09-30 21:01:00 | 9018 |
10 | 102 | 北京 | 2021-10-01 09:00:30 | 2021-10-01 09:01:00 | 9002 |
11 | 106 | 北京 | 2021-10-01 17:58:00 | 2021-10-01 18:01:00 | 9006 |
12 | 101 | 北京 | 2021-10-02 08:28:10 | 2021-10-02 08:30:00 | 9001 |
13 | 107 | 北京 | 2021-10-02 11:00:00 | 2021-10-02 11:01:00 | 9007 |
14 | 108 | 北京 | 2021-10-02 21:00:00 | 2021-10-02 21:01:00 | 9008 |
15 | 103 | 北京 | 2021-10-02 07:59:00 | 2021-10-02 08:01:00 | 9003 |
16 | 104 | 北京 | 2021-10-03 07:59:20 | 2021-10-03 08:01:00 | 9004 |
17 | 109 | 北京 | 2021-10-03 18:00:00 | 2021-10-03 18:01:00 | 9009 |
(uid: user ID, city: city, event_time: time of the ride request, end_time: time the request ended, order_id: order number)
Ride order table tb_get_car_order
id | order_id | uid | driver_id | order_time | start_time | finish_time | mileage | fare | grade |
1 | 9011 | 101 | 211 | 2021-09-25 08:30:00 | 2021-09-25 08:31:00 | 2021-09-25 08:54:00 | 10 | 35 | 5 |
2 | 9012 | 102 | 211 | 2021-09-25 09:01:00 | 2021-09-25 09:01:50 | 2021-09-25 09:28:00 | 11 | 32 | 5 |
3 | 9013 | 103 | 212 | 2021-09-26 08:01:00 | 2021-09-26 08:03:00 | 2021-09-26 08:27:00 | 12 | 31 | 4 |
4 | 9023 | 104 | 213 | 2021-09-26 08:01:00 | NULL | 2021-09-26 08:27:00 | NULL | NULL | NULL |
5 | 9014 | 104 | 212 | 2021-09-27 08:01:00 | 2021-09-27 08:04:00 | 2021-09-27 08:21:00 | 11 | 31 | 5 |
6 | 9015 | 105 | 212 | 2021-09-28 08:02:10 | 2021-09-28 08:04:10 | 2021-09-28 08:25:10 | 12 | 31 | 4 |
7 | 9016 | 106 | 213 | 2021-09-29 18:01:00 | 2021-09-29 18:02:10 | 2021-09-29 18:23:00 | 11 | 39 | 4 |
8 | 9017 | 107 | 213 | 2021-09-30 11:01:00 | 2021-09-30 11:01:40 | 2021-09-30 11:31:00 | 11 | 38 | 5 |
9 | 9018 | 108 | 214 | 2021-09-30 21:01:00 | 2021-09-30 21:02:50 | 2021-09-30 21:21:00 | 14 | 38 | 5 |
10 | 9002 | 102 | 202 | 2021-10-01 09:01:00 | 2021-10-01 09:06:00 | 2021-10-01 09:31:00 | 10 | 41.5 | 5 |
11 | 9006 | 106 | 203 | 2021-10-01 18:01:00 | 2021-10-01 18:09:00 | 2021-10-01 18:31:00 | 8 | 25.5 | 4 |
12 | 9001 | 101 | 202 | 2021-10-02 08:30:00 | NULL | 2021-10-02 08:31:00 | NULL | NULL | NULL |
13 | 9007 | 107 | 203 | 2021-10-02 11:01:00 | 2021-10-02 11:07:00 | 2021-10-02 11:31:00 | 9.9 | 30 | 5 |
14 | 9008 | 108 | 204 | 2021-10-02 21:01:00 | 2021-10-02 21:10:00 | 2021-10-02 21:31:00 | 13.2 | 38 | 4 |
15 | 9003 | 103 | 202 | 2021-10-02 08:01:00 | 2021-10-02 08:15:00 | 2021-10-02 08:31:00 | 11 | 41.5 | 4 |
16 | 9004 | 104 | 202 | 2021-10-03 08:01:00 | 2021-10-03 08:13:00 | 2021-10-03 08:31:00 | 7.5 | 22 | 4 |
17 | 9009 | 109 | 204 | 2021-10-03 18:01:00 | NULL | 2021-10-03 18:51:00 | NULL | NULL | NULL |
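(order_id: order number, uid: user ID, driver_id: driver ID, order_time: time the driver accepted, start_time: pickup time, when billing starts, finish_time: order completion time, mileage: distance travelled, fare: fare, grade: rating)
Scenario logic:
- When a user submits a ride request, a row is created in the ride-request table with order_id set to NULL.
- When a driver accepts, a row is created in the order table: order_time and the columns to its left are filled in, while start_time and the columns to its right are all NULL. The order_id and the acceptance time (stored as end_time) are written to the ride-request table. If no driver ever accepts, the request times out, or the user cancels midway, only end_time is recorded.
- If the passenger or the driver cancels before pickup, finish_time of that order is set to the cancellation time and the other columns are set to NULL.
- When the driver picks up the passenger, start_time of that order is filled in.
- When the order completes, finish_time, mileage and fare are filled in; grade stays NULL until the user gives the driver a 1 to 5 star rating.
Problem: For each of the first 3 days of the National Day holiday, compute the trailing 7-day average daily completed orders and average daily cancelled orders, sorted by date ascending. Round to 2 decimal places.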
WITH t1 AS (
    SELECT DATE(order_time) AS dt,
        COUNT(CASE WHEN mileage IS NOT NULL THEN order_id END) AS finish_num, # orders completed that day
        COUNT(CASE WHEN mileage IS NULL THEN order_id END) AS cancel_num # orders cancelled that day
    FROM tb_get_car_order
    GROUP BY DATE(order_time)
), t2 AS (
    SELECT dt,
        # window by calendar date (the day itself plus the 6 days before it), so a day without orders cannot stretch the window
        SUM(finish_num) OVER (ORDER BY dt RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW) AS finish_num_7d,
        SUM(cancel_num) OVER (ORDER BY dt RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW) AS cancel_num_7d
    FROM t1)
SELECT dt, ROUND(finish_num_7d / 7, 2) AS finish_num_7d, ROUND(cancel_num_7d / 7, 2) AS cancel_num_7d
FROM t2 WHERE dt BETWEEN '2021-10-01' AND '2021-10-03'
ORDER BY dt ASC
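The trailing 7-day window can be checked against the sample order table with a short Python sketch. It reduces each order to its order date and mileage (None marking a cancelled order); that reduction and the helper name `trailing_7d` are choices made for this illustration.

```python
from datetime import date, timedelta

# (order date, mileage) for the 17 sample orders; mileage None marks a cancelled order.
orders = [
    ("2021-09-25", 10), ("2021-09-25", 11), ("2021-09-26", 12), ("2021-09-26", None),
    ("2021-09-27", 11), ("2021-09-28", 12), ("2021-09-29", 11), ("2021-09-30", 11),
    ("2021-09-30", 14), ("2021-10-01", 10), ("2021-10-01", 8), ("2021-10-02", None),
    ("2021-10-02", 9.9), ("2021-10-02", 13.2), ("2021-10-02", 11), ("2021-10-03", 7.5),
    ("2021-10-03", None),
]


def trailing_7d(day):
    """Daily averages of completed and cancelled orders over [day - 6, day]."""
    first = day - timedelta(days=6)
    window = [m for d, m in orders if first <= date.fromisoformat(d) <= day]
    finished = sum(m is not None for m in window)
    cancelled = len(window) - finished
    return round(finished / 7, 2), round(cancelled / 7, 2)


result = {d.isoformat(): trailing_7d(d) for d in (date(2021, 10, n) for n in (1, 2, 3))}
print(result)
# {'2021-10-01': (1.43, 0.14), '2021-10-02': (1.57, 0.29), '2021-10-03': (1.57, 0.29)}
```

The divisor is always 7, the length of the window, regardless of how many days in it actually had orders.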
SQL23 Ride requests, wait time and dispatch time by time period on weekdays
Description: user ride-request table tb_get_car_record
id | uid | city | event_time | end_time | order_id |
1 | 107 | 北京 | 2021-09-20 11:00:00 | 2021-09-20 11:00:30 | 9017 |
2 | 108 | 北京 | 2021-09-20 21:00:00 | 2021-09-20 21:00:40 | 9008 |
3 | 108 | 北京 | 2021-09-20 18:59:30 | 2021-09-20 19:01:00 | 9018 |
4 | 102 | 北京 | 2021-09-21 08:59:00 | 2021-09-21 09:01:00 | 9002 |
5 | 106 | 北京 | 2021-09-21 17:58:00 | 2021-09-21 18:01:00 | 9006 |
6 | 103 | 北京 | 2021-09-22 07:58:00 | 2021-09-22 08:01:00 | 9003 |
7 | 104 | 北京 | 2021-09-23 07:59:00 | 2021-09-23 08:01:00 | 9004 |
8 | 103 | 北京 | 2021-09-24 19:59:20 | 2021-09-24 20:01:00 | 9019 |
9 | 101 | 北京 | 2021-09-24 08:28:10 | 2021-09-24 08:30:00 | 9011 |
(uid: user ID, city: city, event_time: time of the ride request, end_time: time the request ended, order_id: order number)
Ride order table tb_get_car_order
id | order_id | uid | driver_id | order_time | start_time | finish_time | mileage | fare | grade |
1 | 9017 | 107 | 213 | 2021-09-20 11:00:30 | 2021-09-20 11:02:10 | 2021-09-20 11:31:00 | 11 | 38 | 5 |
2 | 9008 | 108 | 204 | 2021-09-20 21:00:40 | 2021-09-20 21:03:00 | 2021-09-20 21:31:00 | 13.2 | 38 | 4 |
3 | 9018 | 108 | 214 | 2021-09-20 19:01:00 | 2021-09-20 19:04:50 | 2021-09-20 19:21:00 | 14 | 38 | 5 |
4 | 9002 | 102 | 202 | 2021-09-21 09:01:00 | 2021-09-21 09:06:00 | 2021-09-21 09:31:00 | 10 | 41.5 | 5 |
5 | 9006 | 106 | 203 | 2021-09-21 18:01:00 | 2021-09-21 18:09:00 | 2021-09-21 18:31:00 | 8 | 25.5 | 4 |
6 | 9007 | 107 | 203 | 2021-09-22 11:01:00 | 2021-09-22 11:07:00 | 2021-09-22 11:31:00 | 9.9 | 30 | 5 |
7 | 9003 | 103 | 202 | 2021-09-22 08:01:00 | 2021-09-22 08:15:00 | 2021-09-22 08:31:00 | 11 | 41.5 | 4 |
8 | 9004 | 104 | 202 | 2021-09-23 08:01:00 | 2021-09-23 08:13:00 | 2021-09-23 08:31:00 | 7.5 | 22 | 4 |
9 | 9005 | 105 | 202 | 2021-09-23 10:01:00 | 2021-09-23 10:13:00 | 2021-09-23 10:31:00 | 9 | 29 | 5 |
10 | 9019 | 103 | 202 | 2021-09-24 20:01:00 | 2021-09-24 20:11:00 | 2021-09-24 20:51:00 | 10 | 39 | 4 |
11 | 9011 | 101 | 211 | 2021-09-24 08:30:00 | 2021-09-24 08:31:00 | 2021-09-24 08:54:00 | 10 | 35 | 5 |
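(order_id: order number, uid: user ID, driver_id: driver ID, order_time: time the driver accepted, start_time: pickup time, when billing starts, finish_time: order completion time, mileage: distance travelled, fare: fare, grade: rating)
Scenario logic:
- When a user submits a ride request, a row is created in the ride-request table with order_id set to NULL.
- When a driver accepts, a row is created in the order table: order_time and the columns to its left are filled in, while start_time and the columns to its right are all NULL. The order_id and the acceptance time (stored as end_time) are written to the ride-request table. If no driver ever accepts, the request times out, or the user cancels midway, only end_time is recorded.
- If the passenger or the driver cancels before pickup, finish_time of that order is set to the cancellation time and the other columns are set to NULL.
- When the driver picks up the passenger, start_time of that order is filled in.
- When the order completes, finish_time, mileage and fare are filled in; grade stays NULL until the user gives the driver a 1 to 5 star rating.
Problem: For Monday through Friday, compute for each time period the number of ride requests, the average wait time until a driver accepts and the average dispatch time. Periods are assigned by event_time (the time the request started). Round both averages to 1 decimal place; the average dispatch time covers completed orders only. Sort by number of ride requests ascending.
Notes:
- Periods: morning peak (早高峰) [07:00:00, 09:00:00), working hours (工作时间) [09:00:00, 17:00:00), evening peak (晚高峰) [17:00:00, 20:00:00), rest time (休息时间) [20:00:00, 07:00:00).
- Intervals are closed on the left and open on the right (07:00:00 counts as morning peak, 09:00:00 does not).
- Wait time runs from the ride request to driver acceptance; dispatch time runs from acceptance to pickup.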
SELECT (CASE
        WHEN TIME(event_time) >= '07:00:00' AND TIME(event_time) < '09:00:00' THEN '早高峰'
        WHEN TIME(event_time) >= '09:00:00' AND TIME(event_time) < '17:00:00' THEN '工作时间'
        WHEN TIME(event_time) >= '17:00:00' AND TIME(event_time) < '20:00:00' THEN '晚高峰'
        ELSE '休息时间' END) AS period,
    COUNT(tgcr.order_id) AS get_car_num,
    # both averages are in minutes (seconds / 60)
    ROUND(AVG(TIMESTAMPDIFF(SECOND, event_time, order_time) / 60), 1) AS avg_wait_time,
    # start_time is NULL for cancelled orders, and AVG() skips NULL, so only picked-up orders count here
    ROUND(AVG(TIMESTAMPDIFF(SECOND, order_time, start_time) / 60), 1) AS avg_dispatch_time
FROM tb_get_car_record AS tgcr
JOIN tb_get_car_order tgco ON tgcr.order_id = tgco.order_id
WHERE WEEKDAY(event_time) BETWEEN 0 AND 4 # 0 = Monday ... 4 = Friday
GROUP BY period
ORDER BY get_car_num
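The bucketing and the two averages can be reproduced from the sample rows with the Python sketch below. It is a cross-check only: the helper name `period` and the tuple layout are assumptions of this illustration, and the pickup for order 9003 is taken as 2021-09-22 08:15:00, that is, on the same day as its order_time.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (event_time, order_time, start_time) after joining the two sample tables on order_id.
rows = [
    ("2021-09-20 11:00:00", "2021-09-20 11:00:30", "2021-09-20 11:02:10"),
    ("2021-09-20 21:00:00", "2021-09-20 21:00:40", "2021-09-20 21:03:00"),
    ("2021-09-20 18:59:30", "2021-09-20 19:01:00", "2021-09-20 19:04:50"),
    ("2021-09-21 08:59:00", "2021-09-21 09:01:00", "2021-09-21 09:06:00"),
    ("2021-09-21 17:58:00", "2021-09-21 18:01:00", "2021-09-21 18:09:00"),
    ("2021-09-22 07:58:00", "2021-09-22 08:01:00", "2021-09-22 08:15:00"),
    ("2021-09-23 07:59:00", "2021-09-23 08:01:00", "2021-09-23 08:13:00"),
    ("2021-09-24 19:59:20", "2021-09-24 20:01:00", "2021-09-24 20:11:00"),
    ("2021-09-24 08:28:10", "2021-09-24 08:30:00", "2021-09-24 08:31:00"),
]


def period(ts):
    """Map a datetime to its period label using left-closed, right-open intervals."""
    t = ts.strftime("%H:%M:%S")  # zero-padded, so string comparison is safe
    if "07:00:00" <= t < "09:00:00":
        return "早高峰"
    if "09:00:00" <= t < "17:00:00":
        return "工作时间"
    if "17:00:00" <= t < "20:00:00":
        return "晚高峰"
    return "休息时间"


buckets = defaultdict(lambda: {"wait": [], "dispatch": []})
for event, order, start in rows:
    e, o, s = (datetime.strptime(x, FMT) for x in (event, order, start))
    if e.weekday() > 4:  # keep Monday (0) through Friday (4)
        continue
    bucket = buckets[period(e)]
    bucket["wait"].append((o - e).total_seconds() / 60)
    bucket["dispatch"].append((s - o).total_seconds() / 60)

result = sorted(
    ((p, len(b["wait"]),
      round(sum(b["wait"]) / len(b["wait"]), 1),
      round(sum(b["dispatch"]) / len(b["dispatch"]), 1))
     for p, b in buckets.items()),
    key=lambda r: r[1],
)
for row in result:
    print(row)
```

On these rows the morning peak has 4 requests with averages of 2.2 and 8.0 minutes, and the evening peak has 3 requests with averages of 2.1 and 7.3 minutes.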
SQL24 Maximum number of users waiting at the same time in each city
Description: user ride-request table tb_get_car_record
id | uid | city | event_time | end_time | order_id |
1 | 108 | 北京 | 2021-10-20 08:00:00 | 2021-10-20 08:00:40 | 9008 |
2 | 118 | 北京 | 2021-10-20 08:00:10 | 2021-10-20 08:00:45 | 9018 |
3 | 102 | 北京 | 2021-10-20 08:00:30 | 2021-10-20 08:00:50 | 9002 |
4 | 106 | 北京 | 2021-10-20 08:05:41 | 2021-10-20 08:06:00 | 9006 |
5 | 103 | 北京 | 2021-10-20 08:05:50 | 2021-10-20 08:07:10 | 9003 |
6 | 104 | 北京 | 2021-10-20 08:01:01 | 2021-10-20 08:01:20 | 9004 |
7 | 105 | 北京 | 2021-10-20 08:01:15 | 2021-10-20 08:01:30 | 9019 |
8 | 101 | 北京 | 2021-10-20 08:28:10 | 2021-10-20 08:30:00 | 9011 |
(uid: user ID, city: city, event_time: time of the ride request, end_time: time the request ended, order_id: order number)
Ride order table tb_get_car_order
id | order_id | uid | driver_id | order_time | start_time | finish_time | mileage | fare | grade |
1 | 9008 | 108 | 204 | 2021-10-20 08:00:40 | 2021-10-20 08:03:00 | 2021-10-20 08:31:00 | 13.2 | 38 | 4 |
2 | 9018 | 108 | 214 | 2021-10-20 08:00:45 | 2021-10-20 08:04:50 | 2021-10-20 08:21:00 | 14 | 38 | 5 |
3 | 9002 | 102 | 202 | 2021-10-20 08:00:50 | 2021-10-20 08:06:00 | 2021-10-20 08:31:00 | 10 | 41.5 | 5 |
4 | 9006 | 106 | 206 | 2021-10-20 08:06:00 | 2021-10-20 08:09:00 | 2021-10-20 08:31:00 | 8 | 25.5 | 4 |
5 | 9003 | 103 | 203 | 2021-10-20 08:07:10 | 2021-10-20 08:15:00 | 2021-10-20 08:31:00 | 11 | 41.5 | 4 |
6 | 9004 | 104 | 204 | 2021-10-20 08:01:20 | 2021-10-20 08:13:00 | 2021-10-20 08:31:00 | 7.5 | 22 | 4 |
7 | 9019 | 105 | 205 | 2021-10-20 08:01:30 | 2021-10-20 08:11:00 | 2021-10-20 08:51:00 | 10 | 39 | 4 |
8 | 9011 | 101 | 211 | 2021-10-20 08:30:00 | 2021-10-20 08:31:00 | 2021-10-20 08:54:00 | 10 | 35 | 5 |
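(order_id: order number, uid: user ID, driver_id: driver ID, order_time: time the driver accepted, start_time: pickup time, when billing starts, finish_time: order completion time, mileage: distance travelled, fare: fare, grade: rating)
Scenario logic:
- When a user submits a ride request, a row is created in the ride-request table with order_id set to NULL.
- When a driver accepts, a row is created in the order table: order_time and the columns to its left are filled in, while start_time and the columns to its right are all NULL. The order_id and the acceptance time (stored as end_time) are written to the ride-request table. If no driver ever accepts, the request times out, or the user cancels midway, only end_time is recorded.
- If the passenger or the driver cancels before pickup, finish_time of that order is set to the cancellation time and the other columns are set to NULL.
- When the driver picks up the passenger, start_time of that order is filled in.
- When the order completes, finish_time, mileage and fare are filled in; grade stays NULL until the user gives the driver a 1 to 5 star rating.
Problem: For each city, find the largest number of users waiting for a ride at the same moment on any single day in October 2021.
Notes: Waiting is the user's state from the ride request until the request is cancelled, the wait is abandoned, or the user is picked up.
If at the same instant some users stop waiting and others start, count the increase before the decrease.
Sort by each city's maximum number of waiting users ascending, then by city ascending.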
WITH t1 AS (
    # +1 at the moment a user starts waiting
    SELECT city, DATE(event_time) AS dt, r.uid, event_time AS time, 1 AS tag
    FROM tb_get_car_record r LEFT JOIN tb_get_car_order USING(order_id)
    WHERE DATE_FORMAT(event_time, '%Y-%m') = '2021-10'
    UNION ALL
    # -1 when the wait ends: pickup (start_time), else order cancellation (finish_time), else the unanswered request's end_time
    SELECT city, DATE(event_time) AS dt, r.uid, COALESCE(start_time, finish_time, end_time) AS time, -1 AS tag
    FROM tb_get_car_record r LEFT JOIN tb_get_car_order USING(order_id)
    WHERE DATE_FORMAT(event_time, '%Y-%m') = '2021-10'
), t2 AS (
    # running count per city and day; tag DESC applies +1 before -1 at the same instant
    SELECT *, SUM(tag) OVER (PARTITION BY city, dt ORDER BY time ASC, tag DESC) AS wait_num
    FROM t1)
SELECT city, MAX(wait_num) AS max_wait_num
FROM t2
GROUP BY city
ORDER BY max_wait_num, city
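The +1 / -1 event technique used by the query is a sweep over sorted timestamps. The Python sketch below applies it to the sample rows, where every request ends in a pickup, so the wait ends at start_time. The function name `max_waiting` and the tuple layout are assumptions of this illustration.

```python
# (city, wait begins = event_time, wait ends = pickup start_time) for the 8 sample requests.
rows = [
    ("北京", "2021-10-20 08:00:00", "2021-10-20 08:03:00"),
    ("北京", "2021-10-20 08:00:10", "2021-10-20 08:04:50"),
    ("北京", "2021-10-20 08:00:30", "2021-10-20 08:06:00"),
    ("北京", "2021-10-20 08:05:41", "2021-10-20 08:09:00"),
    ("北京", "2021-10-20 08:05:50", "2021-10-20 08:15:00"),
    ("北京", "2021-10-20 08:01:01", "2021-10-20 08:13:00"),
    ("北京", "2021-10-20 08:01:15", "2021-10-20 08:11:00"),
    ("北京", "2021-10-20 08:28:10", "2021-10-20 08:31:00"),
]


def max_waiting(rows):
    """Return [(city, max simultaneous waiters)] sorted by count, then city."""
    events = []
    for city, begin, end in rows:
        day = begin[:10]  # the day is taken from the request time, as in the query
        events.append((city, day, begin, +1))
        events.append((city, day, end, -1))
    # ISO timestamps sort correctly as strings; -tag puts +1 before -1 on ties,
    # which implements "count the increase before the decrease".
    events.sort(key=lambda e: (e[0], e[1], e[2], -e[3]))
    running, best = {}, {}
    for city, day, _, tag in events:
        running[(city, day)] = running.get((city, day), 0) + tag
        best[city] = max(best.get(city, 0), running[(city, day)])
    return sorted(best.items(), key=lambda kv: (kv[1], kv[0]))


print(max_waiting(rows))  # [('北京', 5)]

# Hypothetical tie: one user is picked up at the exact second another starts waiting.
tie = [
    ("X", "2021-10-01 08:00:00", "2021-10-01 08:05:00"),
    ("X", "2021-10-01 08:05:00", "2021-10-01 08:06:00"),
]
print(max_waiting(tie))  # [('X', 2)]
```

In the sample the count first reaches 5 at 08:01:15 and again at 08:05:50.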
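05 某宝 store analysis (e-commerce)
SQL25 Number of SPUs in the 某宝 store
Description: With November over, Xiaoniu (小牛) wants to analyze November's user transactions and product data for the online store he runs on 某宝, in order to manage the shop better.
The product table product_tb is as follows (item_id is the specific item number within a style, style_id is the style number, tag_price is the tag price, inventory is the stock quantity):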
item_id | style_id | tag_price | inventory |
A001 | A | 100 | 20 |
A002 | A | 120 | 30 |
A003 | A | 200 | 15 |
B001 | B | 130 | 18 |
B002 | B | 150 | 22 |
B003 | B | 125 | 10 |
B004 | B | 155 | 12 |
C001 | C | 260 | 25 |
C002 | C | 280 | 18 |
Problem: Count the number of SPUs (item numbers) in each style and sort by that count descending. For the example above the output is:
style_id | SPU_num |
B | 4 |
A | 3 |
C | 2 |
SELECT style_id, COUNT(item_id) AS SPU_num
FROM product_tb
GROUP BY style_id
ORDER BY SPU_num DESC
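SQL26 Actual sales and average spend per customer of the 某宝 store
Description: With November over, Xiaoniu (小牛) wants to analyze November's user transactions and product data for the online store he runs on 某宝, in order to manage the shop better.
The November sales table sales_tb is as follows (sales_date is the sale date, user_id is the user number, item_id is the item number, sales_num is the quantity sold, sales_price is the settlement amount):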
sales_date | user_id | item_id | sales_num | sales_price |
2021-11-01 | 1 | A001 | 1 | 90 |
2021-11-01 | 2 | A002 | 2 | 220 |
2021-11-01 | 2 | B001 | 1 | 120 |
2021-11-02 | 3 | C001 | 2 | 500 |
2021-11-02 | 4 | B001 | 1 | 120 |
2021-11-03 | 5 | C001 | 1 | 240 |
2021-11-03 | 6 | C002 | 1 | 270 |
2021-11-04 | 7 | A003 | 1 | 180 |
2021-11-04 | 8 | B002 | 1 | 140 |
2021-11-04 | 9 | B001 | 1 | 125 |
2021-11-05 | 10 | B003 | 1 | 120 |
2021-11-05 | 10 | B004 | 1 | 150 |
2021-11-05 | 10 | A003 | 1 | 180 |
2021-11-06 | 11 | B003 | 1 | 120 |
2021-11-06 | 10 | B004 | 1 | 150 |
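Problem: Compute the actual total sales and the average spend per customer (total revenue / number of distinct users, rounded to 2 decimal places). For the example above the output is: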
sales_total | per_trans |
2725 | 247.73 |
SELECT SUM(sales_price) AS sales_total,
round(SUM(sales_price) / COUNT(DISTINCT user_id), 2) AS per_trans
FROM sales_tb
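SQL27 Discount rate of the 某宝 store
Description: With November over, Xiaoniu (小牛) wants to analyze November's user transactions and product data for the online store he runs on 某宝, in order to manage the shop better.
The product table product_tb is as follows (item_id is the specific item number within a style, style_id is the style number, tag_price is the tag price, inventory is the stock quantity):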
item_id | style_id | tag_price | inventory |
A001 | A | 100 | 20 |
A002 | A | 120 | 30 |
A003 | A | 200 | 15 |
B001 | B | 130 | 18 |
B002 | B | 150 | 22 |
B003 | B | 125 | 10 |
B004 | B | 155 | 12 |
C001 | C | 260 | 25 |
C002 | C | 280 | 18 |
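The November sales table sales_tb is as follows (sales_date is the sale date, user_id is the user number, item_id is the item number, sales_num is the quantity sold, sales_price is the settlement amount):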
sales_date | user_id | item_id | sales_num | sales_price |
2021-11-01 | 1 | A001 | 1 | 90 |
2021-11-01 | 2 | A002 | 2 | 220 |
2021-11-01 | 2 | B001 | 1 | 120 |
2021-11-02 | 3 | C001 | 2 | 500 |
2021-11-02 | 4 | B001 | 1 | 120 |
2021-11-03 | 5 | C001 | 1 | 240 |
2021-11-03 | 6 | C002 | 1 | 270 |
2021-11-04 | 7 | A003 | 1 | 180 |
2021-11-04 | 8 | B002 | 1 | 140 |
2021-11-04 | 9 | B001 | 1 | 125 |
2021-11-05 | 10 | B003 | 1 | 120 |
2021-11-05 | 10 | B004 | 1 | 150 |
2021-11-05 | 10 | A003 | 1 | 180 |
2021-11-06 | 11 | B003 | 1 | 120 |
2021-11-06 | 10 | B004 | 1 | 150 |
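Problem: Compute the discount rate (GMV / tag-price amount, where GMV is the transaction amount), rounded to 2 decimal places. For the example above the output is: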
discount_rate(%) |
93.97 |
SELECT ROUND(SUM(sales_price) * 100 / SUM(sales_num * tag_price), 2) AS `discount_rate(%)`
FROM sales_tb LEFT JOIN product_tb USING(item_id)
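SQL28 Pin rate and sell-through rate of the 某宝 store
Description: With November over, Xiaoniu (小牛) wants to analyze November's user transactions and product data for the online store he runs on 某宝, in order to manage the shop better.
The product table product_tb is as follows (item_id is the specific item number within a style, style_id is the style number, tag_price is the tag price, inventory is the stock quantity):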
item_id | style_id | tag_price | inventory |
A001 | A | 100 | 20 |
A002 | A | 120 | 30 |
A003 | A | 200 | 15 |
B001 | B | 130 | 18 |
B002 | B | 150 | 22 |
B003 | B | 125 | 10 |
B004 | B | 155 | 12 |
C001 | C | 260 | 25 |
C002 | C | 280 | 18 |
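The November sales table sales_tb is as follows (sales_date is the sale date, user_id is the user number, item_id is the item number, sales_num is the quantity sold, sales_price is the settlement amount):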
sales_date | user_id | item_id | sales_num | sales_price |
2021-11-01 | 1 | A001 | 1 | 90 |
2021-11-01 | 2 | A002 | 2 | 220 |
2021-11-01 | 2 | B001 | 1 | 120 |
2021-11-02 | 3 | C001 | 2 | 500 |
2021-11-02 | 4 | B001 | 1 | 120 |
2021-11-03 | 5 | C001 | 1 | 240 |
2021-11-03 | 6 | C002 | 1 | 270 |
2021-11-04 | 7 | A003 | 1 | 180 |
2021-11-04 | 8 | B002 | 1 | 140 |
2021-11-04 | 9 | B001 | 1 | 125 |
2021-11-05 | 10 | B003 | 1 | 120 |
2021-11-05 | 10 | B004 | 1 | 150 |
2021-11-05 | 10 | A003 | 1 | 180 |
2021-11-06 | 11 | B003 | 1 | 120 |
2021-11-06 | 10 | B004 | 1 | 150 |
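Problem: For each style compute the pin rate (pin_rate: SKUs with sales / SKUs on sale) and the sell-through rate (sell-through_rate: GMV / stocked value, where stocked value = tag price * inventory), sorted by style_id ascending. For the example above the output is: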
style_id | pin_rate(%) | sell-through_rate(%) |
A | 8.33 | 7.79 |
B | 14.81 | 11.94 |
C | 10.26 | 8.75 |
SELECT a.style_id,
    # as written, pin_rate is units sold / (inventory - units sold), which reproduces the sample output
    ROUND(b.ys_cnt / (a.kc_cnt - b.ys_cnt) * 100, 2) AS `pin_rate(%)`,
    ROUND(b.GMV / a.bh_price * 100, 2) AS `sell-through_rate(%)`
FROM (SELECT style_id, SUM(inventory) AS kc_cnt, SUM(tag_price * inventory) AS bh_price
    FROM product_tb GROUP BY style_id) AS a
LEFT JOIN
    # LEFT(item_id, 1) relies on the first character of item_id being its style_id
    (SELECT LEFT(item_id, 1) AS style_id, SUM(sales_num) AS ys_cnt, SUM(sales_price) AS GMV
    FROM sales_tb GROUP BY style_id) AS b ON a.style_id = b.style_id
ORDER BY a.style_id
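The expected figures can be reproduced from the two sample tables with the Python sketch below. It follows the arithmetic of the query (units sold divided by inventory minus units sold for pin_rate), looks the style up through product_tb rather than the first character of item_id, and assumes every style has at least one sale. Variable names are illustrative.

```python
# (item_id, style_id, tag_price, inventory) from the sample product_tb.
products = [
    ("A001", "A", 100, 20), ("A002", "A", 120, 30), ("A003", "A", 200, 15),
    ("B001", "B", 130, 18), ("B002", "B", 150, 22), ("B003", "B", 125, 10),
    ("B004", "B", 155, 12), ("C001", "C", 260, 25), ("C002", "C", 280, 18),
]
# (item_id, sales_num, sales_price) from the sample sales_tb.
sales = [
    ("A001", 1, 90), ("A002", 2, 220), ("B001", 1, 120), ("C001", 2, 500),
    ("B001", 1, 120), ("C001", 1, 240), ("C002", 1, 270), ("A003", 1, 180),
    ("B002", 1, 140), ("B001", 1, 125), ("B003", 1, 120), ("B004", 1, 150),
    ("A003", 1, 180), ("B003", 1, 120), ("B004", 1, 150),
]

style_of = {item: style for item, style, _, _ in products}
stock, stock_value, sold, gmv = {}, {}, {}, {}

for _, style, price, inventory in products:
    stock[style] = stock.get(style, 0) + inventory                      # SUM(inventory)
    stock_value[style] = stock_value.get(style, 0) + price * inventory  # SUM(tag_price * inventory)

for item, num, amount in sales:
    style = style_of[item]
    sold[style] = sold.get(style, 0) + num      # SUM(sales_num)
    gmv[style] = gmv.get(style, 0) + amount     # SUM(sales_price)

result = [
    (s,
     round(100 * sold[s] / (stock[s] - sold[s]), 2),
     round(100 * gmv[s] / stock_value[s], 2))
    for s in sorted(stock)
]
print(result)
# [('A', 8.33, 7.79), ('B', 14.81, 11.94), ('C', 10.26, 8.75)]
```

For style A, 5 units were sold out of an inventory of 65, giving 5 / 60, and the GMV of 670 is compared with a stocked value of 8600.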
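SQL29 某宝 store users who shopped on 2 or more consecutive days, and the number of days
Description: With November over, Xiaoniu (小牛) wants to analyze November's user transactions and product data for the online store he runs on 某宝, in order to manage the shop better.
The November sales table sales_tb is as follows (sales_date is the sale date, user_id is the user number, item_id is the item number, sales_num is the quantity sold, sales_price is the settlement amount):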
sales_date | user_id | item_id | sales_num | sales_price |
2021-11-01 | 1 | A001 | 1 | 90 |
2021-11-01 | 2 | A002 | 2 | 220 |
2021-11-01 | 2 | B001 | 1 | 120 |
2021-11-02 | 3 | C001 | 2 | 500 |
2021-11-02 | 4 | B001 | 1 | 120 |
2021-11-03 | 5 | C001 | 1 | 240 |
2021-11-03 | 6 | C002 | 1 | 270 |
2021-11-04 | 7 | A003 | 1 | 180 |
2021-11-04 | 8 | B002 | 1 | 140 |
2021-11-04 | 9 | B001 | 1 | 125 |
2021-11-05 | 10 | B003 | 1 | 120 |
2021-11-05 | 10 | B004 | 1 | 150 |
2021-11-05 | 10 | A003 | 1 | 180 |
2021-11-06 | 11 | B003 | 1 | 120 |
2021-11-06 | 10 | B004 | 1 | 150 |
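Problem: Find the users who shopped at the store on 2 or more consecutive days, with the number of consecutive days for each (if there are several users, sort by user_id ascending). For the example above the output is: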
user_id | days_count |
10 | 2 |
SELECT user_id, days_count
FROM (SELECT user_id, sub_date, COUNT(*) AS days_count
    # consecutive dates minus their row number collapse to the same sub_date
    FROM (SELECT *, DATE_SUB(sales_date, INTERVAL ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY sales_date) DAY) AS sub_date
        FROM (SELECT DISTINCT sales_date, user_id FROM sales_tb) a) b
    GROUP BY user_id, sub_date) c
WHERE days_count >= 2
ORDER BY user_id
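The query relies on the "date minus row number" trick: within a run of consecutive days, subtracting each date's rank gives the same anchor date, so grouping by the anchor counts the run length. A minimal Python sketch of the same idea on the sample rows follows; the function name `consecutive_runs` is an illustrative choice.

```python
from collections import Counter
from datetime import date, timedelta

# (sales_date, user_id) pairs from the sample sales_tb.
purchases = [
    ("2021-11-01", 1), ("2021-11-01", 2), ("2021-11-01", 2), ("2021-11-02", 3),
    ("2021-11-02", 4), ("2021-11-03", 5), ("2021-11-03", 6), ("2021-11-04", 7),
    ("2021-11-04", 8), ("2021-11-04", 9), ("2021-11-05", 10), ("2021-11-05", 10),
    ("2021-11-05", 10), ("2021-11-06", 11), ("2021-11-06", 10),
]


def consecutive_runs(purchases, min_days=2):
    """Return (user_id, run length) for every run of at least min_days days."""
    by_user = {}
    for day, user in set(purchases):  # SELECT DISTINCT sales_date, user_id
        by_user.setdefault(user, []).append(date.fromisoformat(day))
    out = []
    for user in sorted(by_user):
        runs = Counter()
        for rank, day in enumerate(sorted(by_user[user]), start=1):
            # Days in the same consecutive run share the anchor day - rank.
            runs[day - timedelta(days=rank)] += 1
        out.extend((user, n) for n in runs.values() if n >= min_days)
    return out


print(consecutive_runs(purchases))  # [(10, 2)]
```

User 10 bought on 2021-11-05 and 2021-11-06; both map to the anchor 2021-11-04, so the run length is 2. User 2's two purchases fall on the same day and count once.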
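06 某客 live-course analysis (online education)
SQL30 某客 live-course conversion rate
Description: A page on 某客 introduces a series of live data-analysis courses. Users can sign up for any one or more of the live sessions.
The course table course_tb is as follows (course_id is the course number, course_name is the course name, course_datetime is the class time):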
course_id | course_name | course_datetime |
1 | Python | 2021-12-1 19:00-21:00 |
2 | SQL | 2021-12-2 19:00-21:00 |
3 | R | 2021-12-3 19:00-21:00 |
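The user behavior table behavior_tb is as follows (user_id is the user number, if_vw is whether the user viewed, if_fav is whether the user favorited, if_sign is whether the user signed up, course_id is the course number):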
user_id | if_vw | if_fav | if_sign | course_id |
100 | 1 | 1 | 1 | 1 |
100 | 1 | 1 | 1 | 2 |
100 | 1 | 1 | 1 | 3 |
101 | 1 | 1 | 1 | 1 |
101 | 1 | 1 | 1 | 2 |
101 | 1 | 0 | 0 | 3 |
102 | 1 | 1 | 1 | 1 |
102 | 1 | 1 | 1 | 2 |
102 | 1 | 1 | 1 | 3 |
103 | 1 | 1 | 0 | 1 |
103 | 1 | 0 | 0 | 2 |
103 | 1 | 0 | 0 | 3 |
104 | 1 | 1 | 1 | 1 |
104 | 1 | 1 | 1 | 2 |
104 | 1 | 1 | 0 | 3 |
105 | 1 | 0 | 0 | 1 |
106 | 1 | 0 | 0 | 1 |
107 | 1 | 0 | 0 | 1 |
107 | 1 | 1 | 1 | 2 |
108 | 1 | 1 | 1 | 3 |
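Problem: Compute the conversion rate of each course (sign_rate(%), conversion rate = users who signed up / users who viewed, rounded to 2 decimal places).
Note: sort by course_id ascending.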
course_id | course_name | sign_rate(%) |
1 | Python | 50.00 |
2 | SQL | 83.33 |
3 | R | 50.00 |
SELECT a.course_id, b.course_name, round(SUM(if_sign) / sum(if_vw) * 100, 2) `sign_rate(%)`
FROM behavior_tb a
LEFT JOIN course_tb b ON a.course_id = b.course_id
GROUP BY a.course_id , b.course_name ORDER BY a.course_id
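SQL31 Users online in each 某客 live room when the session starts
Description: A page on 某客 introduces a series of live data-analysis courses. Users can sign up for any one or more of the live sessions.
The course table course_tb is as follows (course_id is the course number, course_name is the course name, course_datetime is the class time):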
course_id | course_name | course_datetime |
1 | Python | 2021-12-1 19:00-21:00 |
2 | SQL | 2021-12-2 19:00-21:00 |
3 | R | 2021-12-3 19:00-21:00 |
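The attendance table attend_tb is as follows (user_id is the user number, course_id is the course number, in_datetime is when the user entered the live room, out_datetime is when the user left the live room):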
user_id | course_id | in_datetime | out_datetime |
100 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:28:00 |
100 | 1 | 2021-12-01 19:30:00 | 2021-12-01 19:53:00 |
101 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:55:00 |
102 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:05:00 |
104 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:59:00 |
101 | 2 | 2021-12-02 19:05:00 | 2021-12-02 20:58:00 |
102 | 2 | 2021-12-02 18:55:00 | 2021-12-02 21:00:00 |
104 | 2 | 2021-12-02 18:57:00 | 2021-12-02 20:56:00 |
107 | 2 | 2021-12-02 19:10:00 | 2021-12-02 19:18:00 |
100 | 3 | 2021-12-03 19:01:00 | 2021-12-03 21:00:00 |
102 | 3 | 2021-12-03 18:58:00 | 2021-12-03 19:05:00 |
108 | 3 | 2021-12-03 19:01:00 | 2021-12-03 19:56:00 |
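Problem: Count the users online in each course at the moment the live session starts (19:00). For the example above the output is (sorted by course_id ascending):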
course_id | course_name | online_num |
1 | Python | 4 |
2 | SQL | 2 |
3 | R | 1 |
SELECT c.course_id as course_id, c.course_name as course_name, COUNT(DISTINCT IF(
a.in_datetime <= str_to_date(substring_index(c.course_datetime, '-', 3), '%Y-%m-%e %H:%i')
AND a.out_datetime > str_to_date(substring_index(c.course_datetime, '-', 3), '%Y-%m-%e %H:%i'),
a.user_id, null)) AS online_num
FROM course_tb c LEFT OUTER JOIN attend_tb a ON c.course_id = a.course_id
GROUP BY c.course_id, c.course_name ORDER BY course_id
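The only fiddly part of this query is pulling the start time out of a course_datetime value such as "2021-12-1 19:00-21:00". The Python sketch below does the equivalent parsing and then counts distinct users whose session covers the start instant. The helper names `start_of` and `online_at_start` are assumptions of this illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

courses = [
    (1, "Python", "2021-12-1 19:00-21:00"),
    (2, "SQL", "2021-12-2 19:00-21:00"),
    (3, "R", "2021-12-3 19:00-21:00"),
]
# (user_id, course_id, in_datetime, out_datetime) from the sample attend_tb.
attend = [
    (100, 1, "2021-12-01 19:00:00", "2021-12-01 19:28:00"),
    (100, 1, "2021-12-01 19:30:00", "2021-12-01 19:53:00"),
    (101, 1, "2021-12-01 19:00:00", "2021-12-01 20:55:00"),
    (102, 1, "2021-12-01 19:00:00", "2021-12-01 19:05:00"),
    (104, 1, "2021-12-01 19:00:00", "2021-12-01 20:59:00"),
    (101, 2, "2021-12-02 19:05:00", "2021-12-02 20:58:00"),
    (102, 2, "2021-12-02 18:55:00", "2021-12-02 21:00:00"),
    (104, 2, "2021-12-02 18:57:00", "2021-12-02 20:56:00"),
    (107, 2, "2021-12-02 19:10:00", "2021-12-02 19:18:00"),
    (100, 3, "2021-12-03 19:01:00", "2021-12-03 21:00:00"),
    (102, 3, "2021-12-03 18:58:00", "2021-12-03 19:05:00"),
    (108, 3, "2021-12-03 19:01:00", "2021-12-03 19:56:00"),
]


def start_of(course_datetime):
    """'2021-12-1 19:00-21:00' -> datetime(2021, 12, 1, 19, 0)."""
    head = course_datetime.rsplit("-", 1)[0]  # drop the '-21:00' tail
    return datetime.strptime(head, "%Y-%m-%d %H:%M")


def online_at_start():
    out = []
    for course_id, name, when in courses:
        start = start_of(when)
        users = {
            user for user, cid, t_in, t_out in attend
            if cid == course_id
            and datetime.strptime(t_in, FMT) <= start < datetime.strptime(t_out, FMT)
        }  # a set, matching COUNT(DISTINCT ...)
        out.append((course_id, name, len(users)))
    return out


print(online_at_start())  # [(1, 'Python', 4), (2, 'SQL', 2), (3, 'R', 1)]
```

Users who enter after 19:00 (for example user 101 in the SQL course at 19:05) are not counted, which is why SQL shows 2 and R shows 1.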
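SQL32 Average viewing time per course for 某客 live sessions
Description: A page on 某客 introduces a series of live data-analysis courses. Users can sign up for any one or more of the live sessions.
The course table course_tb is as follows (course_id is the course number, course_name is the course name, course_datetime is the class time):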
course_id | course_name | course_datetime |
1 | Python | 2021-12-1 19:00-21:00 |
2 | SQL | 2021-12-2 19:00-21:00 |
3 | R | 2021-12-3 19:00-21:00 |
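The attendance table attend_tb is as follows (user_id is the user number, course_id is the course number, in_datetime is when the user entered the live room, out_datetime is when the user left the live room):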
user_id | course_id | in_datetime | out_datetime |
100 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:28:00 |
100 | 1 | 2021-12-01 19:30:00 | 2021-12-01 19:53:00 |
101 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:55:00 |
102 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:05:00 |
104 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:59:00 |
101 | 2 | 2021-12-02 19:05:00 | 2021-12-02 20:58:00 |
102 | 2 | 2021-12-02 18:55:00 | 2021-12-02 21:00:00 |
104 | 2 | 2021-12-02 18:57:00 | 2021-12-02 20:56:00 |
107 | 2 | 2021-12-02 19:10:00 | 2021-12-02 19:18:00 |
100 | 3 | 2021-12-03 19:01:00 | 2021-12-03 21:00:00 |
102 | 3 | 2021-12-03 18:58:00 | 2021-12-03 19:05:00 |
108 | 3 | 2021-12-03 19:01:00 | 2021-12-03 19:56:00 |
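Problem: Compute the average viewing time of each course (viewing time is the difference between the time of leaving and the time of entering the live room, in minutes). Sort by average viewing time descending and round to 2 decimal places.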
course_name | avg_Len |
SQL | 91.25 |
R | 60.33 |
Python | 58.00 |
SELECT c.course_name, ROUND(AVG(TIMESTAMPDIFF(MINUTE, b.in_datetime, b.out_datetime)), 2) AS AVG_LEN
FROM course_tb c INNER JOIN attend_tb b ON c.course_id = b.course_id GROUP BY c.course_name ORDER BY AVG_LEN DESC
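SQL33 Attendance rate per course for 某客 live sessions
Description: A page on 某客 introduces a series of live data-analysis courses. Users can sign up for any one or more of the live sessions.
The course table course_tb is as follows (course_id is the course number, course_name is the course name, course_datetime is the class time):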
course_id | course_name | course_datetime |
1 | Python | 2021-12-1 19:00-21:00 |
2 | SQL | 2021-12-2 19:00-21:00 |
3 | R | 2021-12-3 19:00-21:00 |
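The user behavior table behavior_tb is as follows (user_id is the user number, if_vw is whether the user viewed, if_fav is whether the user favorited, if_sign is whether the user signed up, course_id is the course number):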
user_id | if_vw | if_fav | if_sign | course_id |
100 | 1 | 1 | 1 | 1 |
100 | 1 | 1 | 1 | 2 |
100 | 1 | 1 | 1 | 3 |
101 | 1 | 1 | 1 | 1 |
101 | 1 | 1 | 1 | 2 |
101 | 1 | 0 | 0 | 3 |
102 | 1 | 1 | 1 | 1 |
102 | 1 | 1 | 1 | 2 |
102 | 1 | 1 | 1 | 3 |
103 | 1 | 1 | 0 | 1 |
103 | 1 | 0 | 0 | 2 |
103 | 1 | 0 | 0 | 3 |
104 | 1 | 1 | 1 | 1 |
104 | 1 | 1 | 1 | 2 |
104 | 1 | 1 | 0 | 3 |
105 | 1 | 0 | 0 | 1 |
106 | 1 | 0 | 0 | 1 |
107 | 1 | 0 | 0 | 1 |
107 | 1 | 1 | 1 | 2 |
108 | 1 | 1 | 1 | 3 |
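The attendance table attend_tb is as follows (user_id is the user number, course_id is the course number, in_datetime is when the user entered the live room, out_datetime is when the user left the live room):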
user_id | course_id | in_datetime | out_datetime |
100 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:28:00 |
100 | 1 | 2021-12-01 19:30:00 | 2021-12-01 19:53:00 |
101 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:55:00 |
102 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:05:00 |
104 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:59:00 |
101 | 2 | 2021-12-02 19:05:00 | 2021-12-02 20:58:00 |
102 | 2 | 2021-12-02 18:55:00 | 2021-12-02 21:00:00 |
104 | 2 | 2021-12-02 18:57:00 | 2021-12-02 20:56:00 |
107 | 2 | 2021-12-02 19:10:00 | 2021-12-02 19:18:00 |
100 | 3 | 2021-12-03 19:01:00 | 2021-12-03 21:00:00 |
102 | 3 | 2021-12-03 18:58:00 | 2021-12-03 19:05:00 |
108 | 3 | 2021-12-03 19:01:00 | 2021-12-03 19:56:00 |
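Problem: Compute the attendance rate of each course (attend_rate(%), rounded to 2 decimal places), where attendance rate = users who attended (online for 10 minutes or more) / users who signed up. Sort by course_id ascending. For the data above the output is: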
course_id | course_name | attend_rate(%) |
1 | Python | 75.00 |
2 | SQL | 60.00 |
3 | R | 66.67 |
SELECT c.course_id, course_name, ROUND(
    COUNT(DISTINCT IF(
        # a user counts as attending if any single session lasted at least 600 seconds
        TIMESTAMPDIFF(SECOND, in_datetime, out_datetime) >= 600, a.user_id, NULL
    )) / COUNT(DISTINCT IF(if_sign = 1, b.user_id, NULL)) * 100, 2
) AS `attend_rate(%)`
FROM attend_tb a RIGHT JOIN behavior_tb b ON a.user_id = b.user_id AND a.course_id = b.course_id
JOIN course_tb c ON b.course_id = c.course_id GROUP BY c.course_id, course_name ORDER BY c.course_id
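The attendance rate can be recomputed from the sample tables with the Python sketch below. Like the query, it treats a user as attending when any single session lasts at least 10 minutes, which is one reading of the problem statement. The tuple layouts and variable names are assumptions of this illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
names = {1: "Python", 2: "SQL", 3: "R"}

# (user_id, course_id, if_sign) from the sample behavior_tb.
behavior = [
    (100, 1, 1), (100, 2, 1), (100, 3, 1), (101, 1, 1), (101, 2, 1), (101, 3, 0),
    (102, 1, 1), (102, 2, 1), (102, 3, 1), (103, 1, 0), (103, 2, 0), (103, 3, 0),
    (104, 1, 1), (104, 2, 1), (104, 3, 0), (105, 1, 0), (106, 1, 0), (107, 1, 0),
    (107, 2, 1), (108, 3, 1),
]
# (user_id, course_id, in_datetime, out_datetime) from the sample attend_tb.
attend = [
    (100, 1, "2021-12-01 19:00:00", "2021-12-01 19:28:00"),
    (100, 1, "2021-12-01 19:30:00", "2021-12-01 19:53:00"),
    (101, 1, "2021-12-01 19:00:00", "2021-12-01 20:55:00"),
    (102, 1, "2021-12-01 19:00:00", "2021-12-01 19:05:00"),
    (104, 1, "2021-12-01 19:00:00", "2021-12-01 20:59:00"),
    (101, 2, "2021-12-02 19:05:00", "2021-12-02 20:58:00"),
    (102, 2, "2021-12-02 18:55:00", "2021-12-02 21:00:00"),
    (104, 2, "2021-12-02 18:57:00", "2021-12-02 20:56:00"),
    (107, 2, "2021-12-02 19:10:00", "2021-12-02 19:18:00"),
    (100, 3, "2021-12-03 19:01:00", "2021-12-03 21:00:00"),
    (102, 3, "2021-12-03 18:58:00", "2021-12-03 19:05:00"),
    (108, 3, "2021-12-03 19:01:00", "2021-12-03 19:56:00"),
]

signed = {}  # course_id -> users who signed up
for user, course, if_sign in behavior:
    if if_sign == 1:
        signed.setdefault(course, set()).add(user)

attended = {}  # course_id -> users with a session of at least 600 seconds
for user, course, t_in, t_out in attend:
    seconds = (datetime.strptime(t_out, FMT) - datetime.strptime(t_in, FMT)).total_seconds()
    if seconds >= 600:
        attended.setdefault(course, set()).add(user)

result = [
    (c, names[c], round(100 * len(attended.get(c, set())) / len(signed[c]), 2))
    for c in sorted(signed)
]
print(result)  # [(1, 'Python', 75.0), (2, 'SQL', 60.0), (3, 'R', 66.67)]
```

In the SQL course, user 107 signed up but stayed only 8 minutes and user 100 signed up without attending, so 3 of the 5 sign-ups count.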
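SQL34 Concurrent online users per course for 某客 live sessions
Description: A page on 某客 introduces a series of live data-analysis courses. Users can sign up for any one or more of the live sessions.
The course table course_tb is as follows (course_id is the course number, course_name is the course name, course_datetime is the class time):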
course_id | course_name | course_datetime |
1 | Python | 2021-12-1 19:00-21:00 |
2 | SQL | 2021-12-2 19:00-21:00 |
3 | R | 2021-12-3 19:00-21:00 |
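The attendance table attend_tb is as follows (user_id is the user number, course_id is the course number, in_datetime is when the user entered the live room, out_datetime is when the user left the live room):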
user_id | course_id | in_datetime | out_datetime |
100 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:28:00 |
100 | 1 | 2021-12-01 19:30:00 | 2021-12-01 19:53:00 |
101 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:55:00 |
102 | 1 | 2021-12-01 19:00:00 | 2021-12-01 19:05:00 |
104 | 1 | 2021-12-01 19:00:00 | 2021-12-01 20:59:00 |
101 | 2 | 2021-12-02 19:05:00 | 2021-12-02 20:58:00 |
102 | 2 | 2021-12-02 18:55:00 | 2021-12-02 21:00:00 |
104 | 2 | 2021-12-02 18:57:00 | 2021-12-02 20:56:00 |
107 | 2 | 2021-12-02 19:10:00 | 2021-12-02 19:18:00 |
100 | 3 | 2021-12-03 19:01:00 | 2021-12-03 21:00:00 |
102 | 3 | 2021-12-03 18:58:00 | 2021-12-03 19:05:00 |
108 | 3 | 2021-12-03 19:01:00 | 2021-12-03 19:56:00 |
Question: Find the maximum number of users online at the same time for each course (sorted by course_id). For the data above, the output is:
course_id | course_name | max_num |
1 | Python | 4 |
2 | SQL | 4 |
3 | R | 3 |
SELECT ct.course_id, ct.course_name, bt.max_num
FROM course_tb ct JOIN (
SELECT course_id, MAX(at_num) AS max_num
# running total of +1 (enter) and -1 (leave) events = number of users online at that moment
FROM (SELECT *, SUM(num) OVER(PARTITION BY course_id ORDER BY at_time) AS at_num
FROM (SELECT user_id, course_id, TIME(in_datetime) AS at_time, 1 AS num
FROM attend_tb UNION ALL SELECT user_id, course_id, TIME(out_datetime) AS at_time, -1 AS num
FROM attend_tb) a) b GROUP BY course_id) bt ON ct.course_id = bt.course_id
ORDER BY ct.course_id
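The +1/-1 event idea behind this query can be checked outside the database. The following is a minimal Python sketch, not the reference solution: the ATTEND list is copied from the attend_tb sample above (time part only), the function name max_concurrent is made up for illustration, and it assumes that when an entry and an exit share a timestamp the entry is applied first, a tie rule the problem statement does not specify.

```python
from collections import defaultdict

# (user_id, course_id, in_time, out_time), copied from the attend_tb sample above.
ATTEND = [
    (100, 1, "19:00:00", "19:28:00"),
    (100, 1, "19:30:00", "19:53:00"),
    (101, 1, "19:00:00", "20:55:00"),
    (102, 1, "19:00:00", "19:05:00"),
    (104, 1, "19:00:00", "20:59:00"),
    (101, 2, "19:05:00", "20:58:00"),
    (102, 2, "18:55:00", "21:00:00"),
    (104, 2, "18:57:00", "20:56:00"),
    (107, 2, "19:10:00", "19:18:00"),
    (100, 3, "19:01:00", "21:00:00"),
    (102, 3, "18:58:00", "19:05:00"),
    (108, 3, "19:01:00", "19:56:00"),
]


def max_concurrent(rows):
    """Return {course_id: peak number of users online at the same time}."""
    events = defaultdict(list)
    for _user, course, t_in, t_out in rows:
        events[course].append((t_in, 1))    # entering adds one viewer
        events[course].append((t_out, -1))  # leaving removes one viewer
    peaks = {}
    for course, evs in events.items():
        # Assumption: at equal timestamps, entries (+1) come before exits (-1).
        evs.sort(key=lambda e: (e[0], -e[1]))
        online = peak = 0
        for _time, delta in evs:
            online += delta
            peak = max(peak, online)
        peaks[course] = peak
    return peaks


print(max_concurrent(ATTEND))  # -> {1: 4, 2: 4, 3: 3}
```

On the sample data no course has an entry and an exit at the same second, so the tie rule does not affect the result.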
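07 Mouhu Q&A (content industry)
Mouhu Q&A (content industry)
SQL35 Mouhu Q&A: average number of answers per person per day in November
Description: The Mouhu Q&A creator answer table answer_tb is as follows (answer_date is the creation date, author_id is the creator number, issue_id is the question ID, and char_len is the answer length in characters):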
answer_date | author_id | issue_id | char_len |
2021-11-01 | 101 | E001 | 150 |
2021-11-01 | 101 | E002 | 200 |
2021-11-01 | 102 | C003 | 50 |
2021-11-01 | 103 | P001 | 35 |
2021-11-01 | 104 | C003 | 120 |
2021-11-01 | 105 | P001 | 125 |
2021-11-01 | 102 | P002 | 105 |
2021-11-02 | 101 | P001 | 201 |
2021-11-02 | 110 | C002 | 200 |
2021-11-02 | 110 | C001 | 225 |
2021-11-02 | 110 | C002 | 220 |
2021-11-03 | 101 | C002 | 180 |
2021-11-04 | 109 | E003 | 130 |
2021-11-04 | 109 | E001 | 123 |
2021-11-05 | 108 | C001 | 160 |
2021-11-05 | 108 | C002 | 120 |
2021-11-05 | 110 | P001 | 180 |
2021-11-05 | 106 | P002 | 45 |
2021-11-05 | 107 | E003 | 56 |
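Question: Compute the daily average number of answers per person in November (number of answers / number of answering users), sorted by answer date and rounded to two decimal places. For the example above, the output is: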
answer_date | per_num |
2021-11-01 | 1.40 |
2021-11-02 | 2.00 |
2021-11-03 | 1.00 |
2021-11-04 | 2.00 |
2021-11-05 | 1.25 |
# GROUP BY already yields one row per date, so DISTINCT is not needed
SELECT answer_date, ROUND(COUNT(issue_id) / COUNT(DISTINCT author_id), 2) AS per_num
FROM answer_tb WHERE MONTH(answer_date) = 11 GROUP BY answer_date ORDER BY answer_date
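As a quick way to reproduce the expected table, here is a small sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. It is an illustration under assumptions rather than the MySQL answer itself: SQLite has no MONTH() function, so strftime('%m', ...) stands in for it, and the count is multiplied by 1.0 because SQLite divides integers as integers.

```python
import sqlite3

# (answer_date, author_id, issue_id, char_len), copied from the answer_tb sample above.
ANSWERS = [
    ("2021-11-01", 101, "E001", 150), ("2021-11-01", 101, "E002", 200),
    ("2021-11-01", 102, "C003", 50), ("2021-11-01", 103, "P001", 35),
    ("2021-11-01", 104, "C003", 120), ("2021-11-01", 105, "P001", 125),
    ("2021-11-01", 102, "P002", 105), ("2021-11-02", 101, "P001", 201),
    ("2021-11-02", 110, "C002", 200), ("2021-11-02", 110, "C001", 225),
    ("2021-11-02", 110, "C002", 220), ("2021-11-03", 101, "C002", 180),
    ("2021-11-04", 109, "E003", 130), ("2021-11-04", 109, "E001", 123),
    ("2021-11-05", 108, "C001", 160), ("2021-11-05", 108, "C002", 120),
    ("2021-11-05", 110, "P001", 180), ("2021-11-05", 106, "P002", 45),
    ("2021-11-05", 107, "E003", 56),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answer_tb "
    "(answer_date TEXT, author_id INTEGER, issue_id TEXT, char_len INTEGER)"
)
conn.executemany("INSERT INTO answer_tb VALUES (?, ?, ?, ?)", ANSWERS)

# answers per day divided by distinct answering users per day
per_day = conn.execute(
    """
    SELECT answer_date,
           ROUND(COUNT(issue_id) * 1.0 / COUNT(DISTINCT author_id), 2) AS per_num
    FROM answer_tb
    WHERE strftime('%m', answer_date) = '11'
    GROUP BY answer_date
    ORDER BY answer_date
    """
).fetchall()

for answer_date, per_num in per_day:
    print(answer_date, f"{per_num:.2f}")
```

The printed values match the expected output above: 1.40, 2.00, 1.00, 2.00 and 1.25 for November 1 through 5.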
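SQL36 Mouhu Q&A: number of high-quality answers by author level group
Description: The Mouhu Q&A creator information table author_tb is as follows (author_id is the creator number, author_level is the creator level, with six levels from 1 to 6, and sex is the creator's gender):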
author_id | author_level | sex |
101 | 6 | m |
102 | 1 | f |
103 | 1 | m |
104 | 3 | m |
105 | 4 | f |
106 | 2 | f |
107 | 2 | m |
108 | 5 | f |
109 | 6 | f |
110 | 5 | m |
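The creator answer table answer_tb is as follows (answer_date is the creation date, author_id is the creator number, issue_id is the question number, and char_len is the answer length in characters):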
answer_date | author_id | issue_id | char_len |
2021-11-01 | 101 | E001 | 150 |
2021-11-01 | 101 | E002 | 200 |
2021-11-01 | 102 | C003 | 50 |
2021-11-01 | 103 | P001 | 35 |
2021-11-01 | 104 | C003 | 120 |
2021-11-01 | 105 | P001 | 125 |
2021-11-01 | 102 | P002 | 105 |
2021-11-02 | 101 | P001 | 201 |
2021-11-02 | 110 | C002 | 200 |
2021-11-02 | 110 | C001 | 225 |
2021-11-02 | 110 | C002 | 220 |
2021-11-03 | 101 | C002 | 180 |
2021-11-04 | 109 | E003 | 130 |
2021-11-04 | 109 | E001 | 123 |
2021-11-05 | 108 | C001 | 160 |
2021-11-05 | 108 | C002 | 120 |
2021-11-05 | 110 | P001 | 180 |
2021-11-05 | 106 | P002 | 45 |
2021-11-05 | 107 | E003 | 56 |
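An answer of 100 characters or more is considered a high-quality answer.
Question: Among the high-quality answers on Mouhu Q&A, count how many were written by users at levels 1-2, 3-4, and 5-6 (output labels 1-2级, 3-4级, 5-6级), sorted by count in descending order. For the example above, the output is: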
level_cut | num |
5-6级 | 12 |
3-4级 | 2 |
1-2级 | 1 |
SELECT CASE
WHEN b.author_level IN (1, 2) THEN '1-2级'
WHEN b.author_level IN (3, 4) THEN '3-4级'
WHEN b.author_level IN (5, 6) THEN '5-6级'
END AS level_cut, COUNT(a.char_len) AS num
FROM answer_tb a LEFT JOIN author_tb b ON a.author_id = b.author_id
WHERE a.char_len >= 100 GROUP BY level_cut ORDER BY num DESC
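The bucketing can also be done arithmetically instead of listing each level. The sketch below is only an illustration in Python: AUTHOR_LEVEL and ANSWERS are copied from the sample tables above, and level_bucket is a made-up helper that rounds an odd level up to the even upper bound of its pair.

```python
from collections import Counter

# author_id -> author_level, copied from the author_tb sample above.
AUTHOR_LEVEL = {101: 6, 102: 1, 103: 1, 104: 3, 105: 4,
                106: 2, 107: 2, 108: 5, 109: 6, 110: 5}

# (author_id, char_len) for every row of the answer_tb sample above.
ANSWERS = [
    (101, 150), (101, 200), (102, 50), (103, 35), (104, 120), (105, 125),
    (102, 105), (101, 201), (110, 200), (110, 225), (110, 220), (101, 180),
    (109, 130), (109, 123), (108, 160), (108, 120), (110, 180), (106, 45),
    (107, 56),
]


def level_bucket(level):
    """Map levels 1..6 to the labels '1-2级', '3-4级', '5-6级'."""
    upper = level + (level % 2)  # odd levels round up to the even upper bound
    return f"{upper - 1}-{upper}级"


level_counts = Counter(
    level_bucket(AUTHOR_LEVEL[author_id])
    for author_id, char_len in ANSWERS
    if char_len >= 100  # high-quality threshold from the problem statement
)

# most_common() sorts by count in descending order, like ORDER BY num DESC
for label, num in level_counts.most_common():
    print(label, num)
```

On the sample data this prints 12, 2 and 1 for the three groups, which agrees with the expected output.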
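SQL37 Mouhu Q&A: all users who answered 3 or more questions in a single day
Description: The Mouhu Q&A creator answer table answer_tb is as follows (answer_date is the creation date, author_id is the creator number, issue_id is the number of the question answered, and char_len is the answer length in characters):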
answer_date | author_id | issue_id | char_len |
2021-11-01 | 101 | E001 | 150 |
2021-11-01 | 101 | E002 | 200 |
2021-11-01 | 102 | C003 | 50 |
2021-11-01 | 103 | P001 | 35 |
2021-11-01 | 104 | C003 | 120 |
2021-11-01 | 105 | P001 | 125 |
2021-11-01 | 102 | P002 | 105 |
2021-11-02 | 101 | P001 | 201 |
2021-11-02 | 110 | C002 | 200 |
2021-11-02 | 110 | C001 | 225 |
2021-11-02 | 110 | C002 | 220 |
2021-11-03 | 101 | C002 | 180 |
2021-11-04 | 109 | E003 | 130 |
2021-11-04 | 109 | E001 | 123 |
2021-11-05 | 108 | C001 | 160 |
2021-11-05 | 108 | C002 | 120 |
2021-11-05 | 110 | P001 | 180 |
2021-11-05 | 106 | P002 | 45 |
2021-11-05 | 107 | E003 | 56 |
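Question: List all users who answered 3 or more questions in a single day in November (answer_date is the answer date, author_id is the creator ID, and answer_cnt is the number of questions answered). For the example above, the output is: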
answer_date | author_id | answer_cnt |
2021-11-02 | 110 | 3 |
Note: if multiple rows qualify, sort by answer_date and author_id in ascending order.
SELECT b.answer_date, b.author_id, b.answer_cnt
FROM (
SELECT a.answer_date, a.author_id, COUNT(a.issue_id) AS answer_cnt
FROM answer_tb AS a
WHERE a.answer_date >= '2021-11-01' AND a.answer_date <= '2021-11-30'
GROUP BY a.answer_date, a.author_id
) AS b WHERE b.answer_cnt >= 3 ORDER BY b.answer_date, b.author_id ASC
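The same filter can be written without a derived table by moving the condition into HAVING. The sketch below runs that variant against an in-memory SQLite database from Python, assuming the sample rows above; it is one possible formulation, not the only accepted answer, and the table is recreated here only so the example is self-contained.

```python
import sqlite3

# (answer_date, author_id, issue_id, char_len), copied from the answer_tb sample above.
ANSWERS = [
    ("2021-11-01", 101, "E001", 150), ("2021-11-01", 101, "E002", 200),
    ("2021-11-01", 102, "C003", 50), ("2021-11-01", 103, "P001", 35),
    ("2021-11-01", 104, "C003", 120), ("2021-11-01", 105, "P001", 125),
    ("2021-11-01", 102, "P002", 105), ("2021-11-02", 101, "P001", 201),
    ("2021-11-02", 110, "C002", 200), ("2021-11-02", 110, "C001", 225),
    ("2021-11-02", 110, "C002", 220), ("2021-11-03", 101, "C002", 180),
    ("2021-11-04", 109, "E003", 130), ("2021-11-04", 109, "E001", 123),
    ("2021-11-05", 108, "C001", 160), ("2021-11-05", 108, "C002", 120),
    ("2021-11-05", 110, "P001", 180), ("2021-11-05", 106, "P002", 45),
    ("2021-11-05", 107, "E003", 56),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answer_tb "
    "(answer_date TEXT, author_id INTEGER, issue_id TEXT, char_len INTEGER)"
)
conn.executemany("INSERT INTO answer_tb VALUES (?, ?, ?, ?)", ANSWERS)

# HAVING filters the groups after aggregation, so no outer query is needed
heavy_days = conn.execute(
    """
    SELECT answer_date, author_id, COUNT(issue_id) AS answer_cnt
    FROM answer_tb
    WHERE answer_date BETWEEN '2021-11-01' AND '2021-11-30'
    GROUP BY answer_date, author_id
    HAVING COUNT(issue_id) >= 3
    ORDER BY answer_date, author_id
    """
).fetchall()

print(heavy_days)  # -> [('2021-11-02', 110, 3)]
```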
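SQL38 Mouhu Q&A: how many users who answered Education questions also answered Career questions
Description: The Mouhu Q&A question information table issue_tb is as follows (issue_id is the question number and issue_type is the question type):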
issue_id | issue_type |
E001 | Education |
E002 | Education |
E003 | Education |
C001 | Career |
C002 | Career |
C003 | Career |
C004 | Career |
P001 | Psychology |
P002 | Psychology |
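The creator answer table answer_tb is as follows (answer_date is the creation date, author_id is the creator number, issue_id is the number of the question answered, and char_len is the answer length in characters):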
answer_date | author_id | issue_id | char_len |
2021-11-01 | 101 | E001 | 150 |
2021-11-01 | 101 | E002 | 200 |
2021-11-01 | 102 | C003 | 50 |
2021-11-01 | 103 | P001 | 35 |
2021-11-01 | 104 | C003 | 120 |
2021-11-01 | 105 | P001 | 125 |
2021-11-01 | 102 | P002 | 105 |
2021-11-02 | 101 | P001 | 201 |
2021-11-02 | 110 | C002 | 200 |
2021-11-02 | 110 | C001 | 225 |
2021-11-02 | 110 | C002 | 220 |
2021-11-03 | 101 | C002 | 180 |
2021-11-04 | 109 | E003 | 130 |
2021-11-04 | 109 | E001 | 123 |
2021-11-05 | 108 | C001 | 160 |
2021-11-05 | 108 | C002 | 120 |
2021-11-05 | 110 | P001 | 180 |
2021-11-05 | 106 | P002 | 45 |
2021-11-05 | 107 | E003 | 56 |
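Question: Among the users who have answered Education questions, count how many have also answered Career questions. For the example above, the output is: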
num |
1 |
# count each qualifying user once, however many Career answers they wrote
SELECT COUNT(DISTINCT t1.author_id) AS num
FROM answer_tb t1 JOIN issue_tb t2 ON t1.issue_id = t2.issue_id
WHERE t1.author_id IN (
SELECT a.author_id FROM answer_tb a JOIN issue_tb b ON a.issue_id = b.issue_id
WHERE b.issue_type = 'Education'
) AND t2.issue_type = 'Career'
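The question asks for a number of users, so an author with several Career answers should still count once. The Python sketch below illustrates the difference between counting rows and counting distinct users. ISSUE_TYPE and ANSWERS are copied from the sample tables above; the function names and the extra row (101, "C001") are hypothetical and exist only to show where the two counts diverge.

```python
# issue_id -> issue_type, copied from the issue_tb sample above.
ISSUE_TYPE = {
    "E001": "Education", "E002": "Education", "E003": "Education",
    "C001": "Career", "C002": "Career", "C003": "Career", "C004": "Career",
    "P001": "Psychology", "P002": "Psychology",
}

# (author_id, issue_id) for every row of the answer_tb sample above.
ANSWERS = [
    (101, "E001"), (101, "E002"), (102, "C003"), (103, "P001"), (104, "C003"),
    (105, "P001"), (102, "P002"), (101, "P001"), (110, "C002"), (110, "C001"),
    (110, "C002"), (101, "C002"), (109, "E003"), (109, "E001"), (108, "C001"),
    (108, "C002"), (110, "P001"), (106, "P002"), (107, "E003"),
]


def authors_of(answers, issue_type):
    """Set of authors with at least one answer to a question of this type."""
    return {a for a, issue in answers if ISSUE_TYPE[issue] == issue_type}


def count_users(answers):
    """Distinct users who answered both Education and Career questions."""
    return len(authors_of(answers, "Education") & authors_of(answers, "Career"))


def count_rows(answers):
    """Career answer rows written by Education answerers (counts duplicates)."""
    edu = authors_of(answers, "Education")
    return sum(1 for a, issue in answers if a in edu and ISSUE_TYPE[issue] == "Career")


# Hypothetical extra row, NOT part of the sample data: author 101 answers C001 too.
extended = ANSWERS + [(101, "C001")]

print(count_users(ANSWERS), count_rows(ANSWERS))    # -> 1 1
print(count_users(extended), count_rows(extended))  # -> 1 2
```

On the sample data both counts happen to be 1, which is why a row count passes this example; the hypothetical second Career answer by author 101 pushes the row count to 2 while the user count stays at 1.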
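SQL39 Mouhu Q&A: users whose longest streak of consecutive answering days is 3 or more, with their levels
Description: The Mouhu Q&A creator information table author_tb is as follows (author_id is the creator number, author_level is the creator level, with six levels from 1 to 6, and sex is the creator's gender):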
author_id | author_level | sex |
101 | 6 | m |
102 | 1 | f |
103 | 1 | m |
104 | 3 | m |
105 | 4 | f |
106 | 2 | f |
107 | 2 | m |
108 | 5 | f |
109 | 6 | f |
110 | 5 | m |
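The creator answer table answer_tb is as follows (answer_date is the creation date, author_id is the creator number, issue_id is the number of the question answered, and char_len is the answer length in characters):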
answer_date | author_id | issue_id | char_len |
2021-11-01 | 101 | E001 | 150 |
2021-11-01 | 101 | E002 | 200 |
2021-11-01 | 102 | C003 | 50 |
2021-11-01 | 103 | P001 | 35 |
2021-11-01 | 104 | C003 | 120 |
2021-11-01 | 105 | P001 | 125 |
2021-11-01 | 102 | P002 | 105 |
2021-11-02 | 101 | P001 | 201 |
2021-11-02 | 110 | C002 | 200 |
2021-11-02 | 110 | C001 | 225 |
2021-11-02 | 110 | C002 | 220 |
2021-11-03 | 101 | C002 | 180 |
2021-11-04 | 109 | E003 | 130 |
2021-11-04 | 109 | E001 | 123 |
2021-11-05 | 108 | C001 | 160 |
2021-11-05 | 108 | C002 | 120 |
2021-11-05 | 110 | P001 | 180 |
2021-11-05 | 106 | P002 | 45 |
2021-11-05 | 107 | E003 | 56 |
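Question: Find the users whose longest run of consecutive answering days is 3 days or more, together with their levels (if multiple rows qualify, sort by author_id in ascending order). For the example above, the output is: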
author_id | author_level | days_cnt |
101 | 6 | 3 |
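SELECT author_id, author_level, days_cnt
FROM (SELECT author_id, MAX(streak_len) AS days_cnt
FROM (SELECT author_id, COUNT(*) AS streak_len
# consecutive dates minus their rank collapse to the same anchor date; DATE_SUB stays correct across month boundaries
FROM (SELECT *, DATE_SUB(answer_date, INTERVAL t_rank DAY) AS diff
FROM (
SELECT DISTINCT answer_date, author_id, DENSE_RANK() OVER(PARTITION BY author_id ORDER BY answer_date) AS t_rank
FROM answer_tb) t1) t2 GROUP BY author_id, diff) t3
GROUP BY author_id HAVING MAX(streak_len) >= 3) t4 JOIN author_tb USING(author_id)
ORDER BY author_id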
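The query relies on the "date minus rank" trick: within one author, consecutive dates have ranks that grow by one per day, so subtracting the rank from the date gives the same anchor date for every day of a streak. Here is a minimal Python sketch of that idea, assuming the (answer_date, author_id) pairs of the sample above; the function name longest_streaks is made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# (answer_date, author_id) for every row of the answer_tb sample above.
ANSWERS = [
    ("2021-11-01", 101), ("2021-11-01", 101), ("2021-11-01", 102),
    ("2021-11-01", 103), ("2021-11-01", 104), ("2021-11-01", 105),
    ("2021-11-01", 102), ("2021-11-02", 101), ("2021-11-02", 110),
    ("2021-11-02", 110), ("2021-11-02", 110), ("2021-11-03", 101),
    ("2021-11-04", 109), ("2021-11-04", 109), ("2021-11-05", 108),
    ("2021-11-05", 108), ("2021-11-05", 110), ("2021-11-05", 106),
    ("2021-11-05", 107),
]


def longest_streaks(rows):
    """Return {author_id: length of the longest run of consecutive answer days}."""
    days_by_author = defaultdict(set)  # a set removes same-day duplicates
    for day, author in rows:
        days_by_author[author].add(date.fromisoformat(day))
    longest = {}
    for author, days in days_by_author.items():
        streak_len = Counter()
        for rank, day in enumerate(sorted(days), start=1):
            # Days of one streak share the same (day - rank) anchor date.
            streak_len[day - timedelta(days=rank)] += 1
        longest[author] = max(streak_len.values())
    return longest


streaks = longest_streaks(ANSWERS)
qualifying = {a: n for a, n in sorted(streaks.items()) if n >= 3}
print(qualifying)  # -> {101: 3}
```

Using real date arithmetic (timedelta here, DATE_SUB in MySQL) rather than subtracting an integer from a date keeps the anchor stable when a streak crosses a month boundary, for example from 2021-11-30 to 2021-12-01.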