1. SQL big-tech interview question SQL156: average completion rate of each video
1.1 Problem:
Description
User-video interaction table tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
1 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:30 | 0 | 1 | 1 | NULL |
2 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:24 | 0 | 0 | 1 | NULL |
3 | 103 | 2001 | 2021-10-01 11:00:00 | 2021-10-01 11:00:34 | 0 | 1 | 0 | 1732526 |
4 | 101 | 2002 | 2021-09-01 10:00:00 | 2021-09-01 10:00:42 | 1 | 0 | 1 | NULL |
5 | 102 | 2002 | 2021-10-01 11:00:00 | 2021-10-01 11:00:30 | 1 | 0 | 1 | NULL |
(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user reshared, comment_id: comment ID)
Short-video information table tb_video_info
id | video_id | author | tag | duration | release_time |
1 | 2001 | 901 | 影视 | 30 | 2021-01-01 07:00:00 |
2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
3 | 2003 | 902 | 旅游 | 90 | 2021-01-01 07:00:00 |
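(video_id: video ID, author: creator ID, tag: category label, duration: video length in seconds, release_time: release time; the tag values 影视, 美食, and 旅游 mean film/TV, food, and travel)
Question: compute the completion rate of every video that has play records in 2021 (keep three decimal places), sorted by completion rate in descending order.
Note: a video's completion rate is the share of completed plays among all plays. For simplicity, a play counts as completed when the difference between the end time and the start time is >= the video duration.
Sample output:
The result for the sample data is: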
video_id | avg_comp_play_rate |
2001 | 0.667 |
2002 | 0.000 |
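Explanation:
Video 2001 has 3 play records in October 2021, with watch times of 30 s, 24 s, and 34 s; the video is 30 s long, so two plays count as completed and the completion rate is 0.667;
Video 2002 has 2 play records across September and October 2021, with watch times of 42 s and 30 s; the video is 60 s long, so the completion rate is 0.000.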
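The arithmetic in the explanation can be checked with a small sketch. This is a Python illustration rather than part of the original solution; the helper name `completion_rates` and the trimmed tuples (only the columns the rule needs) are assumptions made for the example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (video_id, start_time, end_time) taken from the sample tb_user_video_log rows
plays = [
    (2001, "2021-10-01 10:00:00", "2021-10-01 10:00:30"),
    (2001, "2021-10-01 10:00:00", "2021-10-01 10:00:24"),
    (2001, "2021-10-01 11:00:00", "2021-10-01 11:00:34"),
    (2002, "2021-09-01 10:00:00", "2021-09-01 10:00:42"),
    (2002, "2021-10-01 11:00:00", "2021-10-01 11:00:30"),
]
# video_id -> duration in seconds, from the sample tb_video_info rows
durations = {2001: 30, 2002: 60, 2003: 90}


def completion_rates(plays, durations, year=2021):
    """Return (video_id, rate) pairs sorted by rate, highest first."""
    totals, completed = {}, {}
    for video_id, start, end in plays:
        start_dt = datetime.strptime(start, FMT)
        end_dt = datetime.strptime(end, FMT)
        if start_dt.year != year:
            continue  # only plays that started in the requested year count
        watched = (end_dt - start_dt).total_seconds()
        totals[video_id] = totals.get(video_id, 0) + 1
        if watched >= durations[video_id]:
            completed[video_id] = completed.get(video_id, 0) + 1
    rates = {v: completed.get(v, 0) / n for v, n in totals.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Format to three decimal places, as the problem requires
result = [(v, f"{r:.3f}") for v, r in completion_rates(plays, durations)]
print(result)  # [(2001, '0.667'), (2002, '0.000')]
```

Video 2003 never appears in the result because it has no play records at all, which matches the "videos that have play records" wording of the question.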
Example 1
Input:
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT 'auto-increment ID',
    uid INT NOT NULL COMMENT 'user ID',
    video_id INT NOT NULL COMMENT 'video ID',
    start_time datetime COMMENT 'time viewing started',
    end_time datetime COMMENT 'time viewing ended',
    if_follow TINYINT COMMENT 'whether followed',
    if_like TINYINT COMMENT 'whether liked',
    if_retweet TINYINT COMMENT 'whether reshared',
    comment_id INT COMMENT 'comment ID'
) CHARACTER SET utf8 COLLATE utf8_bin;
CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT 'auto-increment ID',
    video_id INT UNIQUE NOT NULL COMMENT 'video ID',
    author INT NOT NULL COMMENT 'creator ID',
    tag VARCHAR(16) NOT NULL COMMENT 'category label',
    duration INT NOT NULL COMMENT 'video length (seconds)',
    release_time datetime NOT NULL COMMENT 'release time'
)CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30', 0, 1, 1, null),
(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:24', 0, 0, 1, null),
(103, 2001, '2021-10-01 11:00:00', '2021-10-01 11:00:34', 0, 1, 0, 1732526),
(101, 2002, '2021-09-01 10:00:00', '2021-09-01 10:00:42', 1, 0, 1, null),
(102, 2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30', 1, 0, 1, null);
INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
(2001, 901, '影视', 30, '2021-01-01 7:00:00'),
(2002, 901, '美食', 60, '2021-01-01 7:00:00'),
(2003, 902, '旅游', 90, '2021-01-01 7:00:00');
Output:
2001|0.667
2002|0.000
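1.2 Approach:
Flag each 2021 play record as completed (1) or not (0), then divide the number of completed records by the total number of records per video. See the comments in the solution.
1.3 Solution: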
with tep1 as (
    select video_id,
        -- ans flag: 1 if the record is a completed play, otherwise 0
        if(timestampdiff(second, start_time, end_time) >= (
            select duration
            from tb_video_info t2
            where t1.video_id = t2.video_id
        ), 1, 0) ans
    from tb_user_video_log t1
    where substring(start_time, 1, 4) = '2021'
)
select video_id,
    round(
        -- records with ans = 1 are the completed plays
        (select count(*) from tep1 t3 where t3.video_id = t1.video_id and ans = 1)
        /
        -- all records for this video_id
        (select count(*) from tep1 t2 where t1.video_id = t2.video_id), 3) avg_comp_play_rate
from tep1 t1
group by video_id
order by avg_comp_play_rate desc
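Because ans is a 0/1 flag, its average per video is already the completion rate, so the two correlated count subqueries could be replaced by a join plus avg. The sketch below tries that idea in Python with the standard-library sqlite3 module so it can be run without a MySQL server. It is an alternative to the solution above, not the same query: SQLite has no timestampdiff, so a difference of strftime('%s', ...) values is swapped in, and the tables are trimmed to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_user_video_log (video_id INT, start_time TEXT, end_time TEXT);
CREATE TABLE tb_video_info (video_id INT, duration INT);
INSERT INTO tb_user_video_log VALUES
  (2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30'),
  (2001, '2021-10-01 10:00:00', '2021-10-01 10:00:24'),
  (2001, '2021-10-01 11:00:00', '2021-10-01 11:00:34'),
  (2002, '2021-09-01 10:00:00', '2021-09-01 10:00:42'),
  (2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30');
INSERT INTO tb_video_info VALUES (2001, 30), (2002, 60), (2003, 90);
""")

# Averaging a 0/1 completion flag per video gives the completion rate directly.
query = """
SELECT l.video_id,
       ROUND(AVG(CASE
                   WHEN strftime('%s', l.end_time) - strftime('%s', l.start_time)
                        >= v.duration THEN 1.0
                   ELSE 0.0
                 END), 3) AS avg_comp_play_rate
FROM tb_user_video_log AS l
JOIN tb_video_info AS v ON v.video_id = l.video_id
WHERE strftime('%Y', l.start_time) = '2021'
GROUP BY l.video_id
ORDER BY avg_comp_play_rate DESC
"""
rows = conn.execute(query).fetchall()
for video_id, rate in rows:
    print(video_id, f"{rate:.3f}")
```

SQLite returns the rate as a float, so the three-decimal padding (0.000) is applied when printing. In MySQL the same shape would presumably use timestampdiff(second, start_time, end_time) in place of the strftime difference.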
2. SQL big-tech interview question SQL157: video categories with average play progress above 60%
2.1 Problem:
Description
User-video interaction table tb_user_video_log
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
1 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:30 | 0 | 1 | 1 | NULL |
2 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:21 | 0 | 0 | 1 | NULL |
3 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:20 | 0 | 1 | 0 | 1732526 |
4 | 102 | 2002 | 2021-10-01 11:00:00 | 2021-10-01 11:00:30 | 1 | 0 | 1 | NULL |
5 | 103 | 2002 | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 1 | 0 | 1 | NULL |
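(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user reshared, comment_id: comment ID)
Short-video information table tb_video_info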
id | video_id | author | tag | duration | release_time |
1 | 2001 | 901 | 影视 | 30 | 2021-01-01 07:00:00 |
2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
3 | 2003 | 902 | 旅游 | 90 | 2021-01-01 07:00:00 |
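(video_id: video ID, author: creator ID, tag: category label, duration: video length in seconds, release_time: release time)
Question: compute the average play progress of each video category and output the categories whose progress is greater than 60%.
Notes:
- Play progress = watch time ÷ video duration × 100%; when the watch time exceeds the video duration, the play progress is counted as 100%.
- Keep two decimal places in the result and sort by play progress in descending order.
Sample output:
The result for the sample data is: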
tag | avg_play_progress |
影视 | 90.00% |
美食 | 75.00% |
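Explanation:
Video 2001 in the 影视 (film/TV) category was watched by users 101, 102, and 103, with play progress of 30 s (100%), 21 s (70%), and 30 s (100%), so the average play progress is 90.00% (two decimal places);
Video 2002 in the 美食 (food) category was watched by users 102 and 103, with play progress of 30 s (50%) and 60 s (100%), so the average play progress is 75.00% (two decimal places).
Example 1
Input: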
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT 'auto-increment ID',
    uid INT NOT NULL COMMENT 'user ID',
    video_id INT NOT NULL COMMENT 'video ID',
    start_time datetime COMMENT 'time viewing started',
    end_time datetime COMMENT 'time viewing ended',
    if_follow TINYINT COMMENT 'whether followed',
    if_like TINYINT COMMENT 'whether liked',
    if_retweet TINYINT COMMENT 'whether reshared',
    comment_id INT COMMENT 'comment ID'
) CHARACTER SET utf8 COLLATE utf8_bin;
CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT 'auto-increment ID',
    video_id INT UNIQUE NOT NULL COMMENT 'video ID',
    author INT NOT NULL COMMENT 'creator ID',
    tag VARCHAR(16) NOT NULL COMMENT 'category label',
    duration INT NOT NULL COMMENT 'video length (seconds)',
    release_time datetime NOT NULL COMMENT 'release time'
)CHARACTER SET utf8 COLLATE utf8_bin;
INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30', 0, 1, 1, null),
(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:21', 0, 0, 1, null),
(103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:20', 0, 1, 0, 1732526),
(102, 2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30', 1, 0, 1, null),
(103, 2002, '2021-10-01 10:59:05', '2021-10-01 11:00:05', 1, 0, 1, null);
INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
(2001, 901, '影视', 30, '2021-01-01 7:00:00'),
(2002, 901, '美食', 60, '2021-01-01 7:00:00'),
(2003, 902, '旅游', 90, '2020-01-01 7:00:00');
Output:
影视|90.00%
美食|75.00%
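2.2 Approach:
First compute the play progress of each record (capped at 100), then average it per tag, and finally use concat to append the % sign.
2.3 Solution: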
with tep1 as (
    select
        video_id,
        -- case when implements the cap: if the progress is at most 100,
        -- use the computed play progress; otherwise use 100
        case when
            timestampdiff(second, start_time, end_time) /
            (select duration from tb_video_info t2
             where t1.video_id = t2.video_id
            ) * 100
            <= 100
        then
            timestampdiff(second, start_time, end_time) /
            (select duration from tb_video_info t2
             where t1.video_id = t2.video_id
            ) * 100
        else 100
        end ans
    from tb_user_video_log t1
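), tep2 as (
    select tag
        , round(avg(ans), 2) progress
    from tep1 t1
    join tb_video_info t2
    on t1.video_id = t2.video_id
    group by tag
)
-- keep only tags whose average play progress exceeds 60 and append % with concat;
-- sort in the outer query by the numeric value, since an order by inside a CTE
-- is not guaranteed to carry over and the concatenated string would sort as text
select tag, concat(progress, '%') avg_play_progress
from tep2
where progress > 60
order by progress desc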
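The cap at 100 can also be written as the smaller of two values instead of a case when that repeats the expression. The sketch below is a runnable illustration in Python with the standard-library sqlite3 module, not the solution above: it assumes SQLite's two-argument scalar MIN (MySQL's counterpart is LEAST), swaps timestampdiff for a strftime('%s', ...) difference, trims the tables to the needed columns, and appends the % sign in Python rather than with concat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_user_video_log (video_id INT, start_time TEXT, end_time TEXT);
CREATE TABLE tb_video_info (video_id INT, tag TEXT, duration INT);
INSERT INTO tb_user_video_log VALUES
  (2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30'),
  (2001, '2021-10-01 10:00:00', '2021-10-01 10:00:21'),
  (2001, '2021-10-01 11:00:50', '2021-10-01 11:01:20'),
  (2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30'),
  (2002, '2021-10-01 10:59:05', '2021-10-01 11:00:05');
INSERT INTO tb_video_info VALUES
  (2001, '影视', 30), (2002, '美食', 60), (2003, '旅游', 90);
""")

# Inner query: per-tag average of the capped progress (numeric).
# Outer query: filter and sort on the numeric value before any formatting.
query = """
SELECT tag, progress
FROM (
    SELECT v.tag AS tag,
           AVG(MIN(100.0 * (strftime('%s', l.end_time) - strftime('%s', l.start_time))
                   / v.duration, 100.0)) AS progress
    FROM tb_user_video_log AS l
    JOIN tb_video_info AS v ON v.video_id = l.video_id
    GROUP BY v.tag
)
WHERE progress > 60
ORDER BY progress DESC
"""
rows = conn.execute(query).fetchall()
result = [(tag, f"{progress:.2f}%") for tag, progress in rows]
print(result)  # [('影视', '90.00%'), ('美食', '75.00%')]
```

Filtering and sorting happen while the value is still numeric; once the % sign is attached the column is text, and "100.00%" would sort before "90.00%".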
3. LeetCode 1607: Sellers With No Sales
3.1 Problem:
Table: Customer
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| customer_id   | int     |
| customer_name | varchar |
+---------------+---------+
customer_id is the column with unique values for this table.
Each row of this table contains information about one customer of the online store.
Table: Orders
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| order_id      | int     |
| sale_date     | date    |
| order_cost    | int     |
| customer_id   | int     |
| seller_id     | int     |
+---------------+---------+
order_id is the column with unique values for this table.
Each row of this table contains information about one order in the online store.
sale_date is the date of the transaction between customer customer_id and seller seller_id.
Table: Seller
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| seller_id     | int     |
| seller_name   | varchar |
+---------------+---------+
seller_id is the column with unique values for this table.
Each row of this table contains information about one seller.
Write a solution to report the names of all sellers who did not make any sales in 2020.
Return the result ordered by seller_name in ascending order.
The result format is shown in the following example.
Example 1:
Input:
Customer table:
+--------------+---------------+
| customer_id  | customer_name |
+--------------+---------------+
| 101          | Alice         |
| 102          | Bob           |
| 103          | Charlie       |
+--------------+---------------+
Orders table:
+-------------+------------+--------------+-------------+-------------+
| order_id    | sale_date  | order_cost   | customer_id | seller_id   |
+-------------+------------+--------------+-------------+-------------+
| 1           | 2020-03-01 | 1500         | 101         | 1           |
| 2           | 2020-05-25 | 2400         | 102         | 2           |
| 3           | 2019-05-25 | 800          | 101         | 3           |
| 4           | 2020-09-13 | 1000         | 103         | 2           |
| 5           | 2019-02-11 | 700          | 101         | 2           |
+-------------+------------+--------------+-------------+-------------+
Seller table:
+-------------+-------------+
| seller_id   | seller_name |
+-------------+-------------+
| 1           | Daniel      |
| 2           | Elizabeth   |
| 3           | Frank       |
+-------------+-------------+
Output:
+-------------+
| seller_name |
+-------------+
| Frank       |
+-------------+
Explanation:
Daniel made 1 sale in March 2020.
Elizabeth made 2 sales in 2020 and 1 sale in 2019.
Frank made 1 sale in 2019 but none in 2020.
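3.2 Approach:
An easy one to unwind with. Collect the sellers that have at least one order in 2020, then keep every seller that is not in that set.
3.3 Solution: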
with tep1 as (
    -- sellers with at least one order in 2020
    select distinct seller_id
    from Orders
    where substring(sale_date, 1, 4) = '2020'
)
-- every seller that does not appear in that set
select seller_name
from Seller
where seller_id not in (select * from tep1)
order by seller_name
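The same anti-join can be phrased with not exists, which sidesteps a known pitfall of not in: if the subquery ever returned a NULL seller_id, not in would match no rows at all. The sketch below runs that variant in Python with the standard-library sqlite3 module on the sample data. It is an alternative under stated assumptions rather than the solution above: sale_date is stored as ISO-formatted text, so the year filter is written as a half-open date range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Seller (seller_id INT, seller_name TEXT);
CREATE TABLE Orders (order_id INT, sale_date TEXT, order_cost INT,
                     customer_id INT, seller_id INT);
INSERT INTO Seller VALUES (1, 'Daniel'), (2, 'Elizabeth'), (3, 'Frank');
INSERT INTO Orders VALUES
  (1, '2020-03-01', 1500, 101, 1),
  (2, '2020-05-25', 2400, 102, 2),
  (3, '2019-05-25', 800, 101, 3),
  (4, '2020-09-13', 1000, 103, 2),
  (5, '2019-02-11', 700, 101, 2);
""")

# Keep a seller only if no order of theirs falls inside 2020.
query = """
SELECT s.seller_name
FROM Seller AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM Orders AS o
    WHERE o.seller_id = s.seller_id
      AND o.sale_date >= '2020-01-01'
      AND o.sale_date < '2021-01-01'
)
ORDER BY s.seller_name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Frank',)]
```

A range comparison on sale_date also leaves the column untouched by functions, which generally lets a database use an index on it; whether that matters depends on the data size.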