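Problem: for each class, order the students by math score from low to high and return the last 20% of that ordering (that is, the 20% with the highest scores).
(1) Preparation
Data columns: classid, studentid, course, score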
001,001,math,15
001,002,math,20
001,003,math,35
001,004,math,40
001,005,math,48
001,006,math,60
001,007,math,69
001,008,math,80
001,009,math,89
001,010,math,100
001,001,english,99
001,002,english,100
001,003,english,87
001,004,english,10
001,005,english,50
001,006,english,30
001,007,english,58
001,008,english,68
001,009,english,78
001,010,english,89
002,001,math,15
002,002,math,20
002,003,math,35
002,004,math,40
002,005,math,48
002,006,math,60
002,007,math,69
002,008,math,80
002,009,math,89
002,010,math,100
002,001,english,99
002,002,english,100
002,003,english,87
002,004,english,10
002,005,english,50
002,006,english,30
002,007,english,58
002,008,english,68
002,009,english,78
002,010,english,89
Create the table:
create table student(
classId int,
stuId int,
course string,
score int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Load the data:
LOAD DATA LOCAL INPATH '/home/appweb/stu.csv' OVERWRITE INTO TABLE student;
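(2) Query
Method 1: cume_dist. Within each partition, cume_dist() returns the fraction of rows whose score is less than or equal to the current row's score, so filtering on it selects rows by percentage directly. This is the most accurate approach.
select stuId,classId,course,score,percent_part
from ( select stuId,classId,course,score,
       cume_dist() over (partition by classId order by score) as percent_part
       from student where course = 'math' ) tmp
where tmp.percent_part > 0.8;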
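Result: with the sample data above, each class should return two rows, the students with math scores 89 and 100.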
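To see why the `> 0.8` filter keeps exactly two students per class, the cume_dist calculation can be reproduced outside Hive. The following is a minimal Python sketch, not Hive code: the helper names `cume_dist` and `last_fraction` are made up for illustration, and the scores are the math scores of one class from the sample data.

```python
from bisect import bisect_right

# Math scores of one class, copied from the sample data above.
MATH_SCORES = [15, 20, 35, 40, 48, 60, 69, 80, 89, 100]


def cume_dist(scores):
    """Map each score to (count of scores <= it) / (total count).

    Hypothetical helper mirroring cume_dist() over (order by score).
    Tied scores share the same value.
    """
    ordered = sorted(scores)
    total = len(ordered)
    return {s: bisect_right(ordered, s) / total for s in ordered}


def last_fraction(scores, threshold=0.8):
    """Scores whose cume_dist exceeds the threshold, like the WHERE clause."""
    dist = cume_dist(scores)
    return [s for s in sorted(scores) if dist[s] > threshold]


print(last_fraction(MATH_SCORES))  # prints [89, 100]
```

The score 80 has a cume_dist of exactly 0.8 and is therefore excluded by the strict `>` comparison.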
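Method 2: ntile bucketing (binning). Split each class into 5 buckets and keep the last one. This is less precise: when the number of rows is not divisible by 5 the buckets differ in size, and when a class has fewer than 5 rows the last bucket (or the last several) is empty.
select stuId,classId,course,score
from ( select stuId,classId,course,score,
       ntile(5) over (partition by classId order by score) as bucket
       from student where course = 'math' ) tmp
where tmp.bucket = 5;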
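The imprecision of ntile can be checked with a small Python sketch. It assumes the usual ntile rule, where rows are split as evenly as possible and any leftover rows go to the first buckets; the helper names `ntile` and `last_bucket` are hypothetical and only illustrate the idea.

```python
def ntile(scores, buckets):
    """Pair each score (ascending) with a bucket number 1..buckets.

    Hypothetical helper: assumes leftover rows go to the first buckets.
    """
    ordered = sorted(scores)
    base, extra = divmod(len(ordered), buckets)
    result = []
    index = 0
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets each receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        for score in ordered[index:index + size]:
            result.append((score, bucket))
        index += size
    return result


def last_bucket(scores, buckets=5):
    """Scores that land in the final bucket, like WHERE bucket = 5."""
    return [s for s, b in ntile(scores, buckets) if b == buckets]


# 10 rows divide evenly: the last bucket holds exactly 20% of the class.
print(last_bucket([15, 20, 35, 40, 48, 60, 69, 80, 89, 100]))  # [89, 100]
# 7 rows: bucket sizes are 2, 2, 1, 1, 1, so only 1 of 7 rows is returned.
print(last_bucket([15, 20, 35, 40, 48, 60, 69]))  # [69]
# 3 rows: buckets 4 and 5 are empty, so nothing is returned.
print(last_bucket([15, 20, 35]))  # []
```

With 10 students per class, as in the sample data, both methods return the same rows; the difference only shows up for other class sizes.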