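1. Background
There are two large tables, each with around 60 million rows. They are stored in StarRocks (SR), and an API wrapped around it exposes them for external queries. The current problem is that a join this large crashes the BE outright. The SQL is quite representative; the original query is included at the end of this post for comparison.
The table has a single partition, so scanning the whole table is not a problem in itself. What the query ultimately needs to output is only LIMIT 10.
So can this LIMIT 10 be pushed down?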
2. Solutions:
Option 1:
SELECT
    *
FROM
    (
        SELECT
            *
        FROM
            dws_d_topic_realname_vehicle_sim_info_sna
        LIMIT 100
    ) t
    JOIN dws_d_topic_realname_vehicle_sim_info_sna t2 ON t.vin = t2.vin
LIMIT 100;
Here I simulate the join between the two tables by joining one table to itself. Let's look at the profile:
- RawRowsRead: 101.496M (101496209)
- __MAX_OF_RawRowsRead: 359.008K (359008)
- __MIN_OF_RawRowsRead: 253.240K (253240)
- ReadPagesNum: 45.454K (45454)
- __MAX_OF_ReadPagesNum: 149
- __MIN_OF_ReadPagesNum: 113
- RowsRead: 200
- __MAX_OF_RowsRead: 100
- __MIN_OF_RowsRead: 0
The query returns in under one second.
Option 2:
SELECT
    *
FROM
    (
        SELECT
            *
        FROM
            dws_d_topic_realname_vehicle_sim_info_sna
        LIMIT 100
    ) t
    JOIN (
        SELECT
            *
        FROM
            dws_d_topic_realname_vehicle_sim_info_sna
        LIMIT 100
    ) t2
LIMIT 100;
The query returns in under one second.
The profile is even more striking:
- RawRowsRead: 3.300K (3300)
- __MAX_OF_RawRowsRead: 100
- __MIN_OF_RawRowsRead: 100
- ReadPagesNum: 2.248K (2248)
- __MAX_OF_ReadPagesNum: 70
- __MIN_OF_ReadPagesNum: 68
- RowsRead: 3.300K (3300)
- __MAX_OF_RowsRead: 100
- __MIN_OF_RowsRead: 100
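The amount of data scanned drops dramatically: RawRowsRead falls from 101.496M to 3.300K, and ReadPagesNum from 45.454K to 2.248K. That is an impressive improvement. Comments and discussion are welcome.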
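One caveat is worth checking before adopting either rewrite. Applying LIMIT inside a subquery before the join is not the same query as applying it after the join, which is presumably why an optimizer cannot push a LIMIT through an inner join on its own. Note also that Option 2 as written has no ON clause, so in MySQL-compatible dialects it would behave as a cross join rather than a join on vin. The following is a minimal sketch that uses Python's built-in sqlite3 instead of StarRocks; the tables `a` and `b` and their contents are made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: a holds vins 0..999, b only holds the upper half (500..999).
cur.execute("CREATE TABLE a (vin INTEGER)")
cur.execute("CREATE TABLE b (vin INTEGER)")
cur.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1000)])
cur.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(500, 1000)])

# LIMIT applied after the join: any 10 matching rows are returned.
after_join = cur.execute(
    "SELECT a.vin FROM a JOIN b ON a.vin = b.vin LIMIT 10"
).fetchall()

# LIMIT applied before the join: only vins 0..9 survive the subquery
# (ORDER BY makes the choice deterministic), and none of them exist in b.
before_join = cur.execute(
    "SELECT t.vin FROM (SELECT vin FROM a ORDER BY vin LIMIT 10) t "
    "JOIN b ON t.vin = b.vin LIMIT 10"
).fetchall()

print(len(after_join), len(before_join))
```

In the self-join used in this post every vin matches itself, so the limited side always finds a partner. With two different tables, the manual rewrite is only appropriate when getting fewer rows, or a different set of rows, than the original query is acceptable.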
For comparison, here is the original query:
1. Query:
SELECT
    *
FROM
    (
        SELECT
            *
        FROM
            dws_d_topic_realname_vehicle_sim_info_sna
    ) t
    JOIN dws_d_topic_realname_vehicle_sim_info_sna t2 ON t.vin = t2.vin
LIMIT 100;
2. Profile log:
- RawRowsRead: 101.496M (101496209)
- __MAX_OF_RawRowsRead: 359.008K (359008)
- __MIN_OF_RawRowsRead: 253.240K (253240)
- ReadPagesNum: 299.036K (299036)
- __MAX_OF_ReadPagesNum: 1.170K (1170)
- __MIN_OF_ReadPagesNum: 597
- RowsRead: 101.496M (101496192)
- __MAX_OF_RowsRead: 359.008K (359008)
- __MIN_OF_RowsRead: 253.237K (253237)
- ScanTime: 927.950ms
- __MAX_OF_ScanTime: 1s351ms
- __MIN_OF_ScanTime: 467.955ms
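To put the three profiles side by side, the top-level counters can be compared with a little arithmetic. This is only a rough sketch: the numbers are copied from the profiles in this post, while the `profiles` dictionary and the `reduction` helper are names introduced here for illustration.

```python
# Counter values copied from the three profiles shown in this post.
profiles = {
    "original": {"RawRowsRead": 101_496_209, "ReadPagesNum": 299_036, "RowsRead": 101_496_192},
    "option 1": {"RawRowsRead": 101_496_209, "ReadPagesNum": 45_454, "RowsRead": 200},
    "option 2": {"RawRowsRead": 3_300, "ReadPagesNum": 2_248, "RowsRead": 3_300},
}


def reduction(metric, name):
    """Return how many times smaller a counter is than in the original query."""
    return profiles["original"][metric] / profiles[name][metric]


for name in ("option 1", "option 2"):
    for metric in ("RawRowsRead", "ReadPagesNum", "RowsRead"):
        print(f"{name:9s} {metric:13s} {reduction(metric, name):>14,.1f}x fewer")
```

By these counters, pages read drop by roughly 6.6x in Option 1 and about 133x in Option 2. RawRowsRead is unchanged in Option 1, so judged by that counter alone only Option 2 reduces the raw scan.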