
StarRocks Join Optimization

1. Background

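There are two large tables, each with roughly 60 million rows. They are queried through StarRocks (SR), wrapped in an API that is exposed to external callers. The current problem is that a join this large crashes the BE outright. The SQL is quite representative; a screenshot follows: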

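The table has a single partition, so scanning the whole table directly is not a problem in itself. In the end, the query only needs to output LIMIT 10.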

So can this LIMIT 10 be pushed down?

2. Solutions:

Option 1:

Select
	*
From
	(
	Select
		*
	From
		dws_d_topic_realname_vehicle_sim_info_sna
	Limit 100) t  
Join dws_d_topic_realname_vehicle_sim_info_sna t2 On
	t.vin = t2.vin
Limit 100;

Here I simulate the join between the two tables by joining the table to itself. The profile:

- RawRowsRead: 101.496M (101496209)
  - __MAX_OF_RawRowsRead: 359.008K (359008)
  - __MIN_OF_RawRowsRead: 253.240K (253240)
- ReadPagesNum: 45.454K (45454)
  - __MAX_OF_ReadPagesNum: 149
  - __MIN_OF_ReadPagesNum: 113
- RowsRead: 200
  - __MAX_OF_RowsRead: 100
  - __MIN_OF_RowsRead: 0

The query returns in under one second. RowsRead drops to 200, although RawRowsRead is still 101.496M.
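The shape of Option 1 can be tried out on a small scale. The sketch below is only an illustration: it uses SQLite from Python as a stand-in for StarRocks, and the table `vehicle_sim_info`, its columns, and the 10,000 generated rows are all hypothetical. It shows that moving the LIMIT into the left-side subquery caps that join input at 100 rows, while the final result still has 100 rows. It says nothing about how StarRocks plans the query.

```python
import sqlite3

# Hypothetical stand-in table; SQLite replaces StarRocks purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle_sim_info (vin TEXT, sim TEXT)")
conn.executemany(
    "INSERT INTO vehicle_sim_info VALUES (?, ?)",
    [(f"VIN{i:05d}", f"SIM{i:05d}") for i in range(10_000)],
)

# Original shape: join the full table to itself, then apply LIMIT.
full_join = """
    SELECT * FROM (SELECT * FROM vehicle_sim_info) t
    JOIN vehicle_sim_info t2 ON t.vin = t2.vin
    LIMIT 100
"""

# Option 1 shape: LIMIT inside the left-side subquery,
# so the join receives at most 100 rows from the left input.
limited_left = """
    SELECT * FROM (SELECT * FROM vehicle_sim_info LIMIT 100) t
    JOIN vehicle_sim_info t2 ON t.vin = t2.vin
    LIMIT 100
"""

rows_full = conn.execute(full_join).fetchall()
rows_limited = conn.execute(limited_left).fetchall()

# Size of the left join input once the LIMIT sits inside the subquery.
left_input = conn.execute(
    "SELECT COUNT(*) FROM (SELECT * FROM vehicle_sim_info LIMIT 100)"
).fetchone()[0]

print(len(rows_full), len(rows_limited), left_input)
```

Because `vin` is unique in this toy data, every left row finds exactly one match. With real data, a LIMIT without ORDER BY leaves the choice of rows to the engine, so the two query shapes are not guaranteed to return the same rows.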

Option 2:

Select
	*
From
	(
	Select
		*
	From
		dws_d_topic_realname_vehicle_sim_info_sna
	Limit 100) t
Join (
	Select
		*
	From
		dws_d_topic_realname_vehicle_sim_info_sna
	Limit 100) t2 
Limit 100;

The query returns in under one second.

The profile is even more striking:

- RawRowsRead: 3.300K (3300)
  - __MAX_OF_RawRowsRead: 100
  - __MIN_OF_RawRowsRead: 100
- ReadPagesNum: 2.248K (2248)
  - __MAX_OF_ReadPagesNum: 70
  - __MIN_OF_ReadPagesNum: 68
- RowsRead: 3.300K (3300)
  - __MAX_OF_RowsRead: 100
  - __MIN_OF_RowsRead: 100

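The amount of data scanned drops dramatically, from about 101.5M raw rows to 3.3K, which is impressive. Feedback and discussion are welcome.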
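One thing to check before adopting Option 2 is that it still answers the same question. As written, the query has no ON condition, which in SQL normally means a cross join of the two 100-row inputs. Even with `ON t.vin = t2.vin` added, the two LIMITed inputs are not guaranteed to contain matching rows. The sketch below illustrates both points with SQLite from Python as a stand-in for StarRocks. The table and data are hypothetical, and the opposite ORDER BY directions are an assumption used only to force the two sides to pick different rows.

```python
import sqlite3

# Hypothetical stand-in table; SQLite replaces StarRocks purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle_sim_info (vin TEXT, sim TEXT)")
conn.executemany(
    "INSERT INTO vehicle_sim_info VALUES (?, ?)",
    [(f"VIN{i:05d}", f"SIM{i:05d}") for i in range(10_000)],
)

# Option 2 shape as written: no ON clause, so every left row pairs with every right row.
cross_count = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT * FROM vehicle_sim_info LIMIT 100) t
    JOIN (SELECT * FROM vehicle_sim_info LIMIT 100) t2
""").fetchone()[0]

# Same shape with the join key restored. The opposite sort orders simulate an engine
# that happens to return different rows for each LIMIT (an assumption for this demo).
matched_count = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT * FROM vehicle_sim_info ORDER BY vin ASC LIMIT 100) t
    JOIN (SELECT * FROM vehicle_sim_info ORDER BY vin DESC LIMIT 100) t2
      ON t.vin = t2.vin
""").fetchone()[0]

print(cross_count, matched_count)
```

In this toy setup the cross join produces 100 x 100 pairs, and the keyed join over two disjoint samples matches nothing. Whether this matters depends on what the API is expected to return, so it is worth comparing result sets, not only scan counters.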

For comparison, here is the original query:

1. Query:
Select
	*
From
	(
	Select
		*
	From
		dws_d_topic_realname_vehicle_sim_info_sna) t
Join dws_d_topic_realname_vehicle_sim_info_sna t2 On
	t.vin = t2.vin
Limit 100;

2. Profile log:

- RawRowsRead: 101.496M (101496209)
  - __MAX_OF_RawRowsRead: 359.008K (359008)
  - __MIN_OF_RawRowsRead: 253.240K (253240)
- ReadPagesNum: 299.036K (299036)
  - __MAX_OF_ReadPagesNum: 1.170K (1170)
  - __MIN_OF_ReadPagesNum: 597
- RowsRead: 101.496M (101496192)
  - __MAX_OF_RowsRead: 359.008K (359008)
  - __MIN_OF_RowsRead: 253.237K (253237)
- ScanTime: 927.950ms
  - __MAX_OF_ScanTime: 1s351ms
  - __MIN_OF_ScanTime: 467.955ms
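To put the three profiles side by side, the counters can be turned into reduction factors. The snippet below is a small sketch that only does arithmetic on the numbers quoted in this post. The dictionary layout and the `reduction` helper are made up for illustration.

```python
# Counters copied from the three profiles in this post.
profiles = {
    "original": {"RawRowsRead": 101_496_209, "ReadPagesNum": 299_036, "RowsRead": 101_496_192},
    "option_1": {"RawRowsRead": 101_496_209, "ReadPagesNum": 45_454, "RowsRead": 200},
    "option_2": {"RawRowsRead": 3_300, "ReadPagesNum": 2_248, "RowsRead": 3_300},
}


def reduction(counter: str, variant: str) -> float:
    """Return how many times smaller `counter` is in `variant` than in the original query."""
    return profiles["original"][counter] / profiles[variant][counter]


for variant in ("option_1", "option_2"):
    for counter in ("RawRowsRead", "ReadPagesNum", "RowsRead"):
        print(f"{variant} {counter}: {reduction(counter, variant):,.1f}x fewer")
```

Read this way, Option 1 leaves RawRowsRead unchanged and mainly cuts pages read and rows returned by the scan, while Option 2 shrinks all three counters.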
