Bootstrap

Spark SQL----SORT BY子句

一、描述

SORT BY子句用于返回按用户指定的顺序在每个分区中排序的结果行。当有多个分区时,SORT BY可能返回部分排序(partially ordered)的结果。这与ORDER BY子句不同,后者保证输出的总顺序。

二、语法

SORT BY { expression [ sort_direction | nulls_sort_order ] [ , ... ] }

三、参数

  • SORT BY
    指定一个逗号分隔的表达式列表,以及用于对每个分区内的行进行排序的可选参数sort_direction和nulls_sort_order。
  • sort_direction
    (可选)指定是按升序还是降序对行进行排序。排序方向的有效值为ASC表示升序,DESC表示降序。如果未明确指定排序方向,则默认情况下按升序对行进行排序。
    语法:
 [ ASC | DESC ]
  • nulls_sort_order
    可选地指定是否在非NULL值之前/之后返回NULL值。如果未指定null_sort_order,则如果排序顺序为ASC,NULLs将首先排序,如果排序顺序是DESC,则NULLs将最后排序。
    如果指定了NULLS FIRST,则无论排序顺序如何,都会首先返回NULL值。
    如果指定了NULLS LAST,那么不管排序顺序如何,最后都会返回NULL值。
    语法:
[ NULLS { FIRST | LAST } ]

四、例子

CREATE TABLE person (zip_code INT, name STRING, age INT);
INSERT INTO person VALUES
    (94588, 'Zen Hui', 50),
    (94588, 'Dan Li', 18),
    (94588, 'Anil K', 27),
    (94588, 'John V', NULL),
    (94511, 'David K', 42),
    (94511, 'Aryan B.', 18),
    (94511, 'Lalit B.', NULL);

-- Use `REPARTITION` hint to partition the data by `zip_code` to
-- examine the `SORT BY` behavior. This is used in rest of the
-- examples.

-- Sort rows by `name` within each partition in ascending manner
SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person SORT BY name;
+--------+----+--------+
|    name| age|zip_code|
+--------+----+--------+
|  Anil K|  27|   94588|
|  Dan Li|  18|   94588|
|  John V|null|   94588|
| Zen Hui|  50|   94588|
|Aryan B.|  18|   94511|
| David K|  42|   94511|
|Lalit B.|null|   94511|
+--------+----+--------+

-- Sort rows within each partition using column position.
SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person SORT BY 1;
+--------+----+--------+
|    name| age|zip_code|
+--------+----+--------+
|  Anil K|  27|   94588|
|  Dan Li|  18|   94588|
|  John V|null|   94588|
| Zen Hui|  50|   94588|
|Aryan B.|  18|   94511|
| David K|  42|   94511|
|Lalit B.|null|   94511|
+--------+----+--------+

-- Sort rows within partition in ascending manner keeping null values to be last.
SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person SORT BY age NULLS LAST;
+----+--------+--------+
| age|    name|zip_code|
+----+--------+--------+
|  18|  Dan Li|   94588|
|  27|  Anil K|   94588|
|  50| Zen Hui|   94588|
|null|  John V|   94588|
|  18|Aryan B.|   94511|
|  42| David K|   94511|
|null|Lalit B.|   94511|
+----+--------+--------+

-- Sort rows by age within each partition in descending manner, which defaults to NULL LAST.
SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person SORT BY age DESC;
+----+--------+--------+
| age|    name|zip_code|
+----+--------+--------+
|  50| Zen Hui|   94588|
|  27|  Anil K|   94588|
|  18|  Dan Li|   94588|
|null|  John V|   94588|
|  42| David K|   94511|
|  18|Aryan B.|   94511|
|null|Lalit B.|   94511|
+----+--------+--------+

-- Sort rows by age within each partition in descending manner keeping null values to be first.
SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person SORT BY age DESC NULLS FIRST;
+----+--------+--------+
| age|    name|zip_code|
+----+--------+--------+
|null|  John V|   94588|
|  50| Zen Hui|   94588|
|  27|  Anil K|   94588|
|  18|  Dan Li|   94588|
|null|Lalit B.|   94511|
|  42| David K|   94511|
|  18|Aryan B.|   94511|
+----+--------+--------+

-- Sort rows within each partition based on more than one column with each column having
-- different sort direction.
SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person
    SORT BY name ASC, age DESC;
+--------+----+--------+
|    name| age|zip_code|
+--------+----+--------+
|  Anil K|  27|   94588|
|  Dan Li|  18|   94588|
|  John V|null|   94588|
| Zen Hui|  50|   94588|
|Aryan B.|  18|   94511|
| David K|  42|   94511|
|Lalit B.|null|   94511|
+--------+----+--------+
;