Hive函数——分析函数
ROW_NUMBER
ROW_NUMBER() –从1开始,按照顺序,生成分组内记录的序列
ROW_NUMBER() 的应用场景非常多,再比如,获取分组内排序第一的记录;
SELECT id, order_date,update_date, state, fee, ROW_NUMBER() OVER(PARTITION BY id ORDER BY update_date) AS num FROM tmp_order
NTILE
NTILE(n),用于将分组数据按照顺序切分成n片,返回当前切片值
NTILE不支持ROWS BETWEEN,比如 NTILE(2) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
如果切片不均匀,默认增加第一个切片的分布
窗口内分为2组
SELECT id, order_date, update_date, state, fee, NTILE(2) OVER(PARTITION BY id ORDER BY update_date) AS rn1 FROM tmp_order
| id | order_date | update_date | state | fee | rn1 |
---|---|---|---|---|---|---|
1 | 1001 | 20170101 | 20170101 | 创建 | 150 | 1 |
2 | 1001 | 20170101 | 20170104 | 支付 | 150 | 1 |
3 | 1001 | 20170101 | 20170106 | 支付 | 150 | 1 |
4 | 1001 | 20170101 | 20170107 | 完成 | 150 | 2 |
5 | 1001 | 20170101 | 20170108 | 完成 | 150 | 2 |
6 | 1005 | 20170101 | 20170101 | 创建 | 270 | 1 |
7 | 1005 | 20170101 | 20170102 | 支付 | 270 | 1 |
8 | 1005 | 20170101 | 20170103 | 支付 | 270 | 1 |
9 | 1005 | 20170101 | 20170104 | 支付 | 270 | 2 |
10 | 1005 | 20170101 | 20170105 | 取消 | 270 | 2 |
11 | 1005 | 20170101 | 20170109 | 取消 | 270 | 2 |
12 | 1008 | 20170102 | 20170102 | 创建 | 300 | 1 |
13 | 1008 | 20170102 | 20170105 | 取消 | 300 | 2 |
所有数据在一个窗口内,内分为3组
SELECT id, order_date,update_date, state, fee, NTILE(3) OVER(ORDER BY update_date) AS rn2 FROM tmp_order
| id | order_date | update_date | state | fee | rn2 |
---|---|---|---|---|---|---|
1 | 1001 | 20170101 | 20170101 | 创建 | 150 | 1 |
2 | 1005 | 20170101 | 20170101 | 创建 | 270 | 1 |
3 | 1005 | 20170101 | 20170102 | 支付 | 270 | 1 |
4 | 1008 | 20170102 | 20170102 | 创建 | 300 | 1 |
5 | 1005 | 20170101 | 20170103 | 支付 | 270 | 1 |
6 | 1001 | 20170101 | 20170104 | 支付 | 150 | 2 |
7 | 1005 | 20170101 | 20170104 | 支付 | 270 | 2 |
8 | 1005 | 20170101 | 20170105 | 取消 | 270 | 2 |
9 | 1008 | 20170102 | 20170105 | 取消 | 300 | 2 |
10 | 1001 | 20170101 | 20170106 | 支付 | 150 | 3 |
11 | 1001 | 20170101 | 20170107 | 完成 | 150 | 3 |
12 | 1001 | 20170101 | 20170108 | 完成 | 150 | 3 |
13 | 1005 | 20170101 | 20170109 | 取消 | 270 | 3 |
RANK、DENSE_RANK
—RANK() 生成数据项在分组中的排名,排名相等会在名次中留下空位
—DENSE_RANK() 生成数据项在分组中的排名,排名相等会在名次中不会留下空位
SELECT id, order_date, update_date, state, fee, RANK() OVER(PARTITION BY id ORDER BY state) AS rn1, DENSE_RANK() OVER(PARTITION BY id ORDER BY state) AS rn2, ROW_NUMBER() OVER(PARTITION BY id ORDER BY state) AS rn3 FROM tmp_order ORDER BY id, rn1
id | order_date | update_date | state | fee | rn1 | rn2 | rn3 | |
---|---|---|---|---|---|---|---|---|
1 | 1001 | 20170101 | 20170101 | 创建 | 150 | 1 | 1 | 1 |
2 | 1001 | 20170101 | 20170107 | 完成 | 150 | 2 | 2 | 2 |
3 | 1001 | 20170101 | 20170108 | 完成 | 150 | 2 | 2 | 3 |
4 | 1001 | 20170101 | 20170104 | 支付 | 150 | 4 | 3 | 4 |
5 | 1001 | 20170101 | 20170106 | 支付 | 150 | 4 | 3 | 5 |
6 | 1005 | 20170101 | 20170101 | 创建 | 270 | 1 | 1 | 1 |
7 | 1005 | 20170101 | 20170105 | 取消 | 270 | 2 | 2 | 2 |
8 | 1005 | 20170101 | 20170109 | 取消 | 270 | 2 | 2 | 3 |
9 | 1005 | 20170101 | 20170102 | 支付 | 270 | 4 | 3 | 4 |
10 | 1005 | 20170101 | 20170103 | 支付 | 270 | 4 | 3 | 5 |
11 | 1005 | 20170101 | 20170104 | 支付 | 270 | 4 | 3 | 6 |
12 | 1008 | 20170102 | 20170102 | 创建 | 300 | 1 | 1 | 1 |
13 | 1008 | 20170102 | 20170105 | 取消 | 300 | 2 | 2 | 2 |
读后有收获可以支付宝请作者喝咖啡
