ODPS Functions

Cloud computing

Common functions

Built-in functions

coalesce(): returns the first non-NULL value in the argument list; if every value in the list is NULL, it returns NULL.
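The behaviour of coalesce() can be sketched in a few lines of Python. This is only an illustration of the semantics described here, with None standing in for NULL; it is not ODPS code.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


print(coalesce(None, None, "fallback"))
print(coalesce(None, None))
```

Note that 0 and the empty string are ordinary values here, so they are returned rather than skipped.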

concat(): string concatenation function.

least(): returns the smallest of the input arguments.

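greatest(): returns the largest of the input arguments. The arguments var1, var2, ... may be bigint, double, datetime, or string. If all values are NULL, it returns NULL.

Return value: the largest input argument, with the same type as the inputs when no implicit conversion takes place. NULL is treated as the smallest value. When the argument types differ, comparisons among double, bigint, and string are done as double, and comparisons between string and datetime are done as datetime. No other implicit conversions are allowed.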
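A rough Python sketch of the NULL handling described for greatest(): None plays the role of NULL and is treated as the smallest value. The implicit type conversions are not modelled, so the sketch assumes all non-NULL arguments already share a comparable type.

```python
def greatest(*values):
    """Largest argument, treating None (NULL) as smaller than everything."""
    non_null = [v for v in values if v is not None]
    # Only when every argument is NULL does the result become NULL.
    return max(non_null) if non_null else None


print(greatest(1, None, 3))
print(greatest(None, None))
```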

decode(): implements branch selection.

e.g.:

select decode(customer_id,
       1, 'Taobao',
       2, 'Alipay',
       3, 'Aliyun',
       NULL, 'N/A',
       'Others') as result
from sale_detail;

The decode call above does the same job as the following if-then-else statement:

if customer_id = 1 then
    result := 'Taobao';
elsif customer_id = 2 then
    result := 'Alipay';
elsif customer_id = 3 then
    result := 'Aliyun';
…
else
    result := 'Others';
end if;
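The same branch selection can be sketched in Python. This is an illustration rather than ODPS code: None stands in for NULL and matches a None search value, as the NULL branch in the query suggests. Returning None when no default is given is an assumption.

```python
def decode(expression, *args):
    """decode(expr, search1, result1, ..., [default]) as a Python sketch."""
    # An odd number of trailing arguments means the last one is the default.
    if len(args) % 2 == 1:
        pairs, default = args[:-1], args[-1]
    else:
        pairs, default = args, None  # assumption: no default yields NULL
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if search == expression:  # None == None, so the NULL branch matches
            return result
    return default


branches = (1, "Taobao", 2, "Alipay", 3, "Aliyun", None, "N/A", "Others")
for customer_id in (1, 2, 3, None, 42):
    print(customer_id, decode(customer_id, *branches))
```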

if function: if(condition, column1, column2) returns the value of column1 when the condition holds, and the value of column2 otherwise.

e.g.: if(cap_direction not in('0','1'), null, cast(cap_direction as bigint));
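Read as ordinary code, the cap_direction expression validates a string flag and converts it to an integer. A minimal Python sketch of that logic, where the function name parse_direction is made up for illustration:

```python
def parse_direction(cap_direction):
    """Return 0 or 1 for the strings '0' and '1', and None for anything else."""
    if cap_direction not in ("0", "1"):
        return None  # corresponds to the null branch of if()
    return int(cap_direction)  # corresponds to cast(... as bigint)


print(parse_direction("1"))
print(parse_direction("7"))
```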

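substr(): returns the substring of str that starts at start_position and is length characters long. Positions are counted from 1; when length is omitted, the substring runs to the end of str.

e.g.: substr('abc', 2) = 'bc'; substr('abc', 2, 1) = 'b';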
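The main thing to keep in mind is the 1-based start position. A small Python sketch of the two call forms, assuming start_position is at least 1 (other values are not handled here):

```python
def substr(s, start_position, length=None):
    """1-based substring, mirroring the two substr() examples."""
    begin = start_position - 1  # convert to Python's 0-based index
    if length is None:
        return s[begin:]
    return s[begin:begin + length]


print(substr("abc", 2))
print(substr("abc", 2, 1))
```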

to_char(): converts a Boolean, bigint, decimal, or double value to its string representation.

e.g.: to_char(123) = '123'; to_char(true) = 'TRUE'; to_char(1.23) = '1.23'; to_char(null) = NULL;

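to_char(): for the Datetime type. The first argument is the date value to convert; a string input is implicitly converted to datetime before the operation, and any other type raises an exception.

e.g.: to_char(getdate(), 'yyyymmdd')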

concat(column1, ',', column2): string concatenation function.

Matching coordinates to two decimal places:

concat(substr(to_char(lng),1,6), ',', substr(to_char(lat),1,5)) like '120.08,30.28';
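The expression builds a "lng,lat" key from the leading characters of each coordinate. A Python sketch of the same idea follows; the function name coord_key and the sample coordinates are made up for illustration.

```python
def coord_key(lng, lat):
    """Keep the first 6 characters of lng and the first 5 of lat."""
    # Like substr(to_char(x), 1, n): this truncates text, it does not round.
    return str(lng)[:6] + "," + str(lat)[:5]


print(coord_key(120.0834214, 30.2829241))
print(coord_key(120.0834214, 30.2829241) == "120.08,30.28")
```

The fixed widths 6 and 5 only give two decimal places when the longitude has three integer digits and the latitude has two, as with the values shown here.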

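regexp_extract(column, pattern, number): extracts the part of a string matched by the number-th capture group of the pattern.

For example, with inter_name = 临东路与火神塘路交叉口:

regexp_extract(inter_name, '(.*?)(路)', 1) = 临东

regexp_extract(inter_name, '与(.*?)(交叉口)', 1) = 火神塘路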
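Both patterns can be tried out with Python's re module, which handles these particular expressions the same way. The helper below is a sketch, not the ODPS function; returning None when nothing matches is an assumption.

```python
import re


def regexp_extract(source, pattern, group=1):
    """Return the given capture group of the first match, or None."""
    match = re.search(pattern, source)
    return match.group(group) if match else None


inter_name = "临东路与火神塘路交叉口"
print(regexp_extract(inter_name, "(.*?)(路)", 1))
print(regexp_extract(inter_name, "与(.*?)(交叉口)", 1))
```

The lazy quantifier .*? matters in the first pattern: it stops at the first 路 instead of the last one.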

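regexp_replace(): string replacement function.

regexp_replace(round_name, '-', '', 1) replaces '-' in round_name with an empty string; the final argument 1 selects the first match.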

split_part(): string splitting function.

split_part('环北-密渡桥', '-', 2) = 密渡桥
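A Python sketch of the single-index form shown here, with the part index counted from 1. Returning an empty string for an out-of-range index is an assumption.

```python
def split_part(s, separator, index):
    """Return the index-th piece (1-based) of s split on separator."""
    parts = s.split(separator)
    if 1 <= index <= len(parts):
        return parts[index - 1]
    return ""  # assumption: out-of-range index yields an empty string


print(split_part("环北-密渡桥", "-", 1))
print(split_part("环北-密渡桥", "-", 2))
```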

instr(): returns the position of the substring str2 within the string str1.

instr('Tech on the net', 'e') = 2; instr('Tech on the net', 'e', 1, 1) = 2
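The four-argument form takes a start position and an occurrence number, both counted from 1. A Python sketch of that lookup follows; the parameter names start_position and nth_appearance, and returning 0 when nothing is found, are assumptions.

```python
def instr(str1, str2, start_position=1, nth_appearance=1):
    """1-based position of the nth occurrence of str2 in str1, or 0."""
    index = start_position - 1  # Python indexes from 0
    for _ in range(nth_appearance):
        index = str1.find(str2, index)
        if index == -1:
            return 0
        index += 1  # becomes the 1-based position and the next search start
    return index


print(instr("Tech on the net", "e"))
print(instr("Tech on the net", "e", 1, 2))
```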

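cast(expr as type): explicit type conversion, as in cast(cap_direction as bigint) in the if example above.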

coors_convert(lng, lat, 1): converts Google coordinates to AMap (Gaode) coordinates, e.g. coors_convert(120.2334214, 30.21829241, 1)

WHERE judge_location(split_part(coors_convert(a.lng,a.lat,1),',',1), split_part(coors_convert(a.lng,a.lat,1),',',2)) = 1

Window functions

Statistics: count, sum, avg, max/min, median, stddev, stddev_samp

Ranking: row_number, rank, dense_rank, percent_rank

Others: lag, lead, cluster_sample

——————–

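Basic usage: splitting the data into groups according to some condition is called windowing, and each group is called a window.

The partition by clause specifies the columns used for windowing.

Rows with the same values in the partition columns are considered to be in the same window.

order by specifies how rows are sorted within a window.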

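Restrictions: window functions may only appear in the select clause.

Do not nest window functions or aggregate functions inside a window function.

Window functions cannot be used together with aggregate functions at the same level.

A single ODPS SQL statement may use at most 5 window functions.

When windowing with partition by, a single window may contain at most 100 million rows.

When windowing with rows, x and y must be integer constants greater than or equal to 0, in the range 0 to 10000; a value of 0 means the current row.

order by is required in order to specify the window range with rows.

Not every window function supports rows; the ones that do are avg, count, max, min, stddev, and sum.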

———————-

An example

select *,rank() over(partition by monitor_id order by distance) as mindistance_monitor_id from()
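To make the partition by / order by mechanics concrete, here is a small Python sketch that emulates rank() over(partition by monitor_id order by distance). The sample rows are invented for illustration, and the helper rank_over is not part of ODPS.

```python
from itertools import groupby


def rank_over(rows, partition_key, order_key):
    """Emulate rank() over(partition by partition_key order by order_key)."""
    ranked = []
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    # Rows sharing the partition value form one window.
    for _, window in groupby(ordered, key=lambda r: r[partition_key]):
        previous, rank = object(), 0
        for position, row in enumerate(window, start=1):
            if row[order_key] != previous:
                rank = position  # ties keep the earlier rank, leaving gaps
                previous = row[order_key]
            ranked.append({**row, "rank": rank})
    return ranked


rows = [
    {"monitor_id": "a", "distance": 5},
    {"monitor_id": "a", "distance": 3},
    {"monitor_id": "a", "distance": 3},
    {"monitor_id": "b", "distance": 1},
]
for row in rank_over(rows, "monitor_id", "distance"):
    print(row)
```

Filtering the result on rank 1 keeps the nearest row or rows for each monitor_id, which is what the column alias mindistance_monitor_id hints at.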

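User-defined functions

Building custom functions on Alibaba Cloud ODPS

Note: because the ODPS version used in this example was too old, the function was not built by following the Alibaba Cloud example step by step with Maven packaging. The jar was packaged by hand instead, because the jar produced by following the example was faulty (it contained only configuration resources and no class resources).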

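Terminology:

UDF: user defined scalar function. Its input and output are one-to-one: it reads one row of data (which may carry several arguments) and writes one output value.

UDTF: user defined table valued function. It covers the case where one function call produces multiple rows, and it is also the only kind of user-defined function that can return multiple fields.

UDAF: user defined aggregation function. Its input and output are many-to-one: it aggregates multiple input records into a single output value (it can be combined with a group by clause).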
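The three input/output shapes can be illustrated with plain Python functions. These are not written against the ODPS UDF API; the function names and sample data are made up purely to show one-to-one, one-to-many, and many-to-one behaviour.

```python
def udf_upper(value):
    """UDF shape: one input row, one output value."""
    return value.upper()


def udtf_split(value, separator=","):
    """UDTF shape: one input row, several output rows with several fields."""
    for part in value.split(separator):
        yield (part, len(part))


def udaf_total(values):
    """UDAF shape: many input rows, one aggregated output value."""
    total = 0
    for value in values:
        total += value
    return total


print(udf_upper("odps"))
print(list(udtf_split("a,bb,ccc")))
print(udaf_total([1, 2, 3]))
```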

Published by

风君子
