hive题目


蚂蚁金服题目

第一题:

/*问题:查询user_low_carbon表中每日流水记录,条件为:
用户在2017年,连续三天(或以上)的天数里,
每天减少碳排放(low_carbon)都超过100g的用户低碳流水。
需要查询返回满足以上条件的user_low_carbon表中的记录流水。
例如用户u_002符合条件的记录如下,因为2017/1/2~2017/1/5
连续四天的碳排放量之和都大于等于100g:
seq(key) user_id data_dt  low_carbon
xxxxx10    u_002  2017/1/2  150
xxxxx11    u_002  2017/1/2  70
xxxxx12    u_002  2017/1/3  30
xxxxx13    u_002  2017/1/3  80
xxxxx14    u_002  2017/1/4  150
xxxxx14    u_002  2017/1/5  101
备注:统计方法不限于sql、procedure、python,java等*/

with
t1
as
(
	select user_id,
	date_format(regexp_replace(data_dt,'/','-'),"yyyy-MM-dd") date_f_dt,low_carbon
	from user_low_carbon
	where substring(data_dt,1,5) = '2017' && low_carbon >= 100
),
t2 as
(
	select t1.user_id,t1.date_f_dt,t1.low_carbon,
	row_number over(distribute by t1.user_id  sort by t1.date_f_dt desc) rank_dt
	from t1

),
t3 as
(
	select t2.user_id,t2.date_f_dt,t2.low_carbon,
	date_add(t2.date_f_dt,rank_dt) dt_sum,t2.rank_dt
	from t2
),
t4 as
(
	select t3.user_id,t3.date_f_dt,t3.low_carbon,
	count(1) over(distribute by t3.user_id,t3.dt_sum) same_num
	from t3
)
select *
from t4
where t4.same_num < 3;

第二题:

一张表 user,包含两个字段:id,username
找出除id外的,其他都重复的所有信息。

with
t1 as
(
	select id,username
	from user
	group by username
	having count(1) > 1
)
select u.id,u.username
from user u
where (u.username) in t1;

文章作者: tzkTangXS
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 tzkTangXS !
  目录