Hive UDF Programming

Author: zhiquanliu

Blog category: Hive

Write a class that extends org.apache.hadoop.hive.ql.exec.UDF.

Add an evaluate method to the class.

"evaluate" should never be a void method. However, it can return null if needed; Hive treats a null return value as SQL NULL.
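Hive finds evaluate by reflection rather than through an interface, which is why a class can declare several overloads and why the return type matters. The sketch below is a simplified, hypothetical illustration in plain Java with no Hive classes: SampleUdf, findEvaluate and callEvaluate are made-up names, and Hive's real method resolver also performs implicit type conversion, which is omitted here.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class EvaluateLookup {

    // Hypothetical UDF-like class: plain Java, it does NOT extend Hive's UDF.
    public static class SampleUdf {
        // Overloads are allowed; the caller picks one by argument types.
        public String evaluate(String s) {
            return s == null ? null : s.toUpperCase();
        }

        public Integer evaluate(Integer a, Integer b) {
            return (a == null || b == null) ? null : a + b;
        }

        // A void overload: the lookup below refuses it, illustrating the "never void" rule.
        public void evaluate(Long ignored) {
        }
    }

    // Returns the public, non-void evaluate method whose parameter types match exactly
    // (simplified: no implicit type conversion as Hive would do).
    public static Method findEvaluate(Class<?> udfClass, Class<?>... argTypes) {
        for (Method m : udfClass.getMethods()) {
            if (m.getName().equals("evaluate")
                    && m.getReturnType() != void.class
                    && Arrays.equals(m.getParameterTypes(), argTypes)) {
                return m;
            }
        }
        throw new IllegalArgumentException("no non-void evaluate" + Arrays.toString(argTypes));
    }

    // Picks the overload from the runtime classes of the (non-null) arguments and calls it.
    public static Object callEvaluate(Object udf, Object... args) {
        Class<?>[] argTypes = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            argTypes[i] = args[i].getClass();
        }
        Method m = findEvaluate(udf.getClass(), argTypes);
        try {
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SampleUdf udf = new SampleUdf();
        System.out.println(callEvaluate(udf, "hive")); // HIVE
        System.out.println(callEvaluate(udf, 1, 2));   // 3
    }
}
```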

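package com.cloudyhadoop.bigdata.udf;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Returns the last day of the month for a yyyy-MM-dd date, e.g. 2015-03-01 ==> 2015-03-31.
public class UDFLastDay extends UDF {
	// SimpleDateFormat and Calendar are not thread-safe; they are per-instance fields here.
	private final SimpleDateFormat inputFormatter = new SimpleDateFormat("yyyy-MM-dd");
	private final SimpleDateFormat outFormatter = new SimpleDateFormat("yyyy-MM-dd");
	private final Calendar calendar = Calendar.getInstance();

	// Output object reused across rows to avoid allocating a new Text per call.
	private final Text result = new Text();

	public UDFLastDay() {
		// Reject impossible dates such as 2015-02-30 instead of silently rolling them over.
		inputFormatter.setLenient(false);
	}

	//  2015-03-01  ==> 2015-03-31
	public Text evaluate(Text input) {
		// NULL or blank input yields NULL instead of failing the query.
		if (input == null || input.toString().trim().isEmpty()) {
			return null;
		}

		try {
			calendar.setTime(inputFormatter.parse(input.toString()));
			int lastDate = calendar.getActualMaximum(Calendar.DATE);  // number of days in this month
			calendar.set(Calendar.DATE, lastDate);

			result.set(outFormatter.format(calendar.getTime()));
			return result;
		} catch (ParseException e) {
			// Unparseable input also yields NULL.
			e.printStackTrace();
			return null;
		}
	}
}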
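The same last-day-of-month logic can be written with java.time (Java 8+), which avoids the mutable SimpleDateFormat and Calendar fields entirely. This is a standalone sketch under that assumption, not the author's code: LastDayOfMonth and lastDay are hypothetical names, and inside the UDF, evaluate would simply call result.set(lastDay(input.toString())) after a null check.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.time.temporal.TemporalAdjusters;

public class LastDayOfMonth {

    // "uuuu" (proleptic year) with STRICT resolution rejects impossible dates such as 2015-02-30.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // Returns the last day of the input date's month, or null for null, blank or invalid input
    // (the same convention as the UDF: return NULL rather than fail the query).
    public static String lastDay(String input) {
        if (input == null || input.trim().isEmpty()) {
            return null;
        }
        try {
            LocalDate date = LocalDate.parse(input.trim(), FORMAT);
            return date.with(TemporalAdjusters.lastDayOfMonth()).format(FORMAT);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(lastDay("2015-03-01")); // 2015-03-31
        System.out.println(lastDay("2016-02-10")); // 2016-02-29 (leap year)
        System.out.println(lastDay("2015-02-30")); // null (invalid date)
    }
}
```

Because LocalDate and DateTimeFormatter are immutable, the formatter can be a shared static constant without any thread-safety concerns.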

Package the class into a jar and put it in a directory on the Linux machine, for example: /home/hadoop/software/lib/udf.jar

How do you make the UDF available in Hive?

Option 1 (valid for the current session only):

add jar /home/hadoop/software/lib/udf.jar ;

create temporary function getLastDay as 'com.cloudyhadoop.bigdata.udf.UDFLastDay';

show functions;

select empno, ename, hiredate, getLastDay(hiredate) last_day from emp;
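The same session-scoped statements can also be sent from a Java client over HiveServer2 JDBC. This is a hedged sketch: it assumes the Hive JDBC driver (hive-jdbc) is on the classpath, that a URL such as jdbc:hive2://localhost:10000/default points at a running HiveServer2 (older drivers may also need Class.forName("org.apache.hive.jdbc.HiveDriver")), and that the jar path is readable on the server host. SessionUdfClient and sessionSetup are hypothetical names. Because the jar and the temporary function belong to one session, all statements must go through the same Connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class SessionUdfClient {

    // Builds the two per-session statements (JDBC statements take no trailing semicolon).
    public static List<String> sessionSetup(String jarPath, String functionName, String className) {
        return Arrays.asList(
                "add jar " + jarPath,
                "create temporary function " + functionName + " as '" + className + "'");
    }

    // Usage (assumption): java SessionUdfClient jdbc:hive2://localhost:10000/default
    public static void main(String[] args) throws SQLException {
        List<String> setup = sessionSetup("/home/hadoop/software/lib/udf.jar",
                "getLastDay", "com.cloudyhadoop.bigdata.udf.UDFLastDay");
        if (args.length == 0) {
            setup.forEach(System.out::println); // dry run: only print the statements
            return;
        }
        // The temporary function lives only in this connection's session,
        // so setup and query share one Connection.
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            for (String sql : setup) {
                stmt.execute(sql);
            }
            try (ResultSet rs = stmt.executeQuery(
                    "select empno, ename, hiredate, getLastDay(hiredate) last_day from emp")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getString(4));
                }
            }
        }
    }
}
```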

Option 2 (jar loaded for every session):

Add the following property to hive-site.xml:

<property>
<name>hive.aux.jars.path</name>
<value>file:///home/hadoop/software/lib/udf.jar</value>
</property>

After restarting Hive, you no longer need to run add jar /home/hadoop/software/lib/udf.jar ; you only need:

create temporary function getLastDay as 'com.cloudyhadoop.bigdata.udf.UDFLastDay';

temporary means the function exists only in the current session; it is lost when you exit or restart.

How do you make the function itself available globally?

1. Create a permanent function (supported since Hive 0.13), as described in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/DropFunction

CREATE FUNCTION [db_name.]function_name AS class_name

[USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
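To make the optional parts of that syntax concrete, here is a small hypothetical Java helper that assembles a CREATE FUNCTION statement. It only models JAR resources (FILE and ARCHIVE follow the same pattern), and the hdfs:///user/hadoop/lib/udf.jar URI in main is an illustrative assumption: for a permanent function, the jar usually lives on a filesystem every client can reach rather than on one machine's local disk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class PermanentFunctionDdl {

    // Builds: CREATE FUNCTION [db.]name AS 'class' [USING JAR 'uri' [, JAR 'uri' ...]]
    public static String createFunction(String db, String name, String className, List<String> jarUris) {
        StringBuilder sql = new StringBuilder("CREATE FUNCTION ");
        if (db != null && !db.isEmpty()) {
            sql.append(db).append('.'); // optional database qualifier
        }
        sql.append(name).append(" AS '").append(className).append('\'');
        if (!jarUris.isEmpty()) {
            // " USING " prefix, resources separated by ", "
            StringJoiner using = new StringJoiner(", ", " USING ", "");
            for (String uri : jarUris) {
                using.add("JAR '" + uri + "'");
            }
            sql.append(using.toString());
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(createFunction("default", "getLastDay",
                "com.cloudyhadoop.bigdata.udf.UDFLastDay",
                Arrays.asList("hdfs:///user/hadoop/lib/udf.jar")));
        // CREATE FUNCTION default.getLastDay AS 'com.cloudyhadoop.bigdata.udf.UDFLastDay' USING JAR 'hdfs:///user/hadoop/lib/udf.jar'
    }
}
```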

2. Modify the Hive source code and register the UDF as a built-in function in FunctionRegistry.java:

https://github.com/cloudera/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java

registerUDF("getLastDay", UDFLastDay.class, false);

Then recompile and redeploy Hive.
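Conceptually, registering a built-in function adds a name-to-class entry to a table that Hive consults when it resolves function names, which is why no CREATE FUNCTION is needed afterwards. The toy registry below only illustrates that idea; it is not Hive's FunctionRegistry, and ToyFunctionRegistry and UDFLastDayStub are hypothetical names. It does reflect one real behavior: Hive function names are case-insensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ToyFunctionRegistry {

    // Placeholder standing in for a real UDF class such as UDFLastDay.
    public static class UDFLastDayStub {
    }

    // Maps lower-cased function names to implementation classes.
    private final Map<String, Class<?>> functions = new HashMap<>();

    // Loosely mirrors registerUDF(name, class, isOperator); isOperator is ignored in this toy.
    public void registerUDF(String name, Class<?> udfClass, boolean isOperator) {
        functions.put(name.toLowerCase(Locale.ROOT), udfClass);
    }

    // Case-insensitive lookup; returns null for unknown names.
    public Class<?> lookup(String name) {
        return functions.get(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ToyFunctionRegistry registry = new ToyFunctionRegistry();
        registry.registerUDF("getLastDay", UDFLastDayStub.class, false);
        System.out.println(registry.lookup("GETLASTDAY").getSimpleName()); // UDFLastDayStub
    }
}
```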

Posted: 2015-08-16 17:25