The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.
Available in VPC
Hive user-defined functions (UDFs) allow you to run your own code within a Hive query, and are used when the desired query is difficult to express using only built-in functions.
UDFs are typically written to process domain-specific data, such as search logs or transaction histories.
There are three types of UDFs, depending on the number of input rows passed to the function and the number of output rows it returns. Each type has a different interface to implement.
- UDF: a function that receives a single input row and returns a single output value per row. Most mathematical and string functions, such as ROUND and REPLACE, are of the UDF type.
- UDAF: a function that receives multiple input rows and returns a single output row. Aggregate functions, such as COUNT and MAX, are examples of UDAFs.
- UDTF: a function that receives a single input row and returns multiple output rows (a table). EXPLODE is an example of a UDTF.
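As a rough Java sketch (base-class names from the Hive 3.x API, with bodies elided for brevity), each of the three types corresponds to a different class to extend:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;

// UDF: one input row -> one output value (the simple API used in this guide)
class MyScalarFn extends UDF { /* evaluate(...) overloads */ }

// UDAF: many input rows -> one output row
class MyAggregateFn extends AbstractGenericUDAFResolver { /* getEvaluator(...) */ }

// UDTF: one input row -> many output rows
abstract class MyTableFn extends GenericUDTF { /* initialize(...), process(...), close() */ }
```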
This guide describes how to implement a Hive UDF based on the org.apache.hadoop.hive.ql.exec.UDF class and use it in Cloud Hadoop.
To use a Hive UDF in Cloud Hadoop, follow the steps below in order.
You must implement UDFs in Java. To use other programming languages, you can write a user-defined script (MapReduce script) and use it with the SELECT TRANSFORM syntax.
1. Create project
- Create a Gradle project using IntelliJ.
  - Package: com.naverncp.hive
- Add the following dependency settings to build.gradle under the project root:
  - The example uses the same versions as the components installed in Cloud Hadoop 2.0.
```groovy
plugins {
    id 'java'
}

group 'com.naverncp'
version '1.0-SNAPSHOT'

repositories {
    mavenCentral()
    maven { url "http://conjars.org/repo" }
}

dependencies {
    compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.1.1'
    compile group: 'org.apache.hive', name: 'hive-exec', version: '3.1.2'
    compile group: 'org.apache.commons', name: 'commons-lang3', version: '3.9'
    testCompile group: 'junit', name: 'junit', version: '4.12'
}
```
2. Implement interface
- Implement a UDF that meets the following conditions:
  - A UDF must extend org.apache.hadoop.hive.ql.exec.UDF.
  - A UDF must implement at least one evaluate() method.
Because the evaluate() method is not declared in the org.apache.hadoop.hive.ql.exec.UDF class itself, the number and types of arguments the function will receive cannot be known in advance; Hive finds a matching evaluate() overload at query time.
```java
// Strip.java
package com.naverncp.hive;

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(
        name = "Strip",
        value = "returns a stripped text",
        extended = "stripping characters from the ends of strings"
)
public class Strip extends UDF {
    private Text result = new Text();

    // Strips leading and trailing whitespace.
    public Text evaluate(Text str) {
        if (str == null) {
            return null;
        }
        result.set(StringUtils.strip(str.toString()));
        return result;
    }

    // Strips the given characters from both ends of the string.
    public Text evaluate(Text str, String stripChar) {
        if (str == null) {
            return null;
        }
        result.set(StringUtils.strip(str.toString(), stripChar));
        return result;
    }
}
```
Two evaluate() methods are implemented in the class above.
- 1st method: removes whitespace from the beginning and end of the string.
- 2nd method: removes the specified characters from both ends of the string.
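To see the stripping behavior on its own, here is a minimal, dependency-free Java sketch that mimics what StringUtils.strip() does for the two cases above. The class and method here (StripDemo.strip) are illustrative, not part of the guide's project:

```java
public class StripDemo {
    // Re-implementation of the strip semantics: remove any of the given
    // characters from both ends of the string; when chars is null,
    // remove whitespace instead (mirroring StringUtils.strip).
    static String strip(String s, String chars) {
        if (s == null) {
            return null;
        }
        int start = 0;
        int end = s.length();
        // Advance past strippable characters at the front.
        while (start < end && (chars == null
                ? Character.isWhitespace(s.charAt(start))
                : chars.indexOf(s.charAt(start)) >= 0)) {
            start++;
        }
        // Back up past strippable characters at the end.
        while (end > start && (chars == null
                ? Character.isWhitespace(s.charAt(end - 1))
                : chars.indexOf(s.charAt(end - 1)) >= 0)) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(strip("  bee  ", null)); // "bee"
        System.out.println(strip("banana", "ab"));  // "nan"
    }
}
```

This matches the behavior the Hive session below verifies: whitespace is removed in the first call, and the characters 'a' and 'b' are removed from both ends in the second.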
- To use the UDF in Hive, first package the Java class into a .jar file.
  - The following example shows a case where the .jar is uploaded under hdfs:///user/example.

```
$ ./gradlew clean
$ ./gradlew build
$ scp -i ~/Downloads/example-home.pem ~/IdeaProjects/hive/build/libs/hive-1.0-SNAPSHOT.jar sshuser@pub-4rrsj.hadoop.ntruss.com:~/
$ ssh -i ~/Downloads/example-home.pem sshuser@pub-4rrsj.hadoop.ntruss.com
[sshuser@e-001-example-0917-hd ~]$ hadoop fs -copyFromLocal hive-1.0-SNAPSHOT.jar /user/example/
```
3. Use Hive
- Run the Hive CLI with the following command:
  - You don't need to pass any extra options because HiveServer is installed on the edge node.

```
[sshuser@e-001-example-0917-hd ~]$ hive
20/11/06 16:04:39 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.

Logging initialized using configuration in file:/etc/hive/2.6.5.0-292/0/hive-log4j.properties
hive>
```
- Register the function in the Hive Metastore as follows:
  - Set a name for the function with the CREATE FUNCTION syntax.
  - Hive Metastore: the store that holds metadata related to tables and partitions.

```
hive> CREATE FUNCTION strip AS 'com.naverncp.hive.Strip'
    > USING JAR 'hdfs:///user/example/hive-1.0-SNAPSHOT.jar';
converting to local hdfs:///user/example/hive-1.0-SNAPSHOT.jar
Added [/tmp/99c3d137-f58e-4fab-8a2a-98361e3e59a1_resources/hive-1.0-SNAPSHOT.jar] to class path
Added resources: [hdfs:///user/example/hive-1.0-SNAPSHOT.jar]
OK
Time taken: 17.786 seconds
```

To use a function only during the current Hive session, without registering it permanently in the metastore, use the TEMPORARY keyword as follows:

```
ADD JAR 'hdfs:///user/example/hive-1.0-SNAPSHOT.jar';
CREATE TEMPORARY FUNCTION strip AS 'com.naverncp.hive.Strip';
```
- Check whether the registered strip function works properly. You can verify that the spaces are removed.

```
hive> select strip(' bee ');
converting to local hdfs:///user/example/hive-1.0-SNAPSHOT.jar
Added [/tmp/70e2e17a-ecca-41ff-9fe6-48417b8ef797_resources/hive-1.0-SNAPSHOT.jar] to class path
Added resources: [hdfs:///user/example/hive-1.0-SNAPSHOT.jar]
OK
bee
Time taken: 0.967 seconds, Fetched: 1 row(s)
hive> select strip('banana', 'ab');
OK
nan
Time taken: 0.173 seconds, Fetched: 1 row(s)
```
You can delete a function as follows:

```
DROP FUNCTION strip;
```
If you create UDFs for frequently used logic based on data characteristics, you can easily view the data using SQL syntax.