Public Hive

    Available in VPC

    This article describes how to create and use Hive databases and tables in Data Forest.

    Note

    UDF and Storage Handler features are under preparation.

    Run Hive commands

    If you are connecting to HiveServer for the first time, see Use Hive.
    For Kerberos authentication, see Create and Manage Account.

    Hive commands can be executed in a Dev app as follows.

    $ kinit example -kt df.example.keytab # authenticate with Kerberos
    $ beeline -u "jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n example -p 'PASSWORD'
    

    Create database

    Note

    Data Forest users cannot use Hive's default database. A new user database must be created according to the rules below.

    To create a Hive database in Data Forest, follow these rules.

    • A database name should follow the {USER}__db_* format.
      If the user's account name is example, then the DB names can be designated as below.

      example__db_foo
      example__db_bar
      example__db_hello_foo
      example__db_hello_foo_bar
      
    • A database name can't contain - (hyphen). If the user name contains - (hyphen), then replace it with _ (underscore). If the user's account name is example-ex, then the DB names can be designated as below.

      example_ex__db_foo
      

      The following are examples of DB names that violate the rule.

      example_db_foo     # the number of _ (underscore) characters does not match
      example__DB_foo    # "db" is capitalized
      example-ex__db_foo # the - (hyphen) is not replaced with _ (underscore)
      
    • Hive table data is stored in HDFS. When creating a database, the path must be designated with the LOCATION setting. This path must be under the warehouse directory in the user's home directory.

      • First, create the warehouse directory under the user's home directory.
        $ hadoop fs -mkdir warehouse
        $ hadoop fs -ls
        Found 4 items
        ...
        drwx------   - example hdfs          0 2021-03-29 16:18 warehouse
        
      • You can create a database with the Beeline command as follows.
        # create database
        CREATE DATABASE example__db_foo LOCATION '/user/example/warehouse/example__db_foo.db';
        # use database
        USE example__db_foo;
        
    Caution

    You can't see other users' databases when viewing databases with the Beeline command. However, Spark SQL can return the entire database list. Make sure not to include personal information in a database name.
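
    The database naming rules above can be sketched as a quick validation check. The following is a hypothetical helper, not part of Data Forest; it assumes the suffix after __db_ is limited to lowercase letters, digits, and underscores, as in the examples.

```python
import re

def valid_df_db_name(account: str, db_name: str) -> bool:
    """Check a Hive database name against the Data Forest rules:
    {USER}__db_* format, hyphens in the account replaced by underscores,
    lowercase 'db', and no hyphens anywhere in the database name."""
    # replace - (hyphen) in the account name with _ (underscore)
    user = account.replace("-", "_")
    # exactly two underscores, then lowercase 'db', then _<suffix>
    pattern = rf"^{re.escape(user)}__db_[a-z0-9_]+$"
    return re.fullmatch(pattern, db_name) is not None

print(valid_df_db_name("example", "example__db_foo"))        # True
print(valid_df_db_name("example", "example_db_foo"))         # False: underscore count
print(valid_df_db_name("example", "example__DB_foo"))        # False: capitalized DB
print(valid_df_db_name("example-ex", "example-ex__db_foo"))  # False: hyphen kept
print(valid_df_db_name("example-ex", "example_ex__db_foo"))  # True
```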

    Create table

    There are no naming restrictions when creating tables. The LOCATION option is not required, but if you'd like to specify one, set it to a path under /user/example/warehouse.

    CREATE TABLE `table1` (`id` BIGINT) LOCATION '/user/example/warehouse/table1';
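
    Once created, the table can be used like any other Hive table from the same Beeline session. A minimal sketch (assuming the example__db_foo database from the previous section):

```sql
-- switch to the user database, insert a row, and read it back
USE example__db_foo;
INSERT INTO table1 VALUES (1);
SELECT id FROM table1;
```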
    

    Command limitations

    The SET and TRANSFORM clauses cannot be used. An update is planned to adopt a dedicated queue so that the SET command can be used.

    Set Spark-Hive access

    You must use LLAP to read Hive managed table data from Spark. The following explains how to set up access to Hive through Interactive Hive.

    Check access information

    The following access information can be set:

    • spark.sql.hive.hiveserver2.jdbc.url (Interactive): jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=df-interactive
    • spark.sql.hive.hiveserver2.jdbc.url (Batch): jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
    • spark.datasource.hive.warehouse.metastoreUri: thrift://hms1.kr.df.naverncp.com:9083,thrift://hms2.kr.df.naverncp.com:9083
    • spark.datasource.hive.warehouse.load.staging.dir: /user/{dataforest account name}/tmp
    • spark.hadoop.hive.zookeeper.quorum: zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181
    • spark.hadoop.hive.llap.daemon.service.hosts: @df-llap
    • principal: {dataforest account name}@KR.DF.NAVERNCP.COM
    • keytab: {dataforest account keytab file path}
    • jars: {hwc jar file path}
    • The list above shows two kinds of connection strings, interactive and batch, for spark.sql.hive.hiveserver2.jdbc.url.
    • Use the interactive connection string to conduct tasks in Interactive Hive. The same settings values can be used in the Spark shell and PySpark.
    • The principal and keytab use the Data Forest Kerberos account; they are required to run Spark executors on YARN.
    • The user and password for spark.sql.hive.hiveserver2.jdbc.url are the authentication information used to access Interactive Hive from within executors. Use the Data Forest account name (same as the Kerberos ID) and the Ranger password.

    Example of working in Interactive Hive

    The following is an example of working with Interactive Hive via spark-shell.

    spark-shell \
      --master local[2] \
      --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=df-interactive" \
      --conf spark.datasource.hive.warehouse.metastoreUri="thrift://hms1.kr.df.naverncp.com:9083,thrift://hms2.kr.df.naverncp.com:9083" \
      --conf spark.datasource.hive.warehouse.load.staging.dir=/user/example/tmp \
      --conf spark.hadoop.hive.zookeeper.quorum="zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181" \
      --conf spark.hadoop.hive.llap.daemon.service.hosts=@df-llap \
      --conf spark.security.credentials.hiveserver2.enabled=false \
      --conf spark.sql.hive.hiveserver2.jdbc.url.principal="hive/_HOST@KR.DF.NAVERNCP.COM" \
      --principal example@KR.DF.NAVERNCP.COM \
      --keytab ./df.example.keytab \
      --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar
    

    The screen appears as follows once the login is complete.

    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://ambari1.kr.df.naverncp.com:4040
    Spark context available as 'sc' (master = local[2], app id = local-1624438130183).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.2.3.1.0.0-78
          /_/
    
    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession
    
    scala> import com.hortonworks.hwc.HiveWarehouseSession._
    import com.hortonworks.hwc.HiveWarehouseSession._
    
    scala>  val hive1 = HiveWarehouseSession.session(spark).build()
    hive1: com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl = com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl@3da61af2
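    
    scala> // From here, the HWC session can run queries against Interactive Hive.
    scala> // A minimal sketch using Hive Warehouse Connector session methods;
    scala> // the example__db_foo database and table1 are assumed to exist.
    scala> hive1.showDatabases().show()
    
    scala> hive1.setDatabase("example__db_foo")
    
    scala> val df = hive1.executeQuery("SELECT id FROM table1")
    
    scala> df.show()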
    
