Public Hive
Available in VPC
This page describes how to create and use Hive databases and tables in Data Forest.
UDF and Storage Handler features are under preparation.
Run Hive commands
If you are connecting to HiveServer for the first time, see Use Hive.
For Kerberos authentication, see Create and Manage Account.
Hive commands can be executed in the Dev app as follows.

```
$ kinit example -kt df.example.keytab  # obtain a Kerberos ticket
$ beeline -u "jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n example -p 'PASSWORD'
```
Create database
Data Forest users cannot use Hive's default database. A new user database must be created according to the rules below.
A database name must follow the `{USER}__db_*` format. If the user's account name is `example`, the DB names can be designated as below.

```
example__db_foo
example__db_bar
example__db_hello_foo
example__db_hello_foo_bar
```
A database name can't contain `-` (hyphen). If the user name contains a `-` (hyphen), replace it with `_` (underscore). If the user's account name is `example-ex`, the DB names can be designated as below.

```
example_ex__db_foo
```
The following are examples of DB names that violate the rule.
```
example_db_foo      # the number of _ (underscore) does not match
example__DB_foo     # db is capitalized
example-ex__db_foo  # the - (hyphen) is not replaced with _ (underscore)
```
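The naming rules above can be checked mechanically. The shell sketch below is a hypothetical helper (not a Data Forest tool) that derives the DB prefix from an account name and validates candidate names against the `{USER}__db_*` rule:

```shell
# Hypothetical helper: validate a DB name against the {USER}__db_* rule.
account="example-ex"
prefix=$(printf '%s' "$account" | tr -- '-' '_')   # hyphen -> underscore

check_db_name() {
  # case patterns are case-sensitive, so a capitalized DB is rejected
  case "$1" in
    "${prefix}__db_"*) echo "valid" ;;
    *)                 echo "invalid" ;;
  esac
}

check_db_name "example_ex__db_foo"    # valid
check_db_name "example_ex_db_foo"     # invalid: underscores do not match
check_db_name "example_ex__DB_foo"    # invalid: db is capitalized
check_db_name "example-ex__db_foo"    # invalid: hyphen not replaced
```

The check mirrors the rule as illustrated by the examples above: hyphens replaced with underscores, a double underscore before `db`, and lowercase `db`.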
Hive table data is stored in HDFS. When creating a database, the path must be designated with the LOCATION setting, and it is limited to paths under the warehouse directory in the user's home directory.
- First, create the `warehouse` directory under the user's home directory.

```
$ hadoop fs -mkdir warehouse
$ hadoop fs -ls
Found 4 items
...
drwx------   - example hdfs          0 2021-03-29 16:18 warehouse
```
- You can create a database with the Beeline command as follows.

```
# create database
CREATE DATABASE example__db_foo LOCATION '/user/example/warehouse/example__db_foo.db';

# use database
USE example__db_foo;
```
You can't see other users' databases when viewing databases with the Beeline command. However, the entire database list can be retrieved with Spark SQL, so make sure not to include personal information in a database name.
Create table
There are no naming restrictions when creating tables. The LOCATION option is not required, but if you'd like to specify one, set it to a path under /user/example/warehouse.

```
CREATE TABLE `table1` (`id` BIGINT) LOCATION '/user/example/warehouse/table1';
```
Command limitations
The `SET` and `TRANSFORM` clauses cannot be used. An update is planned to adopt a dedicated queue so that the `SET` command can be used.
Set Spark-Hive access
You must use LLAP to read Hive managed table data from Spark. The following explains how to set up access to Hive through Interactive Hive.
Check access information
The following access information can be set:
| Property | Value |
| --- | --- |
| spark.sql.hive.hiveserver2.jdbc.url (Interactive) | jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=df-interactive |
| spark.sql.hive.hiveserver2.jdbc.url (Batch) | jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 |
| spark.datasource.hive.warehouse.metastoreUri | thrift://hms1.kr.df.naverncp.com:9083,thrift://hms2.kr.df.naverncp.com:9083 |
| spark.datasource.hive.warehouse.load.staging.dir | /user/{dataforest account name}/tmp |
| spark.hadoop.hive.zookeeper.quorum | zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181 |
| spark.hadoop.hive.llap.daemon.service.hosts | @df-llap |
| principal | {dataforest account name}@KR.DF.NAVERNCP.COM |
| keytab | {dataforest account keytab file path} |
| jars | {hwc jar file path} |
- The table shows two kinds of connection strings, interactive and batch, for `spark.sql.hive.hiveserver2.jdbc.url`.
- Use the interactive connection string to work with Interactive Hive. The same settings value can be used in Spark shell and PySpark.
- The principal and keytab are the Data Forest Kerberos account credentials; they are required to run Spark executors on YARN.
- The user and password for `spark.sql.hive.hiveserver2.jdbc.url` are the authentication information used to access Interactive Hive within executors. Use the Data Forest account name (same as the Kerberos ID) and the Ranger password.
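As a sanity check on the connection strings, the interactive JDBC URL from the table can be assembled from the ZooKeeper quorum with plain string construction. This sketch only reuses the host names and namespace values listed in the table above; it is not a Data Forest tool:

```shell
# Assemble the interactive HiveServer2 JDBC URL from the ZooKeeper quorum.
# Host names and the df-interactive namespace come from the table above.
ZK_QUORUM="zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181"
JDBC_URL="jdbc:hive2://${ZK_QUORUM}/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=df-interactive"
echo "$JDBC_URL"
```

The batch URL differs only in `serviceDiscoveryMode=zooKeeper` and `zooKeeperNamespace=hiveserver2`.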
Example of working in Interactive Hive
The following is an example of working in Interactive Hive via spark-shell.
```
spark-shell \
--master local[2] \
--conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=df-interactive" \
--conf spark.datasource.hive.warehouse.metastoreUri="thrift://hms1.kr.df.naverncp.com:9083,thrift://hms2.kr.df.naverncp.com:9083" \
--conf spark.datasource.hive.warehouse.load.staging.dir=/user/example/tmp \
--conf spark.hadoop.hive.zookeeper.quorum="zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181" \
--conf spark.hadoop.hive.llap.daemon.service.hosts=@df-llap \
--conf spark.security.credentials.hiveserver2.enabled=false \
--conf spark.sql.hive.hiveserver2.jdbc.url.principal="hive/_HOST@KR.DF.NAVERNCP.COM" \
--principal example@KR.DF.NAVERNCP.COM \
--keytab ./df.example.keytab \
--jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar
```
The screen appears as follows once the login is complete.
```
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://ambari1.kr.df.naverncp.com:4040
Spark context available as 'sc' (master = local[2], app id = local-1624438130183).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2.3.1.0.0-78
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession

scala> import com.hortonworks.hwc.HiveWarehouseSession._
import com.hortonworks.hwc.HiveWarehouseSession._

scala> val hive1 = HiveWarehouseSession.session(spark).build()
hive1: com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl = com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl@3da61af2
```
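Once the session is built, you can query Hive through it. The snippet below is a minimal sketch run inside the spark-shell session above (a cluster connection is required); it assumes the `example__db_foo` database and `table1` from the earlier sections exist.

```scala
// Minimal sketch, continuing the spark-shell session above.
// Assumes example__db_foo and table1 from the earlier sections exist.
hive1.setDatabase("example__db_foo")

// catalog/DDL-style statements go through execute()
hive1.execute("show tables").show()

// SELECT queries on managed tables are read through LLAP via executeQuery()
hive1.executeQuery("SELECT id FROM table1 LIMIT 10").show()
```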