Public Hive

    Available in VPC

    This article describes how to create and use Hive databases and tables in Data Forest.

    Note

    UDF and Storage Handler features are under preparation.

    Run Hive commands

    If you are connecting to HiveServer for the first time, see Use Hive.
    For Kerberos authentication, see Create and Manage Account.

    Hive commands can be executed in a Dev app as follows.

    $ kinit example -kt df.example.keytab # authenticate with Kerberos
    $ beeline -u "jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n example -p 'PASSWORD'
    

    Create database

    Note

    Data Forest users cannot use Hive's default database. A new user database must be created according to the rules below.

    To create a Hive database in Data Forest, follow these rules.

    • A database name should follow the {USER}__db_* format.
      If the user's account name is example, then the DB names can be designated as below.

      example__db_foo
      example__db_bar
      example__db_hello_foo
      example__db_hello_foo_bar
      
    • A database name can't contain - (hyphen). If the user name contains - (hyphen), then replace it with _ (underscore). If the user's account name is example-ex, then the DB names can be designated as below.

      example_ex__db_foo
      

      The following are examples of DB names that violate the rule.

      example_db_foo     # the number of _ (underscore) characters does not match
      example__DB_foo    # "db" is capitalized
      example-ex__db_foo # the - (hyphen) is not replaced with _ (underscore)
      
    • Hive table data is stored in HDFS. When creating a database, the path must be designated with the LOCATION setting. This path must be under the warehouse directory in the user's home directory.

      • First, create the warehouse directory under the user's home directory.
        $ hadoop fs -mkdir warehouse
        $ hadoop fs -ls
        Found 4 items
        ...
        drwx------   - example hdfs          0 2021-03-29 16:18 warehouse
        
      • You can create a database with the Beeline command as follows.
        # create database
        CREATE DATABASE example__db_foo LOCATION '/user/example/warehouse/example__db_foo.db';
        # use database
        USE example__db_foo;
        
    Caution

    You can't see other users' databases when viewing databases with the Beeline command. However, Spark SQL can return the entire database list. Make sure not to include personal information in a database name.
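
    The database naming rules above can be sketched as a quick validation check. The following is a hypothetical helper, not part of Data Forest; it assumes the suffix after __db_ is limited to lowercase letters, digits, and underscores, as in the examples.

```python
import re

def valid_df_db_name(account: str, db_name: str) -> bool:
    """Check a Hive database name against the Data Forest rules:
    {USER}__db_* format, hyphens in the account replaced by underscores,
    lowercase 'db', and no hyphens anywhere in the database name."""
    # replace - (hyphen) in the account name with _ (underscore)
    user = account.replace("-", "_")
    # exactly two underscores, then lowercase 'db', then _<suffix>
    pattern = rf"^{re.escape(user)}__db_[a-z0-9_]+$"
    return re.fullmatch(pattern, db_name) is not None

print(valid_df_db_name("example", "example__db_foo"))        # True
print(valid_df_db_name("example", "example_db_foo"))         # False: underscore count
print(valid_df_db_name("example", "example__DB_foo"))        # False: capitalized DB
print(valid_df_db_name("example-ex", "example-ex__db_foo"))  # False: hyphen kept
print(valid_df_db_name("example-ex", "example_ex__db_foo"))  # True
```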

    Create table

    There are no naming restrictions when creating tables. The LOCATION option is not required, but if you'd like to specify one, set it to a path under /user/example/warehouse.

    CREATE TABLE `table1` (`id` BIGINT) LOCATION '/user/example/warehouse/table1';
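
    Once created, the table can be used like any other Hive table from the same Beeline session. A minimal sketch (assuming the example__db_foo database from the previous section):

```sql
-- switch to the user database, insert a row, and read it back
USE example__db_foo;
INSERT INTO table1 VALUES (1);
SELECT id FROM table1;
```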
    

    Command limitations

    The SET and TRANSFORM clauses cannot be used. An update is planned to adopt a dedicated queue so that the SET command can be used.

    Set Spark-Hive access

    You must use LLAP to read Hive managed table data from Spark. The following explains how to set up access to Hive through Interactive Hive.

    Check access information

    The following access information can be set:

    • spark.sql.hive.hiveserver2.jdbc.url (Interactive): jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=df-interactive
    • spark.sql.hive.hiveserver2.jdbc.url (Batch): jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
    • spark.datasource.hive.warehouse.metastoreUri: thrift://hms1.kr.df.naverncp.com:9083,thrift://hms2.kr.df.naverncp.com:9083
    • spark.datasource.hive.warehouse.load.staging.dir: /user/{dataforest account name}/tmp
    • spark.hadoop.hive.zookeeper.quorum: zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181
    • spark.hadoop.hive.llap.daemon.service.hosts: @df-llap
    • principal: {dataforest account name}@KR.DF.NAVERNCP.COM
    • keytab: {dataforest account keytab file path}
    • jars: {hwc jar file path}
    • The list above shows two kinds of connection strings, interactive and batch, for spark.sql.hive.hiveserver2.jdbc.url.
    • Use the interactive connection string to conduct tasks in Interactive Hive. The same settings values can be used in the Spark shell and PySpark.
    • The principal and keytab use the Data Forest Kerberos account; they are required to run Spark executors on YARN.
    • The user and password for spark.sql.hive.hiveserver2.jdbc.url are the authentication information used to access Interactive Hive from within executors. Use the Data Forest account name (same as the Kerberos ID) and the Ranger password.

    Example of working in Interactive Hive

    The following is an example of working with Interactive Hive via spark-shell.

    spark-shell \
      --master local[2] \
      --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181/;serviceDiscoveryMode=zooKeeperHA;zooKeeperNamespace=df-interactive" \
      --conf spark.datasource.hive.warehouse.metastoreUri="thrift://hms1.kr.df.naverncp.com:9083,thrift://hms2.kr.df.naverncp.com:9083" \
      --conf spark.datasource.hive.warehouse.load.staging.dir=/user/example/tmp \
      --conf spark.hadoop.hive.zookeeper.quorum="zk1.kr.df.naverncp.com:2181,zk2.kr.df.naverncp.com:2181,zk3.kr.df.naverncp.com:2181" \
      --conf spark.hadoop.hive.llap.daemon.service.hosts=@df-llap \
      --conf spark.security.credentials.hiveserver2.enabled=false \
      --conf spark.sql.hive.hiveserver2.jdbc.url.principal="hive/_HOST@KR.DF.NAVERNCP.COM" \
      --principal example@KR.DF.NAVERNCP.COM \
      --keytab ./df.example.keytab \
      --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar
    

    The screen appears as follows once the login is complete.

    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://ambari1.kr.df.naverncp.com:4040
    Spark context available as 'sc' (master = local[2], app id = local-1624438130183).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.2.3.1.0.0-78
          /_/
    
    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession
    
    scala> import com.hortonworks.hwc.HiveWarehouseSession._
    import com.hortonworks.hwc.HiveWarehouseSession._
    
    scala>  val hive1 = HiveWarehouseSession.session(spark).build()
    hive1: com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl = com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl@3da61af2
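    
    scala> // From here, the HWC session can run queries against Interactive Hive.
    scala> // A minimal sketch using Hive Warehouse Connector session methods;
    scala> // the example__db_foo database and table1 are assumed to exist.
    scala> hive1.showDatabases().show()
    
    scala> hive1.setDatabase("example__db_foo")
    
    scala> val df = hive1.executeQuery("SELECT id FROM table1")
    
    scala> df.show()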
    
