Integrate Apache Hive with Data Catalog

Available in VPC

This guide describes how to configure Data Catalog as the metastore for Apache Hive.

Caution

These settings can be configured only with a NAVER Cloud Platform main account.

Preparations

You need a self-managed environment in which Hive is already set up and running.

Note

With NAVER Cloud Platform's Cloud Hadoop, you can integrate Data Catalog as the Hive Metastore storage through the settings in the cluster creation process.

1. Patch, build, and install Apache Hive

  1. Clone Apache Hive.

    git clone https://github.com/apache/hive.git
    
  2. Download the branch_3.1.patch file, apply it to the Apache Hive 3.1 branch, and then rebuild Hive. The commands below assume that the patch file is saved in the Hive source directory.

    • Download link: https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore/blob/branch-3.4.0/branch_3.1.patch

    cd <your local hive source path>
    git checkout branch-3.1
    git apply -3 branch_3.1.patch
    mvn clean install -DskipTests
    
  3. After the build completes, place the resulting hive-exec-3.1.3.jar and hive-common-3.1.3.jar files in Hive's CLASSPATH.

Note
  • Hive's CLASSPATH is typically specified as {HIVE_HOME}/lib/.
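As a rough sketch of step 3, the helper below copies the two rebuilt jars into Hive's library directory. The function name install_patched_jars is hypothetical, and the ql/target and common/target locations are assumptions about where a standard Hive 3.1 Maven build writes these artifacts; adjust them if your build output differs.

```shell
# Copy the patched jars from a built Hive source tree into Hive's lib directory.
# Assumption: hive-exec is produced under ql/target and hive-common under common/target.
install_patched_jars() {
    src="$1"    # path to the Hive source tree that was built
    dest="$2"   # typically "$HIVE_HOME/lib"
    for jar in "$src/ql/target/hive-exec-3.1.3.jar" \
               "$src/common/target/hive-common-3.1.3.jar"; do
        # Stop early if the build did not produce the expected jar.
        if [ ! -f "$jar" ]; then
            echo "missing: $jar" >&2
            return 1
        fi
        cp "$jar" "$dest/" || return 1
    done
}
```

Usage: install_patched_jars <your local hive source path> "$HIVE_HOME/lib"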

2. Download Hive Client for Data Catalog

  • Download Hive Client for Data Catalog.
  • Place the jar files in Hive's CLASSPATH.

Note

Hive's CLASSPATH is typically specified as {HIVE_HOME}/lib/.

3. Download the Object Storage-related libraries

  • Download the following jar files.

    wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.4/hadoop-aws-3.2.4.jar
    wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar

  • Place the jar files in Hive's CLASSPATH.

Note

Hive's CLASSPATH is typically specified as {HIVE_HOME}/lib/.

4. Edit hive-site.xml

To use NAVER Cloud Platform's Data Catalog and Object Storage buckets with Hive, add the following to hive-site.xml:

<configuration>

    <!-- Data Catalog settings-->
    <property>
        <name>hive.metastore.client.factory.class</name>
        <value>com.navercorp.ncp.catalog.metastore.NCPCatalogMetastoreClientFactory</value>
    </property>
    <property>
        <name>hive.metastore.api.endpoint</name>
        <value>https://datacatalog.apigw.ntruss.com</value>
    </property>
    
    <!-- Object Storage settings-->
    <property>
        <name>fs.s3a.endpoint</name>
        <value>http://kr.objectstorage.ncloud.com</value>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>{your-access-key}</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>{your-secret-key}</value>
    </property>
    <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>fs.s3a.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>
</configuration>

Note

The hive-site.xml is typically located under {HIVE_HOME}/conf/.
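Before restarting Hive, it can help to confirm that every property from the listing above is present in the file. The following is a minimal sketch that assumes a plain-text search is sufficient: it does not parse the XML or validate the values. The function name check_hive_site is hypothetical.

```shell
# Report any property from this guide that is missing from hive-site.xml.
# This is a plain-text check: it does not parse XML or validate the values.
check_hive_site() {
    conf="$1"   # typically "$HIVE_HOME/conf/hive-site.xml"
    if [ ! -f "$conf" ]; then
        echo "file not found: $conf" >&2
        return 1
    fi
    missing=0
    for name in \
        hive.metastore.client.factory.class \
        hive.metastore.api.endpoint \
        fs.s3a.endpoint \
        fs.s3a.access.key \
        fs.s3a.secret.key \
        fs.s3a.connection.ssl.enabled \
        fs.s3a.impl
    do
        # -F treats the dots in the property name literally.
        if ! grep -qF "<name>$name</name>" "$conf"; then
            echo "missing property: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_hive_site "$HIVE_HOME/conf/hive-site.xml"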

Verify the integration

Run the Hive CLI and confirm that your commands run properly.
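One way to run this check non-interactively is sketched below. It assumes that the hive command is on your PATH and uses its -e option to execute a single statement. The function name verify_hive_catalog is hypothetical, and SHOW DATABASES is only an example of a read-only statement that has to go through the metastore.

```shell
# Run a read-only statement through the Hive CLI to exercise the metastore client.
verify_hive_catalog() {
    # Fail early with a clear message if the CLI is not installed or not on PATH.
    if ! command -v hive >/dev/null 2>&1; then
        echo "hive CLI not found on PATH" >&2
        return 1
    fi
    # -e executes the quoted statement and exits.
    hive -e "SHOW DATABASES;" || return 1
}
```

Usage: verify_hive_catalog. If the integration works, the output should list the databases registered in Data Catalog.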

Note
  • Integration with DBMSs such as MySQL, MSSQL, and PostgreSQL is not supported.
  • Integration with Iceberg tables is not supported either.