Using Hive
    Available in Classic

    This guide describes a method to save data in an Object Storage bucket and run a simple Hive query through Hue and Beeline.

    Architecture example

    Save data that needs to be kept in an Object Storage bucket, and use the Cloud Hadoop cluster only when necessary.

    chadoop-4-5-007.png

    When a Hive query is run, it proceeds through the following steps:

    1. The Hive client submits the query to the Hive server in the Cloud Hadoop cluster.
    2. The Hive server processes the query and requests metadata from the Hive metastore DB (MySQL) installed on the master node.
    3. The Hive server loads the data stored in the Object Storage bucket.
    4. The Hive server returns the result to the client.

    Create Hive table

    Example) Upload a sample data file to a NAVER Cloud Platform Object Storage bucket, and create a Hive external table so that the data can be used from Hive.

    Note

    To use data in an Object Storage bucket, the following properties must be configured in hive-site.xml.

    fs.s3a.access.key=<API-ACCESS-KEY>
    fs.s3a.connection.ssl.enabled=false
    fs.s3a.endpoint=http://kr.objectstorage.ncloud.com
    fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
    fs.s3a.secret.key=<API-SECRET-KEY>
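      In hive-site.xml itself, these settings take the standard Hadoop property form. A sketch of the equivalent XML fragment (same values as the listing above; the key placeholders must be replaced with your own API keys):

      ```xml
      <!-- Object Storage (S3A) access settings for hive-site.xml -->
      <property>
        <name>fs.s3a.access.key</name>
        <value>API-ACCESS-KEY</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>API-SECRET-KEY</value>
      </property>
      <property>
        <name>fs.s3a.endpoint</name>
        <value>http://kr.objectstorage.ncloud.com</value>
      </property>
      <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
      </property>
      <property>
        <name>fs.s3a.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
      </property>
      ```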
    

    The method to create a Hive table is as follows:

    1. Download sample data, unzip it, and upload the AllstarFull.csv file to the Object Storage bucket.

      • Since Hive reads data files by folder (the table location), saving one data set per folder is recommended.
      Note

      The provided sample data is a portion of Lahman's Baseball Database Version 2012, and all copyrights of the data belong to Sean Lahman.

      chadoop-4-5-001_en.png
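      Because the bucket is S3-compatible, the upload in this step can also be done from the command line. A sketch assuming the AWS CLI is installed and configured with your NAVER Cloud Platform API keys (bucket path from the example below):

      ```
      # Upload the sample file into its own folder, one data set per folder
      aws s3 cp AllstarFull.csv \
          s3://deepdrive-hue/input/lahman2012/allstarfull/AllstarFull.csv \
          --endpoint-url=http://kr.objectstorage.ncloud.com
      ```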

    2. In Ambari Hive View 2.0 or the Hue Hive Editor, create a Hive external table with the following statement:

      • location: specify the bucket path where the data set file is saved
      DROP TABLE allstarfull;
      
      CREATE EXTERNAL TABLE IF NOT EXISTS `allstarfull` (
              `playerID` VARCHAR(20),
              `yearID` INT,
              `gameNum` INT,
              `gameID` VARCHAR(30),
              `teamID` VARCHAR(4),
              `lgID` VARCHAR(4),
              `GP` INT,
              `startingPos` INT
      )
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LOCATION 's3a://deepdrive-hue/input/lahman2012/allstarfull';
      

    Run Hive query

    You can run a Hive query using various tools provided by Cloud Hadoop. This guide describes how to run a Hive query with the following tools:

    • The Hive interpreter in Hue
    • Beeline, a Hive client based on the SQLLine CLI

    Hive interpreter in Hue

    The method to run a Hive query using the Hive interpreter in Hue is as follows:

    1. Log in to Hue and run a HiveQL query in the Hive interpreter.

      SELECT * FROM allstarfull LIMIT 10;
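      To confirm that the table actually points at the bucket path, you can also run standard HiveQL in the same interpreter:

      ```sql
      DESCRIBE FORMATTED allstarfull;
      ```

      The output includes a Location row showing the s3a:// path the table reads from.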
      
    2. Check on the result page that the file uploaded to Object Storage is connected to the Hive table.
      chadoop-4-5-002_C_en.png

    Beeline

    The method to run a Hive query using Beeline is as follows:

    1. Access the host where the Hive client is installed, and start a Beeline session with the following command:

      • Enter the master node address (m-00x) into [HIVE-SERVER2-SERVER].
      beeline -u "jdbc:hive2://[HIVE-SERVER2-SERVER]:10000" -n hive
      
      Note

      For how to access the host, see the Connecting to cluster nodes through SSH guide.
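      Beeline can also run a query non-interactively with its -e option, which is convenient for scripting (same placeholder address as above):

      ```
      beeline -u "jdbc:hive2://[HIVE-SERVER2-SERVER]:10000" -n hive \
          -e "SELECT * FROM allstarfull LIMIT 10;"
      ```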

    2. When the Beeline prompt is displayed, run a HiveQL query.

      • The following query does not print the result but saves it in Object Storage.
      beeline> INSERT OVERWRITE DIRECTORY 's3a://deepdrive-hue/output/' ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      SELECT playerid, sum(gp) from allstarfull group by playerid;
      
      Note

      Before running the query, make sure the specified location (directory) exists in the Object Storage bucket.
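      The aggregation itself is easy to sanity-check offline. A minimal Python sketch of what the HiveQL above computes (SUM(GP) grouped by playerID); the rows below are made-up illustrations, not the actual Lahman data:

      ```python
      import csv
      import io
      from collections import defaultdict

      # Illustrative rows in the AllstarFull.csv column layout:
      # playerID, yearID, gameNum, gameID, teamID, lgID, GP, startingPos
      sample_csv = """aaronha01,1955,0,NLS195507120,ML1,NL,1,
      aaronha01,1956,0,ALS195607100,ML1,NL,1,
      aasedo01,1986,0,ALS198607150,BAL,AL,1,
      """

      # SELECT playerid, sum(gp) FROM allstarfull GROUP BY playerid
      totals = defaultdict(int)
      for row in csv.reader(io.StringIO(sample_csv)):
          player_id, gp = row[0].strip(), row[6]
          totals[player_id] += int(gp or 0)

      # Emit comma-delimited rows, like the files written to the output directory
      for player_id in sorted(totals):
          print(f"{player_id},{totals[player_id]}")
      # aaronha01,2
      # aasedo01,1
      ```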

    3. Run the following command to check whether the result was saved successfully:

      hadoop fs -cat s3a://deepdrive-hue/output/000000_0
      

      chadoop-4-5-003_en.png

      The result is as follows:

      aaronha01,24
      aasedo01,1
      abreubo01,2
      adamsac01,0
      adcocjo01,2
      ageeto01,2
      aguilri01,3
      aguirha01,1
      alexado01,0
      alfoned01,1
      allendi01,6
      allenjo02,1
      alleyge01,1
      allisbo01,2
      alomaro01,12
      ...
      

    Activating LLAP in Hue

    The method to activate LLAP in the Hue editor list is as follows:

    Note

    You can select the cluster created in the Cloud Hadoop console and access the Ambari Web UI through [View by Application]. For more information, see the Ambari UI guide.

    1. Access Ambari and activate Hive LLAP in [Hive] > [CONFIGS] > [SETTINGS]. Add settings, press [SAVE] to save, and press [RESTART] to restart.

    chadoop-4-5-008_ko.png

    2. Activate the Hue Hive LLAP Module in [Hue] > [CONFIGS] > [Hue Service Module]. Add the settings, press [SAVE] to save, and press [RESTART] to restart.

    chadoop-4-5-009_ko.png

    3. You can see the activated LLAP editor in the editor list of Hue.

    chadoop-4-5-010_ko.png

