Linking CSV file with Hive external table
    • PDF

    Linking CSV file with Hive external table

    • PDF

    Article Summary

    Available in Classic

    Hive external table refers to the tables created with the external keyword when creating tables in Hive. Hive external table does not store data in the directory specified by the hive.metastore.warehouse.dir property, but it stores data in the path set when the table is created.

    Since tables are created based on data that already exists in Hadoop, you can create them by simply setting the schema.

    Note

    You can also see examples related to Hive external tables in Using Hive.

    This guide describes how to use CSV data stored in Object Storage by linking it to an external table in Hive provided by Cloud Hadoop.

    Prepare sample data

    Download the temperature data provided by the Korea Meteorological Administration Meteorological Data Open Portal as sample data to test Hive external table.

    The following describes how to prepare sample data.

    1. Access Korea Meteorological Administration Meteorological Data Open Portal, and then click Climate Statistical Analysis > Statistical Analysis > Statistics by Condition from the main page,, in that order.

    2. Click the [Search] button after setting the search conditions as follows.

      • Category: ground
      • Region/Branch: Seoul or Busan
      • Component: temperature
      • Period: After selecting days, set as 2011-2021
      • Condition: Mark the checkboxes for Month and Day
        hadoop-chadoop-use-ex10_1-1_en
    3. Click the [CSV] button at the bottom right of the result graph to download the CSV file.
      hadoop-chadoop-use-ex10_1-2_en

    4. Repeat Steps 2 and 3 above to download the temperature data in Seoul and Busan.

    5. Check the content of the downloaded CSV file. You can see that from the top, the header content starts at Line 12, and the temperature data starts at Line 13.
      hadoop-chadoop-use-ex10_2-3_en

    Note

    Problems with viewing Korean characters can be solved by changing the encoding. For details, check the notes about problems with viewing Korean characters in Create Hive external table.

    1. Remove the unnecessary header part from the downloaded CSV file and rename the file using the following command.
      $ sed 1,12d extremum_20211021101126.csv > data1.csv
      $ sed 1,12d extremum_20211020212220.csv > data2.csv
      

    Upload data to Object Storage

    The following describes how to upload data to Object Storage.

    1. Create the live-test-bucket bucket in Object Storage.
    2. Create the hivedata directory in the created bucket, and upload the sample data (CSV) file inside.
      hadoop-chadoop-use-ex10_2-4_en
    Note

    Please refer to Object Storage Guide for more information about creating buckets.

    Create Hive external table

    The following describes how to create a Hive external table.

    1. SSH login to the edge server of the Cloud Hadoop cluster.

    2. Create an external table using the Hive client command.

      • The weather table has been created, and you can check the data by executing the SELECT query.
      CREATE EXTERNAL TABLE weather
      (
      no   STRING,
      area STRING,
      day  STRING,
      avg  FLOAT,
      max  FLOAT,
      maxTime STRING,
      min  FLOAT,
      minTime STRING,
      diff FLOAT
      ) 
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION 's3a://live-test-bucket/hivedata';
      
      SELECT count(*) FROM weather;
      +-------+
      |  _c0  |
      +-------+
      | 7913  |
      +-------+
      
      SELECT * FROM weather WHERE day = '2011-01-01';
      +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
      | weather.no  | weather.area  | weather.day  | weather.avg  | weather.max  | weather.maxtime  | weather.min  | weather.mintime  | weather.diff  |
      +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
      | 		159       | Busan           | 2011-01-01   | -1.1         | 4.1          | 14:55            | -5.8         | 06:40            | 9.9           |
      | 		108       | Seoul          | 2011-01-01   | -6.8         | -2.9         | 14:57            | -10.4        | 01:54            | 7.5           |
      +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
      
    Note

    Problems with viewing Korean characters
    The Korean characters can't be viewed correctly because the data provided by the Korea Meteorological Administration is encoded with euc-kr rather than utf-8. If you change the encoding of the Hive external table to euc-kr as shown below, then the characters will be shown correctly.

    ALTER TABLE weather SET TBLPROPERTIES('serialization.encoding'='euc-kr');
    
    SELECT * FROM weather WHERE day = '2011-01-01';
    +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
    | weather.no  | weather.area  | weather.day  | weather.avg  | weather.max  | weather.maxtime  | weather.min  | weather.mintime  | weather.diff  |
    +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
    | 		159       | Busan            | 2011-01-01   | -1.1         | 4.1          | 14:55            | -5.8         | 06:40            | 9.9           |
    | 		108       | Seoul            | 2011-01-01   | -6.8         | -2.9         | 14:57            | -10.4        | 01:54            | 7.5           |
    +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
    

    Delete Hive external table

    The following describes how to delete a Hive external table.

    1. Delete the Hive external table created using the following command.

      • Below is an example of deleting the Hive external table (weather) created earlier.
      DROP TABLE weather;
      
    2. Run the SELECT query on the deleted table. You can see that the table has been successfully deleted, even though an error has occurred.

    SELECT * FROM weahter;
      
    Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'weahter' (state=42S02,code=10001)
    

    cloudhadoop-objectstorage-filefolder_en

    Note

    Even if you delete the Hive external table, the CSV file in Object Storage is not deleted, but stored safely.


    Was this article helpful?

    Changing your password will log you out immediately. Use the new password to log back in.
    First name must have atleast 2 characters. Numbers and special characters are not allowed.
    Last name must have atleast 1 characters. Numbers and special characters are not allowed.
    Enter a valid email
    Enter a valid password
    Your profile has been successfully updated.