Integrate CSV files with Hive External Table


Available in VPC

A Hive External Table is a table created with the EXTERNAL keyword in Hive. Unlike a managed table, an external table does not store its data under the directory specified by the hive.metastore.warehouse.dir property; instead, it reads data from the path specified when the table is created.

Because the table is defined over data that already exists in Hadoop-compatible storage, you only need to declare the schema; no data is loaded or copied.

Note

You can also see examples related to Hive External Tables in the Using Hive guide.

This guide explains how to use CSV data stored in Object Storage by integrating it with Hive External Table provided by Cloud Hadoop.

Prepare sample data

Use the temperature data provided by the Korea Meteorological Administration Meteorological Data Open Portal as sample data to test Hive External Table.

To prepare sample data:

  1. Download Seoul data and Busan data.
  • The search condition information of the sample data is as follows:
    • Category: Ground.
    • Region/Branch: Seoul or Busan.
    • Component: Temperature.
    • Period: Select Day and set the range to "2011–2021."
    • Condition: Mark the checkboxes for Month and Day.
  2. Check the contents of the downloaded CSV file. Counting from the top, the header row is on line 12 and the temperature data starts on line 13.
Note

Problems with viewing Korean characters can be solved by changing the encoding. For details, check the notes about problems with viewing Korean characters in Create Hive External Table.

  3. Remove the unnecessary header lines from each downloaded CSV file and rename it using the following commands:
    $ sed '1,12d' extremum_20230620155750.csv > data1.csv
    $ sed '1,12d' extremum_20230620155821.csv > data2.csv
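The header-stripping step can be sanity-checked with standard tools. Below is a minimal sketch using a synthetic stand-in file (the real downloads are assumed to have 12 header lines, as noted above):

```shell
# Build a stand-in CSV with 12 header lines followed by 2 data rows,
# then strip the header the same way as above and check what remains.
for i in $(seq 1 12); do echo "header line $i"; done > sample.csv
echo "1,Seoul,2011-01-01" >> sample.csv
echo "2,Busan,2011-01-01" >> sample.csv

sed '1,12d' sample.csv > data.csv

head -n 1 data.csv   # first remaining line should be the first data row
wc -l < data.csv     # 2 lines remain
```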
    

Upload data to Object Storage

To upload data to Object Storage:

  1. Create a live-test-bucket bucket in Object Storage.
  2. Create the hivedata directory in the created bucket, and upload the sample data (CSV) file inside.
Note

For more information about creating buckets, see the Object Storage user guide.

Create Hive External Table

To create a Hive External Table:

  1. Connect to the edge node of the Cloud Hadoop cluster through SSH.
  2. Run the Hive CLI using the following command:
    $ hive
    
  3. Create an External Table using the Hive client command.
    • Once the weather table is created, you can verify the data by running a SELECT query.

      CREATE EXTERNAL TABLE weather
      (
      no   STRING,
      area STRING,
      day  STRING,
      avg  FLOAT,
      max  FLOAT,
      maxTime STRING,
      min  FLOAT,
      minTime STRING,
      diff FLOAT
      ) 
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION 's3a://live-test-bucket/hivedata';
      
      SELECT count(*) FROM weather;
      +-------+
      |  _c0  |
      +-------+
      | 7913  |
      +-------+
      
      SELECT * FROM weather WHERE day = '2011-01-01';
      +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
      | weather.no  | weather.area  | weather.day  | weather.avg  | weather.max  | weather.maxtime  | weather.min  | weather.mintime  | weather.diff  |
      +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
      | 159         | Busan         | 2011-01-01   | -1.1         | 4.1          | 14:55            | -5.8         | 06:40            | 9.9           |
      | 108         | Seoul         | 2011-01-01   | -6.8         | -2.9         | 14:57            | -10.4        | 01:54            | 7.5           |
      +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
      
      Note
      • Problems with viewing Korean characters
        Korean characters are not displayed correctly because the data provided by the Korea Meteorological Administration is encoded in euc-kr rather than utf-8. If you change the encoding of the Hive External Table to euc-kr as shown below, the characters are displayed correctly.
      ALTER TABLE weather SET TBLPROPERTIES('serialization.encoding'='euc-kr');
      
      SELECT * FROM weather WHERE day = '2011-01-01';
      +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
      | weather.no  | weather.area  | weather.day  | weather.avg  | weather.max  | weather.maxtime  | weather.min  | weather.mintime  | weather.diff  |
      +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
      | 159         | Busan         | 2011-01-01   | -1.1         | 4.1          | 14:55            | -5.8         | 06:40            | 9.9           |
      | 108         | Seoul         | 2011-01-01   | -6.8         | -2.9         | 14:57            | -10.4        | 01:54            | 7.5           |
      +-------------+---------------+--------------+--------------+--------------+------------------+--------------+------------------+---------------+
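Once the table reads correctly, you can run ordinary HiveQL directly over the CSV files in place. Below is a sketch of an aggregate query using the schema defined above; the backticks are needed because avg, max, and min are also built-in function names:

```sql
-- Yearly average temperature per region, computed directly over the
-- CSV files in Object Storage (schema as defined above).
SELECT area,
       substr(day, 1, 4)     AS year,
       round(avg(`avg`), 2)  AS avg_temp,
       max(`max`)            AS highest,
       min(`min`)            AS lowest
FROM weather
GROUP BY area, substr(day, 1, 4)
ORDER BY area, year;
```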
      

Delete Hive External Table

To delete a Hive External Table:

  1. Delete the Hive External Table using the following command:
    • Below is an example of deleting the Hive External Table (weather) created earlier.
DROP TABLE weather;
  2. Run a SELECT query on the deleted table. The "Table not found" error confirms that the table has been deleted.
SELECT * FROM weather;

Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'weather' (state=42S02,code=10001)


Note

Even if the Hive External Table is deleted, the CSV files in Object Storage are not deleted; they remain intact.
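Because DROP TABLE on an external table removes only the table metadata, recreating the table over the same location restores access immediately; a sketch in the Hive CLI, reusing the DDL from above:

```sql
-- The CSV files under s3a://live-test-bucket/hivedata were not touched
-- by DROP TABLE, so the same DDL brings the data straight back.
CREATE EXTERNAL TABLE weather
(
no   STRING,
area STRING,
day  STRING,
avg  FLOAT,
max  FLOAT,
maxTime STRING,
min  FLOAT,
minTime STRING,
diff FLOAT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3a://live-test-bucket/hivedata';

SELECT count(*) FROM weather;   -- same row count as before the DROP
```

Set the serialization.encoding table property again, as in the note above, if you need the Korean characters displayed correctly.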