Using Iceberg

    Available in VPC

    Iceberg is an open table format for huge analytic datasets that brings high-performance SQL tables to compute engines such as Presto and Spark.

    Iceberg components

    Iceberg consists of three components that form a hierarchy: the Iceberg catalog, the metadata layer, and the data layer.

    • Iceberg Catalog Layer
      Used to locate and read the data for a given table. The Iceberg catalog points to the current metadata of each table, and is used to find the metadata file needed to execute a query.
    • Metadata Layer
      Composed of metadata files, manifest lists, and manifest files. A metadata file contains the table's snapshot, partition, and schema information, which is needed to quickly locate the data a query requires.
    • Data Layer
      Stores the actual data files; the metadata recorded in the manifest files is used to access the required data files.
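The catalog-to-data-file lookup chain described above can be sketched in Python. The dictionaries below are simplified stand-ins for the real catalog entry, metadata file, manifest list, and manifest files (Iceberg's actual structures are Avro/JSON files with far more detail), and all file names are hypothetical:

```python
# Simplified sketch of Iceberg's three-layer lookup. All names and
# structures here are illustrative, not Iceberg's real on-disk format.

catalog = {  # catalog layer: table name -> current metadata file location
    "test.test_tbl": "metadata/00001-example.metadata.json",
}

metadata_files = {  # metadata layer: snapshot, schema, partition info
    "metadata/00001-example.metadata.json": {
        "current-snapshot-id": 128779159509,
        "snapshots": {128779159509: "snap-example.avro"},  # -> manifest list
    },
}

manifest_lists = {  # each snapshot points to a list of manifest files
    "snap-example.avro": ["manifest-0.avro"],
}

manifests = {  # each manifest file lists the data files it tracks
    "manifest-0.avro": ["data/00000-0-example.parquet"],
}

def data_files_for(table: str) -> list[str]:
    """Follow catalog -> metadata file -> manifest list -> manifests -> data files."""
    meta = metadata_files[catalog[table]]
    manifest_list = meta["snapshots"][meta["current-snapshot-id"]]
    files = []
    for manifest in manifest_lists[manifest_list]:
        files.extend(manifests[manifest])
    return files

print(data_files_for("test.test_tbl"))  # prints: ['data/00000-0-example.parquet']
```

The point of the sketch is that a query never scans the directory tree: each layer records exactly where to look in the layer below.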

    Using Iceberg

    The following describes how to use Iceberg.

    Caution

    The following example is based on Iceberg version 1.2.1.

    Test using the Hive shell

    1. Access Hive and set the required session properties.
    [hive@dev-nch023-ncl ~]$ hive
    Hive Session ID = cca75225-f55c-423b-b6c8-d8fb0
    hive> set hive.vectorized.execution.enabled=false;
    hive> set iceberg.engine.hive.lock-enabled=false;
    hive> set tez.mrreader.config.update.properties=hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids;
    hive> set hive.execution.engine=mr;
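    The rationale for each session property above is not stated in this article; the comments below reflect common guidance for running Iceberg on Hive 3 and should be treated as assumptions:

```sql
-- Annotated copy of the session settings above (rationales are assumptions)
set hive.vectorized.execution.enabled=false;  -- Iceberg's Hive reader may not support vectorized reads in this version
set iceberg.engine.hive.lock-enabled=false;   -- skip Hive Metastore locking for Iceberg commits
set tez.mrreader.config.update.properties=hive.io.file.readcolumn.names,hive.io.file.readcolumn.ids;  -- propagate the projected-column settings to the reader
set hive.execution.engine=mr;                 -- run queries on MapReduce rather than Tez
```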
    
    2. Create a database.
    hive> create database test;
    OK
    Time taken: 2.182 seconds
    
    3. Select the database.
    hive> use test;
    OK
    Time taken: 0.278 seconds
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    
    4. Create a table.
    hive> CREATE EXTERNAL TABLE test_tbl (id int) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
    OK
    Time taken: 2.796 seconds
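    The same storage handler can create tables with additional column types. The table name and columns below are illustrative, not from this article:

```sql
-- Hypothetical example: an Iceberg table with several column types
CREATE EXTERNAL TABLE test_tbl2 (
  id   int,
  name string,
  ts   timestamp
)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
```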
    
    5. Add the Iceberg runtime library using add jar.
    hive> add jar /usr/nch/3.1.0.0-78/hive/lib/iceberg-hive-runtime-1.2.1.jar;
    Added [/usr/nch/3.1.0.0-78/hive/lib/iceberg-hive-runtime-1.2.1.jar] to class path
    Added resources: [/usr/nch/3.1.0.0-78/hive/lib/iceberg-hive-runtime-1.2.1.jar]
    
    6. Add the libfb303 library using add jar.
    hive> add jar /usr/nch/3.1.0.0-78/hive/lib/libfb303-0.9.3.jar;
    Added [/usr/nch/3.1.0.0-78/hive/lib/libfb303-0.9.3.jar] to class path
    Added resources: [/usr/nch/3.1.0.0-78/hive/lib/libfb303-0.9.3.jar]
    
    7. Insert data using INSERT.
    hive> INSERT INTO test_tbl values (1);
    Query ID = hive_20231012143056_a80b-fe72-472a-8773-4e7589
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    23/10/12 14:30:57 INFO client.AHSProxy: Connecting to Application History server at dev-nch023-ncl.nfra.io/10.168.142.23:10200
    23/10/12 14:30:57 INFO client.AHSProxy: Connecting to Application History server at dev-nch023-ncl.nfra.io/10.168.142.23:10200
    Starting Job = job_1696850670798_0017, Tracking URL = http://dev-nch2-ncl.nfra.io:8088/proxy/application_1696850670798_0017/
    Kill Command = /usr/nch/3.1.0.0-78/hadoop/bin/mapred job  -kill job_1696850670798_0017
    Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
    2023-10-12 14:31:07,818 Stage-2 map = 0%,  reduce = 0%
    2023-10-12 14:31:16,035 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 5.33 sec
    MapReduce Total cumulative CPU time: 5 seconds 330 msec
    Ended Job = job_16968506_0017
    MapReduce Jobs Launched:
    Stage-Stage-2: Map: 1   Cumulative CPU: 5.33 sec   HDFS Read: 173742 HDFS Write: 2611 SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 330 msec
    OK
    Time taken: 22.507 seconds
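    Hive also accepts a multi-row VALUES clause, so several rows can be written in a single job (the values below are illustrative):

```sql
-- Insert several rows in one MapReduce job
INSERT INTO test_tbl VALUES (2), (3), (4);
```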
    
    8. Check the data with SELECT.
    hive> select * from test_tbl;
    OK
    1
    Time taken: 0.493 seconds, Fetched: 1 row(s)
    
    9. Check the table schema.
    hive> show create table test_tbl;
    OK
    CREATE EXTERNAL TABLE `test_tbl`(
      `id` int COMMENT 'from deserializer')
    ROW FORMAT SERDE
      'org.apache.iceberg.mr.hive.HiveIcebergSerDe'
    STORED BY
      'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    
    LOCATION
      'hdfs://test-test/warehouse/tablespace/managed/hive/test.db/test_tbl'
    TBLPROPERTIES (
      'bucketing_version'='2',
      'current-schema'='{"type":"struct","schema-id":0,"fields":[{"id":1,"name":"id","required":false,"type":"int"}]}',
      'current-snapshot-id'='128779159509',
      'current-snapshot-summary'='{"added-data-files":"1","added-records":"1","added-files-size":"407","changed-partition-count":"1","total-records":"1","total-files-size":"407","total-data-files":"1","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}',
      'current-snapshot-timestamp-ms'='1697088677165',
      'engine.hive.enabled'='true',
      'external.table.purge'='TRUE',
      'last_modified_by'='hive',
      'last_modified_time'='1697088657',
      'metadata_location'='hdfs://test-test/warehouse/tablespace/managed/hive/test.db/test_tbl/metadata/00001-33b09b82-b9b9-4005-a804-3f7970fc23ec.metadata.json',
      'previous_metadata_location'='hdfs://test-test/warehouse/tablespace/managed/hive/test.db/test_tbl/metadata/00000-5a7c11d1-b12b-45a-a75a8c975f85.metadata.json',
      'snapshot-count'='1',
      'table_type'='ICEBERG',
      'transient_lastDdlTime'='1697088657',
      'uuid'='95dffef0-97e6-4ca2-ae01-b5bfde8')
    Time taken: 0.315 seconds, Fetched: 25 row(s)
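    The 'current-schema' table property shown above is plain JSON, so it can be inspected outside Hive. A minimal sketch using only Python's standard library, with the JSON string copied from the output above:

```python
import json

# 'current-schema' value copied from the SHOW CREATE TABLE output above
current_schema = (
    '{"type":"struct","schema-id":0,"fields":'
    '[{"id":1,"name":"id","required":false,"type":"int"}]}'
)

schema = json.loads(current_schema)
for field in schema["fields"]:
    required = "required" if field["required"] else "optional"
    print(f'{field["id"]}: {field["name"]} ({field["type"]}, {required})')
# prints: 1: id (int, optional)
```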
    
