Analyze Hive warehouse data with Presto (Trino)


Available in VPC

This guide explains how to use the Presto (Trino) Hive connector to analyze data stored in a Hive data warehouse.

Note
  • Up to Cloud Hadoop 1.9, this component was provided under the name Presto; from Cloud Hadoop 2.0, it is provided under the name Trino.

Architecture

Presto (Trino) is mainly for interactive queries, while Hive is used for batch jobs. You can schedule batch jobs using applications such as Oozie and Airflow.

Presto (Trino) not only enables access to a variety of data sources through connectors, but also allows you to query multiple data sources in a single query. When you use the Hive connector, Presto (Trino) uses only the Hive metastore and the data files in the Hive warehouse; it does not use HiveQL or Hive's query execution engine (MapReduce).


Configure Presto (Trino) connector

You can change configuration settings related to the Presto (Trino) connector on the Ambari UI page.
For more information about accessing and using the Ambari UI, see the Ambari UI guide.

To change the Presto (Trino) connector configuration:

  1. After you access the Ambari UI, click Trino > [CONFIGS] > Advanced trino.connectors.properties.
  2. On the configuration settings page, enter the connector in connectors.to.add to add it, or in connectors.to.delete to delete it, and then click [Save].
  • Presto (Trino) requires a configuration file named {connector-name}.properties under /etc/trino/conf/catalog for each connector object. Therefore, to integrate multiple Hive clusters with Presto (Trino), you need to create a configuration file for each cluster.

    To create connector-name-1.properties and connector-name-2.properties files, configure connectors.to.add as follows:

    {
      "connector-name-1": [
        "key1=value1",
        "key2=value2",
        "key3=value3"
      ],
      "connector-name-2": ["key1=value1"]
    }
    
  • The defined configuration is created as a {connector-name}.properties file on the server. To verify the file contents, run the following commands:

    cd /etc/trino/conf/catalog
    vi {connector-name}.properties
    
  • The example in this guide requires the Hive connector, so enter the following in connectors.to.add:

    {
      "hive": [
        "connector.name=hive",
        "hive.security=allow-all",
        "hive.recursive-directories=true",
        "hive.non-managed-table-writes-enabled=true",
        "hive.non-managed-table-creates-enabled=true",
        "hive.metastore.uri=thrift://metastore.datacatalog-svc.naverncp.com:30773",
        "hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml",
        "hive.translate-hive-views=true",
        "hive.legacy-hive-view-translation=true",
        "hive.s3.aws-access-key=<API-ACCESS-KEY>",
        "hive.s3.aws-secret-key=<API-SECRET-KEY>",
        "hive.s3.endpoint=kr.object.private.ncloudstorage.com"
      ]
    }
    
  3. Click [ACTIONS] > Restart All, then click [CONFIRM RESTART ALL] in the popup window to restart the service and apply the new configuration.
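The mapping from the connectors.to.add JSON to the generated .properties files can be sketched in Python. This is an illustrative stand-in for what the Ambari-managed configuration does on the server; the local `catalog` directory here substitutes for /etc/trino/conf/catalog:

```python
import json
from pathlib import Path

# The connectors.to.add value: a JSON object mapping each connector name
# to a list of key=value property lines (example values from this guide).
connectors_to_add = json.loads("""
{"connector-name-1": ["key1=value1",
                      "key2=value2",
                      "key3=value3"],
 "connector-name-2": ["key1=value1"]}
""")

# Local stand-in for /etc/trino/conf/catalog on the Trino node.
catalog_dir = Path("catalog")
catalog_dir.mkdir(exist_ok=True)

# Each connector entry becomes a {connector-name}.properties file,
# with one key=value pair per line.
for name, props in connectors_to_add.items():
    (catalog_dir / f"{name}.properties").write_text("\n".join(props) + "\n")

print((catalog_dir / "connector-name-1.properties").read_text())
```

Running the sketch produces connector-name-1.properties and connector-name-2.properties, matching the file names described above.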
Note

Hadoop configuration files (/etc/hadoop/conf/core-site.xml and /etc/hadoop/conf/hdfs-site.xml) must exist on the node running Presto (Trino).

Note

For more information about Presto (Trino) configuration, see the Presto (Trino) documentation.

Run Hive table queries

This guide runs queries on the allstarfull table created in the Using Hive guide.

To run the Hive table queries:

  1. After you access the node with the Trino CLI components installed, run the Trino CLI using the following commands:
    • For <COORDINATOR-HOST-IP>, enter the private IP of the edge node.
/home1/cdp/usr/nch/3.1.0.0-78/trino/bin/trino-cli --server <COORDINATOR-HOST-IP>:8285 --catalog hive --schema default
  2. Execute queries on tables in the Hive databases and check the results as follows:
trino:default> describe allstarfull;


trino:default> SELECT playerid, sum(gp) FROM allstarfull GROUP BY playerid;

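The GROUP BY query above sums gp per playerid. As a quick sanity check of the semantics, here is a minimal Python sketch over a few invented sample rows (the real data comes from the allstarfull table in the Using Hive guide):

```python
from collections import defaultdict

# Sample (playerid, gp) rows; values are invented for illustration only.
rows = [
    ("aaronha01", 1),
    ("aaronha01", 1),
    ("mayswi01", 1),
]

# Equivalent of: SELECT playerid, sum(gp) FROM allstarfull GROUP BY playerid
totals = defaultdict(int)
for playerid, gp in rows:
    totals[playerid] += gp

print(dict(totals))  # {'aaronha01': 2, 'mayswi01': 1}
```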