Using Presto (Trino)


Available in VPC

Presto (Trino) is a tool that lets you analyze terabytes to petabytes of data using distributed queries.
Presto can read data from various sources, including HDFS, the Hive warehouse, and RDBMSs.

Unlike Hive and Pig, whose queries are executed as MapReduce jobs, Presto (Trino) has its own query execution engine. Because Presto is designed to pass data from memory to memory, without writing the results of each step to disk, it can analyze data stored in HDFS faster and more interactively than Hive. This makes Presto (Trino) better suited than Hive for integration with a BI tool such as Tableau.

Note
  • Up to Cloud Hadoop 1.9, the service was provided under the name Presto; from Cloud Hadoop 2.0, it is provided under the name Trino.
  • Presto (Trino), like Hive and Pig, is designed to process OLAP queries. Therefore, it cannot replace a traditional, transaction-oriented RDBMS.

Presto (Trino) components

The Presto (Trino) service consists of two components: the coordinator and the workers.
A cluster has one coordinator acting as the master and multiple workers acting as slaves. Communication between the coordinator and workers, as well as among worker nodes, relies on REST APIs.

  • Coordinator
    The coordinator is the hub of the Presto (Trino) service and is responsible for the following:

    • Receiving requests from the client
    • Conducting SQL syntax parsing and query planning
    • Adjusting worker nodes when running queries and tracking the activities of worker nodes
  • Worker
    Workers perform tasks received from the coordinator and process data. The task execution result is transferred directly from the worker to the client.
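Because all of this communication happens over REST, you can also inspect a node's status directly over HTTP. A minimal sketch, assuming the coordinator port 8285 used later in this guide (`<COORDINATOR-HOST-IP>` is a placeholder for your cluster's address):

```
# Illustrative only: check a Presto (Trino) node's status over its REST API.
$ curl http://<COORDINATOR-HOST-IP>:8285/v1/info
# Returns a JSON document describing the node (version, environment,
# whether it is the coordinator, and its uptime).
```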

Query execution process

A query is executed through the following steps (see the image below):

  1. Start the Presto (Trino) worker process and register with the coordinator's discovery server.
    • Workers must be registered on the discovery server so the coordinator can assign tasks to them.
  2. The client transfers queries to the coordinator through HTTP.
  3. The coordinator creates a query plan and requests schema data from the connector plugin.
  4. The coordinator sends the task to be executed to the worker.
  5. The worker reads data from data sources through the connector plugin.
  6. The worker executes the task in memory.
  7. The worker returns the results to the client.

chadoop-4-7-001
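From the client's point of view, steps 2 through 7 correspond to the HTTP client protocol: the query text is POSTed to the coordinator, and results are fetched by polling. A hedged sketch (the host is a placeholder; the X-Presto-User header names the querying user, and Trino uses X-Trino-User instead):

```
# Illustrative sketch of the client protocol (not a complete client).
$ curl -X POST \
    -H "X-Presto-User: analyst" \
    --data "SELECT 1" \
    http://<COORDINATOR-HOST-IP>:8285/v1/statement
# The JSON response contains a nextUri; the client repeatedly GETs nextUri
# until the query reaches the FINISHED state, collecting result pages as it goes.
```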

Data sources

  • Connector
    In Presto (Trino), the connector functions like a driver in a database. In other words, it connects the data sources with the coordinator or worker so that the data can be read from the data source.
    By default, Presto (Trino) provides connectors for various data sources such as Hive, MySQL, and Kafka.

  • Catalog
    The catalog is a mount point for a connector. Every catalog is associated with a specific connector, and Presto (Trino) accesses a data source through the connector mounted on the catalog. For example, to access a Hive warehouse with the Hive connector, you need to configure a Hive catalog file (hive.properties) under /etc/presto/catalog.
    Presto (Trino) queries can accommodate one or more catalogs. In other words, you can use multiple data sources within a single query.
    Catalogs are defined as individual .properties files under the Presto (Trino) catalog configuration directory (/etc/presto/catalog/).
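As a sketch, a Hive catalog file could look like the following. The metastore address is a placeholder; note that the Hive connector is named hive-hadoop2 in Presto releases of this era, while Trino names it hive:

```properties
# /etc/presto/catalog/hive.properties (illustrative)
connector.name=hive-hadoop2
hive.metastore.uri=thrift://<METASTORE-HOST>:9083
```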

  • Schema
    A schema is a way to organize your tables.
    Together, one catalog and one schema define the set of tables that can be queried at once.
    When accessing Hive or RDBMS with Presto (Trino), the schema is equivalent to the concept of a database.
    In other data sources, tables can be organized to create a schema.

  • Table
    The table concept of RDBMS is applied identically here.
    When referencing a table in Presto (Trino), it must be fully-qualified, meaning that the catalog, schema, and table name must be specified and separated by periods (.).
    (Example: hive.samples.t1)
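For example, the queries below use fully-qualified names; the second joins two catalogs in a single query, as described above (the schema, table, and column names are illustrative, and the mysql catalog is assumed to be configured):

```sql
-- Query a single table through the hive catalog
SELECT * FROM hive.samples.t1 LIMIT 10;

-- Combine two data sources in one query
SELECT h.id, m.name
FROM hive.samples.t1 AS h
JOIN mysql.appdb.users AS m ON h.id = m.id;
```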

Note

You can control access permissions for the catalog, schema, and table levels by using Ranger. For more information, see Trino access permissions management.

Using Presto (Trino) cluster

Create cluster

From the NAVER Cloud Platform console, create the Cloud Hadoop cluster.
For more information about creating clusters, see Create cluster.

Note
  • Starting with Cloud Hadoop 1.3, you can use clusters with Presto v0.240 installed.

    cloudhadoop-create_ko

  • In Cloud Hadoop 1.3, even if you did not create a cluster as a Presto (Trino) type, you can still add Presto (Trino) using Ambari Add Service.

Check Presto (Trino) service from Ambari UI

After installing Presto (Trino), you can see the service in the Ambari UI as shown below. You can start and stop each component of the service from this page.

chadoop-4-7-003_ko

  • Summary: check the hosts where each component is installed.
  • Configs: change the configuration of the Presto service.
  • Quick Links: open the Presto Discovery UI.

Key configurations

  • jvm.properties
    Enter the JVM options used by the coordinator and worker servers. You can adjust the JVM heap with the -Xmx option. Since the coordinator node and worker node specifications may differ, the memory settings are applied separately, and the jvm.properties configuration is divided by role as shown below. On each server, the file exists under /etc/presto/conf with the same file name, jvm.properties.
    chadoop-4-8-005_ko
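A jvm.properties sketch for one role might look like the following; the heap size and GC flags are illustrative, and -Xmx should be sized to the node's actual specification:

```properties
# jvm.properties (illustrative values)
-server
-Xmx8G
-XX:+UseG1GC
-XX:+ExitOnOutOfMemoryError
```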

  • config.properties
    Although rarely needed, you can also configure the config.properties memory settings differently for the coordinator and worker roles. The key items are defined as follows:

  • query.max-memory-per-node (default: 1G)
    • Maximum amount of user memory that a single query can use on one worker node.
    • If the user memory a query uses on any worker exceeds this limit, the query is canceled.
  • query.max-memory (default: 20G)
    • Maximum amount of user memory that a single query can use across the entire cluster.
    • If the sum of user memory allocated to all worker nodes by a query exceeds this limit, the query is canceled.
  • query.max-total-memory-per-node (default: 2G)
    • Maximum amount of user and system memory that a single query can use on one worker node.
    • If the user and system memory a query uses on any worker exceeds this limit, the query is canceled.

chadoop-4-8-006_ko
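A config.properties sketch for a worker with the memory limits above set explicitly (the values mirror the defaults listed in the table; the coordinator flag and port are shown for context):

```properties
# config.properties on a worker node (illustrative)
coordinator=false
http-server.http.port=8285
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
query.max-memory=20GB
```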

Caution

To change http-server.http.port, you must set the coordinator's http-server.http.port and the workers' http-server.http.port to the same value. If they are set to different ports, the coordinator and workers cannot communicate with each other.

  • node.properties
    You can set the log directory, pid directory, and others used by the Presto (Trino) daemon. To change this directory, you need to check the owner and permissions of the directory on each server.
    You can specify the environment name currently in use in node.environment.
    chadoop-4-8-007_ko
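A node.properties sketch (the paths and environment name are illustrative; node.id must be unique per node):

```properties
# node.properties (illustrative)
node.environment=production
node.id=<UNIQUE-NODE-ID>
node.data-dir=/var/presto/data
```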

Trino CLI

Trino CLI provides an interactive shell for running queries.
You can use the shell on any host assigned the Trino CLI role.
For more information about Trino CLI, see Trino CLI documentation.

  • To use Trino CLI, first connect via SSH to the edge node where the environment is configured.
/home1/cdp/usr/nch/3.1.0.0-78/trino/bin/trino-cli --server <COORDINATOR-HOST-IP>:8285
Note

When accessing the Presto (Trino) Coordinator server, <COORDINATOR-HOST-IP> is the private IP address of the edge node (e-001). You can check it in the Ambari UI > Hosts menu.

  • View available catalogs
trino> show catalogs;
Catalog
---------
hive
system
(2 rows)

Query 20260130_085922_00000_8gfjs, FINISHED, 2 nodes
Splits: 12 total, 12 done (100.00%)
1.41 [0 rows, 0B] [0 rows/s, 0B/s]
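The CLI can also run a single query non-interactively with the --execute option; --catalog and --schema set the default context so tables do not need fully-qualified names (the catalog, schema, and table here are illustrative):

```
/home1/cdp/usr/nch/3.1.0.0-78/trino/bin/trino-cli \
  --server <COORDINATOR-HOST-IP>:8285 \
  --catalog hive --schema samples \
  --execute "SELECT count(*) FROM t1"
```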
Note

You can find more information about how to execute queries by adding data sources in the Analyze Hive warehouse data with Presto (Trino) guide.

Access Presto (Trino) Discovery UI

You can access the Presto (Trino) Discovery UI through [View by application] on the Cloud Hadoop console. For more information, see View by application.
cloudhadoop-access-webui_ko

You can see the overall status of Presto (Trino) services on the Presto (Trino) Discovery UI page. You can also view query history.
chadoop-4-7-005_ko