Using Presto(Trino)

    Available in VPC

    Presto is a tool for analyzing terabytes to petabytes of data with distributed queries.
    Presto can read data from various sources, including HDFS, Hive warehouse, and RDBMS.

    Unlike Hive and Pig, where queries are executed as MapReduce jobs, Presto has its own query execution engine. Because Presto passes data from memory to memory, without writing the result of each stage to disk, it can analyze data stored in HDFS faster and more interactively than Hive. This makes Presto better suited than Hive for integration with BI tools such as Tableau.

    Note
    • Up to Cloud Hadoop 1.9, the service is provided under the name Presto; from Cloud Hadoop 2.0, it is provided under the name Trino.
    • Presto, like Hive and Pig, is designed to process OLAP queries. Therefore, it cannot replace a transaction-based RDBMS.

    Presto components

    The Presto service consists of two components: coordinator and worker.
    A cluster has one coordinator as the master and multiple workers as slaves. Communication between the coordinator and workers, as well as among worker nodes, relies on REST APIs.

    • Coordinator
      The coordinator is the hub of the Presto service and is responsible for the following:

      • Receiving requests from the client
      • Parsing SQL syntax and planning queries
      • Coordinating worker nodes when running queries and tracking their activity
    • Worker
      Workers perform the tasks received from the coordinator and process data. Task execution results are transferred directly from the workers to the client.

    Query execution process

    A query is executed as follows (see the image below):

    1. The Presto worker process starts and registers with the coordinator's discovery server
      • Workers must be registered on the discovery server so the coordinator can assign tasks to them
    2. The client transfers queries to the coordinator through HTTP (see the example after the diagram below)
    3. The coordinator creates a query plan and requests schema data from the connector plugin
    4. The coordinator sends the task to be executed to the worker
    5. The worker reads data from data sources through the connector plugin
    6. The worker executes the task in memory
    7. The worker returns the results to the client

    chadoop-4-7-001.png
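
    For example, step 2 can be reproduced with a plain HTTP client against the coordinator's REST API. This is a minimal sketch, assuming the coordinator listens on port 8285 as elsewhere in this guide; the user name is a placeholder:

    # Submit a query to the coordinator over HTTP.
    # Presto identifies the caller through the X-Presto-User header.
    curl -X POST \
      -H "X-Presto-User: example-user" \
      -d "SELECT 1" \
      http://<COORDINATOR-HOST-IP>:8285/v1/statement
    # The JSON response contains a nextUri to poll for the query results.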

    Data sources

    • Connector
      In Presto, the connector functions like a driver in a database. In other words, it connects a data source with the coordinator or workers so that data can be read from that source.
      By default, Presto provides connectors for various data sources such as Hive, MySQL, and Kafka.

    • Catalog
      The catalog is a mount point for a connector. Every catalog is associated with a specific connector. Presto accesses a data source through the connector mounted on its catalog. For example, to access a Hive warehouse with the Hive connector, you need to create a Hive catalog file (hive.properties) in /etc/presto/catalog; see the sample after this list.

      A Presto query can reference one or more catalogs. In other words, you can use multiple data sources within a single query.

      Catalogs are defined in properties files under the catalog directory of the Presto configuration directory (/etc/presto/catalog).

    • Schema
      A schema is a way to organize your tables.
      Together, one catalog and one schema define a set of tables that can be queried at once.

      When accessing Hive or an RDBMS with Presto, a schema is equivalent to the concept of a database.
      In other data sources, tables can be organized into a schema.

    • Table
      The table concept of an RDBMS applies identically here.
      When referencing a table in Presto, its name must be fully qualified, meaning that the catalog, schema, and table name must be specified, separated by periods (.), as in the examples below.
      (e.g., hive.samples.t1)
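
    For reference, here is a minimal sketch of a Hive catalog file; the connector name matches the Presto Hive connector, and the metastore host is a placeholder:

    # /etc/presto/catalog/hive.properties (sample; metastore host is a placeholder)
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://<METASTORE-HOST>:9083

    Fully qualified names can then be used in queries, including across catalogs. The tables below (hive.samples.t1, mysql.shop.orders) are hypothetical:

    -- Query a single table by its fully qualified name
    SELECT * FROM hive.samples.t1 LIMIT 10;

    -- Join tables from two different catalogs in a single query
    SELECT t.id, o.amount
    FROM hive.samples.t1 t
    JOIN mysql.shop.orders o ON t.id = o.id;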

    Using Presto clusters

    Create cluster

    From the NAVER Cloud Platform console, create a Cloud Hadoop cluster.
    For more information on creating clusters, see Create cluster.

    Note
    • Starting with Cloud Hadoop 1.3, you can use clusters with Presto v0.240 installed.

      cloudhadoop-create_ko

    • In Cloud Hadoop 1.3, even if you didn't create the cluster as a Presto type, you can still add Presto using Ambari Add Service.

    Check Presto service in Ambari UI

    After installing Presto, you can see the service in the Ambari UI. You can start and stop each component of the service from this page.

    chadoop-4-7-003_en.png

    • Summary: check the hosts where each component is installed
    • Configs: change the configurations of the Presto service
    • Quick Links: open the Presto Discovery UI

    Key configurations

    • jvm.properties
      Enter the JVM options used by the coordinator or worker server. You can adjust the JVM heap with the -Xmx option. Since the coordinator node and the worker nodes may have different specifications, the memory settings are applied separately for each role. The jvm.properties configuration is divided by role as shown below; on each server, a file with the same name exists under /etc/presto/conf.
      chadoop-4-8-005_en.png
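
      A minimal sketch of what the JVM options for one role may look like; the heap size and GC flags are example values, not the shipped defaults:

      # JVM options for the coordinator or worker role (example values)
      -server
      -Xmx8G
      -XX:+UseG1GC
      -XX:+ExplicitGCInvokesConcurrent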

    • config.properties
      Although it is rarely needed, you can configure memory settings differently for the coordinator and worker roles in config.properties. The key properties are defined as follows:

    Item                            | Default value | Description
    --------------------------------|---------------|------------------------------------------------------------
    query.max-memory-per-node       | 1GB           | Maximum amount of user memory a single query can use on a worker. If the user memory allocated to any single worker by a query exceeds this limit, the query is canceled.
    query.max-memory                | 20GB          | Maximum amount of user memory a single query can use across the entire cluster. If the total user memory allocated to all worker nodes by a query exceeds this limit, the query is canceled.
    query.max-total-memory-per-node | 2GB           | Maximum amount of user and system memory a single query can use on a worker. If the user and system memory allocated to any single worker by a query exceeds this limit, the query is canceled.

    chadoop-4-8-006_en.png
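
    For reference, a sketch of how these properties may appear in config.properties, using the default values from the table above:

    # /etc/presto/config.properties (sample)
    query.max-memory-per-node=1GB
    query.max-memory=20GB
    query.max-total-memory-per-node=2GB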

    Caution

    To change http-server.http.port, you must set the coordinator's http-server.http.port and the workers' http-server.http.port to the same value. If you specify different ports, the coordinator and workers cannot communicate with each other.
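
    For example, the same port value has to appear in the config.properties of both roles; 8285 is the coordinator port used elsewhere in this guide:

    # config.properties on the coordinator and on every worker
    http-server.http.port=8285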

    • node.properties
      You can set the log directory, PID directory, and other paths used by the Presto daemon. If you change these directories, check the owner and permissions of each directory on every server.
      You can specify the name of the environment currently in use in node.environment.
      chadoop-4-8-007_en.png
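
      A minimal sketch of a node.properties file; the environment name, node ID, and data directory are example values:

      # /etc/presto/node.properties (sample; values are examples)
      node.environment=production
      node.id=<UNIQUE-NODE-ID>
      node.data-dir=/var/lib/presto/data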

    Presto CLI

    Presto CLI provides an interactive shell for running queries.
    You can use the shell on any host assigned the Presto CLI role.
    For a detailed explanation of the Presto CLI, see the Presto CLI documentation.

    • Connect to Presto Coordinator server
    /usr/lib/presto/bin/presto-cli --server <COORDINATOR-HOST-IP>:8285
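
    You can also set a default catalog and schema when connecting; the catalog and schema names below are examples:

    /usr/lib/presto/bin/presto-cli --server <COORDINATOR-HOST-IP>:8285 --catalog hive --schema default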
    
    Note

    When accessing the Presto Coordinator server, <COORDINATOR-HOST-IP> is the Private IP address of the edge node (e-001). You can check it in the Ambari UI > Hosts menu.

    • View available catalogs
    presto> show catalogs;
     Catalog
    ---------
     system
    (1 row)
    
    Query 20190430_020419_00001_j79dc, FINISHED, 2 nodes
    Splits: 36 total, 36 done (100.00%)
    0:07 [0 rows, 0B] [0 rows/s, 0B/s]
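
    Once additional catalogs are added, you can explore them the same way. A brief sketch, assuming a hive catalog is configured:

    presto> show schemas from hive;
    presto> show tables from hive.default;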
    
    Note

    You can find more information about how to execute queries by adding data sources in the Analyzing Hive warehouse data with Presto guide.

    Access Presto Discovery UI

    You can access the Presto Discovery UI through [View by application] on the Cloud Hadoop console. For more information, see View by application.
    cloudhadoop-access-webui_ko

    You can see the overall status of Presto services on the Presto Discovery UI page. You can also view query history.
    chadoop-4-7-005_en.png

