Linking Presto to Cloud Data Streaming Service
Available in VPC
This guide explains how to link NAVER Cloud Platform's Cloud Hadoop with Cloud Data Streaming Service (CDSS).
It is based on the Kafka Connector Tutorial from the official Presto documentation.
Preparations
Create a Cloud Hadoop cluster.
- Please refer to Getting started with Cloud Hadoop for more information about creating Cloud Hadoop.
Create a Cloud Data Streaming Service cluster.
- Please refer to the Cloud Data Streaming Service Guide for more information about creating Cloud Data Streaming Service.
Create and set up a VM to use Cloud Data Streaming Service.
- Please refer to the Cloud Data Streaming Service Guide for more information about creating and setting up VMs.
Set up the ACG.
- Port 9092 must be allowed so that Cloud Hadoop can access the broker nodes of Cloud Data Streaming Service.
- Add the Cloud Hadoop subnet range to the allowed access sources of the Cloud Data Streaming Service broker node ACG.
We recommend creating Cloud Hadoop and Cloud Data Streaming Service in the same subnet, so that they can communicate within the same VPC.
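Before moving on, it may help to confirm that a Cloud Hadoop node can actually reach the broker over port 9092. A minimal sketch, assuming the broker IP used later in this guide (172.16.2.6); substitute your own broker node IP:

```shell
# Assumed values: the broker IP used later in this guide and the
# default Kafka port; replace them with your environment's values.
BROKER_IP=172.16.2.6
BROKER_PORT=9092

# nc exits 0 when the TCP connection succeeds; a timeout here usually
# means the ACG rule for port 9092 is missing or the subnets cannot route.
if nc -z -w 3 "$BROKER_IP" "$BROKER_PORT"; then
    echo "broker reachable"
else
    echo "broker NOT reachable: check the ACG rule for port $BROKER_PORT"
fi
```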
Upload data to CDSS (Kafka)
Run Kafka in the Cloud Data Streaming Service VM.
[root@s17e27e0cf6c]# cd kafka_2.12-2.4.0
[root@s17e27e0cf6c kafka_2.12-2.4.0]# ./bin/kafka-server-start.sh -daemon config/server.properties
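As an optional check (a sketch, assuming the kafka_2.12-2.4.0 install path used above), you can confirm the broker came up by listing topics; the list is empty until data has been loaded:

```shell
# Assumed install path from the step above.
KAFKA_HOME=kafka_2.12-2.4.0

# kafka-topics.sh returns normally once the broker answers on port 9092;
# while the broker is still starting, this times out instead.
"$KAFKA_HOME"/bin/kafka-topics.sh --list --bootstrap-server localhost:9092 \
    || echo "broker not answering yet; check logs/server.log"
```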
Download the kafka-tpch data loader.
[root@s17e27e0cf6c kafka_2.12-2.4.0]# curl -o kafka-tpch https://repo1.maven.org/maven2/de/softwareforge/kafka_tpch_0811/1.0/kafka_tpch_0811-1.0.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21.6M  100 21.6M    0     0  7948k      0  0:00:02  0:00:02 --:--:-- 7947k
Upload the data to Kafka.
[root@s17e27e0cf6c kafka_2.12-2.4.0]# chmod 755 kafka-tpch
[root@s17e27e0cf6c kafka_2.12-2.4.0]# ./kafka-tpch load --brokers 172.16.2.6:9092 --prefix tpch. --tpch-type tiny
2022-02-07T10:30:09.426+0900 INFO main io.airlift.log.Logging Logging to stderr
2022-02-07T10:30:09.448+0900 INFO main de.softwareforge.kafka.LoadCommand Processing tables: [customer, orders, lineitem, part, partsupp, supplier, nation, region]
2022-02-07T10:30:09.859+0900 INFO pool-1-thread-1 de.softwareforge.kafka.LoadCommand Loading table 'customer' into topic 'tpch.customer'...
2022-02-07T10:30:09.859+0900 INFO pool-1-thread-2 de.softwareforge.kafka.LoadCommand Loading table 'orders' into topic 'tpch.orders'...
2022-02-07T10:30:09.859+0900 INFO pool-1-thread-3 de.softwareforge.kafka.LoadCommand Loading table 'lineitem' into topic 'tpch.lineitem'...
2022-02-07T10:30:09.860+0900 INFO pool-1-thread-4 de.softwareforge.kafka.LoadCommand Loading table 'part' into topic 'tpch.part'...
2022-02-07T10:30:09.860+0900 INFO pool-1-thread-5 de.softwareforge.kafka.LoadCommand Loading table 'partsupp' into topic 'tpch.partsupp'...
2022-02-07T10:30:09.860+0900 INFO pool-1-thread-6 de.softwareforge.kafka.LoadCommand Loading table 'supplier' into topic 'tpch.supplier'...
2022-02-07T10:30:09.860+0900 INFO pool-1-thread-7 de.softwareforge.kafka.LoadCommand Loading table 'nation' into topic 'tpch.nation'...
2022-02-07T10:30:09.865+0900 INFO pool-1-thread-8 de.softwareforge.kafka.LoadCommand Loading table 'region' into topic 'tpch.region'...
2022-02-07T10:30:13.079+0900 INFO pool-1-thread-7 de.softwareforge.kafka.LoadCommand Generated 25 rows for table 'nation'.
2022-02-07T10:30:13.175+0900 INFO pool-1-thread-6 de.softwareforge.kafka.LoadCommand Generated 100 rows for table 'supplier'.
2022-02-07T10:30:13.514+0900 INFO pool-1-thread-8 de.softwareforge.kafka.LoadCommand Generated 5 rows for table 'region'.
2022-02-07T10:30:13.711+0900 INFO pool-1-thread-1 de.softwareforge.kafka.LoadCommand Generated 1500 rows for table 'customer'.
2022-02-07T10:30:14.168+0900 INFO pool-1-thread-4 de.softwareforge.kafka.LoadCommand Generated 2000 rows for table 'part'.
2022-02-07T10:30:14.895+0900 INFO pool-1-thread-5 de.softwareforge.kafka.LoadCommand Generated 8000 rows for table 'partsupp'.
2022-02-07T10:30:15.078+0900 INFO pool-1-thread-2 de.softwareforge.kafka.LoadCommand Generated 15000 rows for table 'orders'.
2022-02-07T10:30:16.335+0900 INFO pool-1-thread-3 de.softwareforge.kafka.LoadCommand Generated 60175 rows for table 'lineitem'.
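To verify the load (a sketch reusing the broker address and install path from the steps above), list the topics and peek at one record; the eight tpch.* topics should appear, each holding JSON rows:

```shell
KAFKA_HOME=kafka_2.12-2.4.0   # assumed install path from earlier steps
BROKER=172.16.2.6:9092        # broker address passed to kafka-tpch above

# The eight topics created by kafka-tpch (tpch.customer ... tpch.region).
"$KAFKA_HOME"/bin/kafka-topics.sh --list --bootstrap-server "$BROKER" \
    || echo "could not list topics"

# Read a single record to confirm the JSON payload.
"$KAFKA_HOME"/bin/kafka-console-consumer.sh --bootstrap-server "$BROKER" \
    --topic tpch.nation --from-beginning --max-messages 1 \
    || echo "could not consume from tpch.nation"
```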
Add connector to Presto
In the Ambari UI, go to Presto > [CONFIGS] > Advanced connectors.properties, add the following to connectors.to.add, and then click the [SAVE] button.
{"kafka":["connector.name=kafka",
"kafka.nodes=172.16.2.6:9092",
"kafka.table-names=tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,tpch.supplier,tpch.nation,tpch.region",
"kafka.hide-internal-columns=false"] }
A restart is required for the changed configuration to take effect. Click [ACTIONS] > Restart All in the upper right corner, and then click the [CONFIRM RESTART ALL] button in the pop-up window.
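For reference, the connectors.to.add entry above is equivalent to a standard Presto catalog properties file. On a typical Presto install it would look like the fragment below; the exact catalog path on Cloud Hadoop may differ, so treat this as an assumption:

```properties
# etc/catalog/kafka.properties (path may differ on Cloud Hadoop)
connector.name=kafka
kafka.nodes=172.16.2.6:9092
kafka.table-names=tpch.customer,tpch.orders,tpch.lineitem,tpch.part,tpch.partsupp,tpch.supplier,tpch.nation,tpch.region
kafka.hide-internal-columns=false
```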
Query tables in Presto
Access the Cloud Hadoop edge node and run the Presto CLI.
- Set catalog to kafka, and schema to tpch.
[sshuser@e-001-example-pzt-hd ~]$ /usr/lib/presto/bin/presto-cli --server http://pub-210ab.hadoop.beta.ntruss.com:8285 --catalog kafka --schema tpch
presto:tpch> SHOW TABLES;
Table
----------
customer
lineitem
nation
orders
part
partsupp
region
supplier
(8 rows)
Query 20220128_064417_00003_96n53, FINISHED, 3 nodes
Splits: 36 total, 36 done (100.00%)
0:00 [8 rows, 166B] [57 rows/s, 1.16KB/s]
Check the content through a simple query.
presto:tpch> DESCRIBE customer;
Column | Type | Extra | Comment
-------------------+---------+-------+------------------------------------------
_partition_id | bigint | | Partition Id
_partition_offset | bigint | | Offset for the message within the partiti
_message_corrupt | boolean | | Message data is corrupt
_message | varchar | | Message text
_message_length | bigint | | Total number of message bytes
_key_corrupt | boolean | | Key data is corrupt
_key | varchar | | Key text
_key_length | bigint | | Total number of key bytes
_timestamp | bigint | | Offset Timestamp
(9 rows)
presto:tpch> SELECT _message FROM customer LIMIT 5;
--------------------------------------------------------------------------------
{"rowNumber":1,"customerKey":1,"name":"Customer#000000001","address":"IVhzIApeR
{"rowNumber":4,"customerKey":4,"name":"Customer#000000004","address":"XxVSJsLAG
{"rowNumber":7,"customerKey":7,"name":"Customer#000000007","address":"TcGe5gaZN
{"rowNumber":10,"customerKey":10,"name":"Customer#000000010","address":"6LrEaV6
{"rowNumber":13,"customerKey":13,"name":"Customer#000000013","address":"nsXQu0o
(5 rows)
Please refer to the Kafka Connector Tutorial for more information about using Presto with Kafka.
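As a small follow-up example (our addition, not part of the original tutorial): since each _message is a JSON string, Presto's built-in json_extract_scalar function can pull out individual fields without defining a table schema:

```sql
-- Extract fields from the raw JSON payload of the customer topic.
SELECT json_extract_scalar(_message, '$.customerKey') AS customer_key,
       json_extract_scalar(_message, '$.name')        AS name
FROM customer
LIMIT 5;
```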