MySQL 8.0 migration using Sqoop and Oozie Workflow
Available in VPC
SQL to Hadoop (Sqoop) is open-source software designed to transfer data between Hadoop and relational databases.
With a simple CLI, you can easily move tables from RDBMSs such as Oracle and MySQL, or data that meets specific conditions, to HDFS, and load them directly into Hive, Pig, HBase, etc. Conversely, you can move data stored in HDFS into an RDBMS.
Oozie is a workflow scheduling system designed for conveniently managing multiple jobs in Hadoop. It consists of many actions, such as starting, processing, branching and ending of jobs, and enables you to easily use Oozie workflow in Hue (Hadoop User Experience) provided on Cloud Hadoop (VPC).
This guide shows how to use Sqoop in an Oozie workflow on Hue to load Cloud DB for MySQL (VPC) data, create a Hive table, and then use a Hive query to create an exact copy of that table. Finally, it explains how to transfer the copied Hive table back to Cloud DB for MySQL (VPC) using Sqoop.
You can set user-defined notifications via mail or messages for alerts generated in Oozie Workflow and Ambari on Hue. For more information on this, see the guide titled Setting user-defined notifications.
Preparations
To use Sqoop in Oozie, the following prerequisites are required.
| Item | Description | Guide |
| --- | --- | --- |
| Set MySQL 8 version connector | Install a connector to link to MySQL and set it up in Oozie | |
| Prepare MySQL table | Prepare the source table with original data and the target table to receive the results of Hive query execution | |
| Set ACG | Set the ACG of Cloud Hadoop (VPC) and Cloud DB for MySQL (VPC) | Firewall settings (ACG) |
1. Set MySQL 8 version connector
In order to access MySQL with Sqoop, you need to configure the connector according to the version. Install and configure the MySQL connector in the following order.
a. Set the MySQL 8 version connector on the edge node
- Connect to the Cloud Hadoop (VPC) edge node via SSH. For instructions on how to access the edge node, see the guide titled Connecting to cluster nodes via SSH.
- Execute the command below to download the MySQL 8 version connector.
wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.26.tar.gz
- Execute the command below to extract the downloaded MySQL 8 version connector.
tar -xvzf mysql-connector-java-8.0.26.tar.gz
- Execute the commands below to overwrite the existing MySQL connector with the newly downloaded MySQL 8 version connector.
sudo cp /home1/sshuser/mysql-connector-java-8.0.26/mysql-connector-java-8.0.26.jar /usr/share/java/mysql-connector-java-8.0.26.jar
sudo ln -Tfs /usr/share/java/mysql-connector-java-8.0.26.jar /usr/hdp/current/sqoop-client/lib/mysql-connector-java.jar
- Copy the mysql-connector-java-8.0.26.jar file into the /user/oozie/share/lib/lib_{timestamp}/sqoop path. If /user/oozie/share/lib contains more than one lib_{timestamp} directory, copy the file into the newest one.
sudo -u oozie hadoop fs -put /home1/sshuser/mysql-connector-java-8.0.26/mysql-connector-java-8.0.26.jar /user/oozie/share/lib/lib_{timestamp}/sqoop/mysql-connector-java-8.0.26.jar
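If several timestamped directories exist, the newest one can be picked with a lexicographic sort, since the lib_YYYYMMDDHHMMSS naming makes the largest name the most recent. A minimal sketch (the directory names below are hypothetical; in practice the list would come from `hadoop fs -ls /user/oozie/share/lib`):

```shell
# Hypothetical directory listing; on a real cluster, obtain it with:
#   hadoop fs -ls /user/oozie/share/lib | awk '{print $NF}'
dirs="/user/oozie/share/lib/lib_20210810161509
/user/oozie/share/lib/lib_20220301094133"

# With the lib_YYYYMMDDHHMMSS naming convention, the lexicographically
# largest name is also the newest directory.
newest=$(printf '%s\n' "$dirs" | sort | tail -1)
echo "$newest"
```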
b. Set the MySQL 8 version connector in Hue
- Make sure that the mysql-connector-java-8.0.26.jar file has been copied to the /user/oozie/share/lib/lib_{timestamp}/sqoop path.
- To update Oozie's library, connect to the edge node and execute the command below.
sudo -u oozie oozie admin -oozie http://{hostname of the first master node}:11000/oozie -sharelibupdate
sudo -u oozie oozie admin -oozie http://{hostname of the second master node}:11000/oozie -sharelibupdate
You can find the hostname on the Hosts page in the Ambari UI. For more information on Ambari UI, see the guide titled Ambari UI.
- Access Ambari and, on the left menu, click [Oozie] > [ACTIONS] > [Restart All] to restart Oozie. For how to access Ambari, see the guide titled Ambari UI.
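The two sharelibupdate commands above differ only in the master node hostname, so they can be scripted as a loop. A sketch with placeholder hostnames (take the real values from the Ambari Hosts page; the leading echo only prints each command, so remove it to actually run the updates):

```shell
# The hostnames below are placeholders, not real cluster nodes.
# Each iteration prints the sharelibupdate command for one Oozie server.
for host in m-001-example.hadoop.ntruss.com m-002-example.hadoop.ntruss.com; do
  echo sudo -u oozie oozie admin -oozie "http://${host}:11000/oozie" -sharelibupdate
done
```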
2. Prepare MySQL table
To create a workflow using Oozie in Hue, prepare a source table with the original data and a target table to receive the Hive Query execution results. This guide uses the SQOOP_TEST1 and SQOOP_TEST2 tables created via the SQL below.
CREATE TABLE SQOOP_TEST1
(
gp_sum INT PRIMARY KEY,
player VARCHAR(30)
);
INSERT INTO SQOOP_TEST1(gp_sum, player)
VALUES(1,'test_player');
CREATE TABLE SQOOP_TEST2
(
gp_sum INT PRIMARY KEY,
player VARCHAR(30)
);
3. Set ACG
The following describes how to set up the Access Control Groups (ACGs) of the servers from the NAVER Cloud Platform console.
In this section, we will briefly introduce how to set up ACG. For more information on how to set up ACG, see ACG user guide.
- After selecting the ACG of the Cloud DB for MySQL (VPC) you want to access, enter the following four pieces of information for each rule and add two ACG rules.
  - Rule 1
    - Protocol: TCP
    - Access source: ACG name of the Cloud Hadoop (VPC) cluster to access
    - Allowed port: 3306
    - Note (optional)
  - Rule 2
    - Protocol: TCP
    - Access source: PC IP of the accessing user (click the [myip] button)
    - Allowed port: 3306
    - Note (optional)
- After selecting the ACG of the Cloud Hadoop (VPC) you want to access, enter the following four pieces of information and add an ACG rule.
- Protocol: TCP
- Access source: PC IP of the user to access (click the [myip] button)
- Allowed port: 1-65535
- Note (optional)
Proceed with the following steps in order after completing the preparations.
- 1. Import data with Sqoop
- 2. Create a new table with Hive Query
- 3. Transfer Hive data to MySQL with Sqoop
- 4. Check Hive and MySQL table
1. Import data with Sqoop
- Click [Scheduler] > [Workflow] on the left menu of Hue.
- Click [Document] > [Tasks].
- Drag the [Sqoop 1] button to Place Work Here.
- Enter the Sqoop command below and click the [Add] button.
sqoop import --connect jdbc:mysql://{Cloud DB for MySQL (VPC) Private domain}/{DB name} --username {DB account} --password "{DB account password}" --table SQOOP_TEST1 --hive-import --hive-database default --hive-table sqoop_test1 -m 1 --external-table-dir "{Object Storage bucket URI}/sqoop_test1"
- Click the [Save] button on the upper right to save the created Workflow.
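Rather than editing the long import command inline each time, the connection values can be assembled from variables first. A sketch with placeholder values (the domain, database name, account, password, and bucket URI below are hypothetical; substitute your own environment's values):

```shell
# All values below are placeholders for your own environment.
DB_HOST="db-xxxx.vpc-cdb.ntruss.com"   # Cloud DB for MySQL (VPC) private domain (hypothetical)
DB_NAME="testdb"                       # hypothetical database name
DB_USER="dbuser"                       # hypothetical DB account
DB_PASS="password"                     # hypothetical DB account password
BUCKET="s3a://my-bucket"               # hypothetical Object Storage bucket URI

# Assemble the same import command shown above from the variables.
CMD="sqoop import --connect jdbc:mysql://${DB_HOST}/${DB_NAME} --username ${DB_USER} --password ${DB_PASS} --table SQOOP_TEST1 --hive-import --hive-database default --hive-table sqoop_test1 -m 1 --external-table-dir ${BUCKET}/sqoop_test1"
echo "$CMD"
```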
2. Create a new table with Hive Query
Click [Editor] > [Hive] on the left menu of Hue.
Enter the example Hive query below.
USE default;
CREATE EXTERNAL TABLE sqoop_test2
STORED AS TEXTFILE
LOCATION '{Object Storage bucket URI}/sqoop_test2'
AS SELECT * FROM sqoop_test1;
Click the [Save] button on the upper right corner, set a name, and save the Hive Query you have created.
Click [Scheduler] > [Workflow] on the left menu of Hue, click the [Document] button, and select the Workflow created above.
Click [Edit] at the top right, then click [Tasks] > [Document].
Drag the [Hive Query] button to the space below Sqoop 1, click Document Search..., select the Hive Query created above, and click the [Add] button.
3. Transfer Hive data to MySQL with Sqoop
Click [Document] > [Tasks].
Drag the [Sqoop 1] button to the space below Hive.
Enter the example Sqoop command below.
sqoop export --connect jdbc:mysql://{Cloud DB for MySQL (VPC) Private domain}/{DB name} --username {DB account} --password "{DB account password}" --table SQOOP_TEST2 --export-dir "{Object Storage bucket URI}/sqoop_test2" -m 1 --input-fields-terminated-by "\001"
Click the [Save] button on the upper right corner to save the created Workflow, and then click the [Start] button to execute the created Workflow.
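The `--input-fields-terminated-by "\001"` option in the export command matches Hive's default field delimiter, the non-printing SOH character (octal 001), which the table created in step 2 uses because the CTAS statement set no explicit delimiter. A quick sketch of how such a row looks on disk, using the sample row inserted earlier:

```shell
# Hive's default field delimiter is \001 (SOH, a non-printing byte).
# A sqoop_test2 row with gp_sum=1 and player='test_player' is stored as
# "1<SOH>test_player"; translating \001 to a comma makes it visible.
row=$(printf '1\001test_player\n' | tr '\001' ',')
echo "$row"
```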
4. Check Hive and MySQL table
- Check if the table is saved properly through Hive.
- For instructions on how to use Hive, see the guide titled Using Hive.
SELECT * FROM SQOOP_TEST1;
SELECT * FROM SQOOP_TEST2;
- Check if the table is saved properly through MySQL.
SELECT * FROM SQOOP_TEST1;
SELECT * FROM SQOOP_TEST2;