MySQL 8.0 migration using Sqoop and Oozie Workflow

    Available in VPC

    SQL to Hadoop (Sqoop) is open-source software designed to transfer data between Hadoop and relational databases.
    With a simple CLI, you can easily move tables from an RDBMS such as Oracle or MySQL, or data that meets specific conditions, to HDFS, and load it directly into Hive, Pig, HBase, and more. Conversely, you can move data stored in HDFS back to an RDBMS.

    Oozie is a workflow scheduling system designed to conveniently manage multiple jobs in Hadoop. A workflow consists of actions such as job start, processing, branching, and end, and you can easily build Oozie workflows in Hue (Hadoop User Experience), which is provided on Cloud Hadoop (VPC).

    This guide shows how to use Sqoop in an Oozie workflow on Hue to load Cloud DB for MySQL (VPC) data into a Hive table, use a Hive query to create an exact copy of that table, and finally transfer the copied table back to Cloud DB for MySQL (VPC) using Sqoop.

    Note

    You can set user-defined notifications via mail or messages for alerts generated in Oozie Workflow and Ambari on Hue. For more information on this, see the guide titled Setting user-defined notifications.

    Preparations

    To use Sqoop in Oozie, the following prerequisites are required.

    Item | Description | Guide
    Set MySQL 8 version connector | Install a connector to link to MySQL and set it up in Oozie | -
    Prepare MySQL table | Prepare the source table containing the original data and the target table that receives the Hive Query execution results | -
    Set ACG | Set the ACG of Cloud Hadoop (VPC) and Cloud DB for MySQL (VPC) | Firewall settings (ACG)

    1. Set MySQL 8 version connector

    In order to access MySQL with Sqoop, you need to configure the connector according to the version. Install and configure the MySQL connector in the following order.

    a. Set the MySQL 8 version connector on the edge node

    1. Connect to the Cloud Hadoop (VPC) edge node via SSH. For instructions on how to access the edge node, see the guide titled Connecting to cluster nodes via SSH.
    2. Execute the command below to download the MySQL 8 version connector.
      wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.26.tar.gz
      
    3. Execute the command below to extract the downloaded MySQL 8 version connector.
      tar -xvzf mysql-connector-java-8.0.26.tar.gz
      
    4. Execute the command below to overwrite the existing MySQL connector with the newly downloaded MySQL 8 version connector.
      sudo cp /home1/sshuser/mysql-connector-java-8.0.26/mysql-connector-java-8.0.26.jar /usr/share/java/mysql-connector-java-8.0.26.jar
      sudo ln -Tfs /usr/share/java/mysql-connector-java-8.0.26.jar /usr/hdp/current/sqoop-client/lib/mysql-connector-java.jar
      
    5. Copy the mysql-connector-java-8.0.26.jar file into /user/oozie/share/lib/lib_{timestamp}/sqoop. If more than one lib_{timestamp} directory exists under /user/oozie/share/lib, copy the file into the newest one.
      sudo -u oozie hadoop fs -put /home1/sshuser/mysql-connector-java-8.0.26/mysql-connector-java-8.0.26.jar /user/oozie/share/lib/lib_{timestamp}/sqoop/mysql-connector-java-8.0.26.jar
      
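    Because the sharelib directory names end in a timestamp, a plain lexicographic sort is enough to pick the newest one. Below is a minimal local sketch; the directory names are hypothetical, and on the cluster they would come from `hadoop fs -ls /user/oozie/share/lib`.

```shell
# Hypothetical sharelib directory names; on the cluster, list the real ones
# with: hadoop fs -ls /user/oozie/share/lib
dirs="lib_20210902121212
lib_20220105093045
lib_20211120080000"

# The lib_{timestamp} suffix sorts lexicographically in date order,
# so the last entry after sorting is the newest directory.
newest=$(printf '%s\n' "$dirs" | sort | tail -n 1)
echo "$newest"    # prints lib_20220105093045
```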

    b. Set the MySQL 8 version connector in Hue

    1. Make sure that the mysql-connector-java-8.0.26.jar file is copied to the /user/oozie/share/lib/lib_{timestamp}/sqoop path.
    2. To update Oozie's library, connect to the edge node and execute the command below.
      sudo -u oozie oozie admin -oozie http://{hostname of the first master node}:11000/oozie -sharelibupdate
      sudo -u oozie oozie admin -oozie http://{hostname of the second master node}:11000/oozie -sharelibupdate
      
    Note

    You can find the hostname on the Hosts page in the Ambari UI. For more information on Ambari UI, see the guide titled Ambari UI.

    3. Access Ambari and, on the left menu, click [Oozie] > [ACTIONS] > [Restart All] to restart Oozie. For how to access Ambari, see the guide titled Ambari UI.

    2. Prepare MySQL table

    To create a workflow using Oozie in Hue, prepare a source table with the original data and a target table to receive the Hive Query execution results. This guide uses the SQOOP_TEST1 and SQOOP_TEST2 tables created via the SQL below.

    CREATE TABLE SQOOP_TEST1
    (
        gp_sum INT PRIMARY KEY,
        player VARCHAR(30)
    );
    
    INSERT INTO SQOOP_TEST1(gp_sum, player)
    VALUES(1,'test_player');
    
    CREATE TABLE SQOOP_TEST2
    (
        gp_sum INT PRIMARY KEY,
        player VARCHAR(30)
    );
    

    3. Set ACG

    The following describes how to set up Access Control Group (ACG) for MySQL servers from the NAVER Cloud Platform console.

    Note

    In this section, we will briefly introduce how to set up ACG. For more information on how to set up ACG, see ACG user guide.

    1. Select the ACG of the Cloud DB for MySQL (VPC) instance you want to access, then enter the following information to add two ACG rules.
      • Protocol: TCP

      • Access source: ACG name of Cloud Hadoop (VPC) to access

      • Allowed port: 3306

      • Note (optional)

      • Protocol: TCP

      • Access source: PC IP of the user to access (click the [myip] button)

      • Allowed port: 3306

      • Note (optional)

    2. Select the ACG of the Cloud Hadoop (VPC) cluster you want to access, then enter the following information to add an ACG rule.
      • Protocol: TCP
      • Access source: PC IP of the user to access (click the [myip] button)
      • Allowed port: 1-65535
      • Note (optional)

    Proceed with the following steps in order after completing the preparations.

    1. Import Data to Sqoop

    1. Click [Scheduler] > [Workflow] on the left menu of Hue.
    2. Click [Document] > [Tasks].
    3. Drag the [Sqoop 1] button to Place Work Here.
    4. Enter the Sqoop command below and click the [Add] button.
      sqoop import --connect jdbc:mysql://{Cloud DB for MySQL (VPC) Private domain}/{DB name} --username {DB account} --password "{DB account password}" --table SQOOP_TEST1 --hive-import --hive-database default --hive-table sqoop_test1 -m 1 --external-table-dir "{Object Storage bucket URI}/sqoop_test1"
      

    5. Click the [Save] button on the upper right to save the created Workflow.

    2. Create a new table with Hive Query

    1. Click [Editor] > [Hive] on the left menu of Hue.

    2. Enter the example Hive Query below.

      USE default;
      CREATE EXTERNAL TABLE sqoop_test2
      STORED AS TEXTFILE
      LOCATION '{Object Storage bucket URI}/sqoop_test2'
      AS SELECT * FROM sqoop_test1;
      
    3. Click the [Save] button on the upper right corner, set a name, and save the Hive Query you have created.

    4. Click [Scheduler] > [Workflow] on the left menu of Hue, click the [Document] button, and select the Workflow created above.

    5. Click [Edit] at the top right, then click [Tasks] > [Document].

    6. Drag the [Hive Query] button to the space below Sqoop 1, click Document Search..., select the Hive Query created above, and click the [Add] button.

    3. Transfer Hive data to MySQL with Sqoop

    1. Click [Document] > [Tasks].

    2. Drag the [Sqoop 1] button to the space below Hive.

    3. Enter the example Sqoop command below.

      sqoop export --connect jdbc:mysql://{Cloud DB for MySQL (VPC) Private domain}/{DB name} --username {DB account} --password "{DB account password}" --table SQOOP_TEST2 --export-dir "{Object Storage bucket URI}/sqoop_test2" -m 1 --input-fields-terminated-by "\001"
      
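    The --input-fields-terminated-by "\001" option matches Hive's default field delimiter for TEXTFILE tables, the Ctrl-A character (octal 001). Below is a minimal local sketch of what one row of such a data file looks like; the sample file path is hypothetical.

```shell
# Hive TEXTFILE tables separate fields with Ctrl-A (\001) by default,
# which is why the Sqoop export must be told the same delimiter.
# Write one sample row in that format, then read the second field back.
printf '1\001test_player\n' > /tmp/sqoop_test2_sample.txt
awk -F '\001' '{ print $2 }' /tmp/sqoop_test2_sample.txt    # prints test_player
```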
    4. Click the [Save] button on the upper right corner to save the created Workflow, and then click the [Start] button to execute the created Workflow.


    4. Check Hive and MySQL table

    1. Check if the table is saved properly through Hive.
    • For instructions on how to use Hive, see the guide titled Using Hive.
    SELECT * FROM SQOOP_TEST1;
    SELECT * FROM SQOOP_TEST2;
    


    2. Check if the table is saved properly through MySQL.
    SELECT * FROM SQOOP_TEST1;
    SELECT * FROM SQOOP_TEST2;
    
