Transferring CDB MySQL data

    Available in Classic

    Sqoop (SQL to Hadoop) is a tool that supports efficient data transfer between relational databases and Hadoop.
    You can import data stored in an RDBMS into HDFS for analysis, and export the analysis results back to the RDBMS. Imported data can also be delivered to HDFS, Hive, HBase, Accumulo, and other stores.

    This guide describes how to migrate data from NAVER Cloud Platform's Cloud DB for MySQL to Cloud Hadoop using Sqoop.
    Please refer to Using Sqoop for more information about Sqoop and its commands.

    Preparations

    1. Create Object Storage.
    2. Create a Cloud Hadoop cluster.
    3. Create a Cloud DB for MySQL instance.

    Data migration using Sqoop

    The Sqoop client is installed by default on the master node and data node servers (excluding the edge nodes), so you can proceed with the migration without installing it separately.

    Access the server by referring to Connecting to cluster nodes through SSH, and proceed with the following steps in order.
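    For example, connecting to a master node might look like the following. The key file path and node IP are placeholders, and the login account (sshuser here) depends on how your cluster was created.

    # Connect to a Cloud Hadoop master node over SSH (placeholder values)
    ssh -i ~/keys/cluster-key.pem sshuser@<master-node-IP>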

    1. Create a Sqoop account to access Cloud DB for MySQL

    Use the Manage DB user menu in the Cloud DB for MySQL console to enter the following settings and create a Cloud DB for MySQL account.
    The IP addresses of the Cloud Hadoop servers can be viewed in the Ambari web UI.


    DB settings information

    • User_ID: mig_acct
    • Host (IP): IP range of the Cloud Hadoop servers, e.g. 10.10.10.%
    • Password: Set arbitrarily


    2. Set ACG

    Set the ACG so that the Cloud Hadoop master node server 2 can access Cloud DB for MySQL.

    • Add the IP of the Cloud Hadoop master node server 2 to the access source in the Cloud DB ACG settings.
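    Once the account and ACG are configured, you can optionally verify connectivity from the master node before running Sqoop. The domain below matches the example JDBC URL used later in this guide; replace it with your own Cloud DB for MySQL private domain.

    # Check that the master node can reach Cloud DB for MySQL (example domain and port)
    mysql -h test-db.mysql.domain.com -P 3306 -u mig_acct -p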

    3. Create sampling table and data

    Create the following sample table and data in the Cloud DB for MySQL instance you created.

    Note

    MySQL access command

    mysql -u [Cluster Administrator Account Name] -p
    
    CREATE TABLE `mig_test_tb` (
    `col1` int(11) NOT NULL,
    `col2` char(2) NOT NULL,
    `col3` char(15) DEFAULT NULL,
    `col4` char(4) DEFAULT NULL,
    `col5` char(1) DEFAULT NULL,
    PRIMARY KEY (`col1`,`col2`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    insert into mig_test_tb values
    (1,'a','mig-test1','run','y'),(2,'a','mig-test1','init','n'),
    (3,'b','mig-test1','run','y'),(4,'b','mig-test1','run','y'),
    (5,'c','test-mysql','init','n'),(6,'c','test-mysql','run','n'),
    (7,'d','test-mysql','init','n'),(8,'d','test-mysql','run','y'),
    (9,'e','sqoop-test','run','n'),(10,'e','sqoop-test','run','y'),
    (11,'f','sqoop-test','run','n'),(12,'f','sqoop-test','init','y'),
    (13,'g','mysql-sqoop','run','n'),(14,'g','mysql-sqoop','init','y'),
    (15,'h','mysql-sqoop','run','y'),(16,'h','mysql-sqoop','init','n'),
    (17,'i','sqp-to-my','run','n'),(18,'i','sqp-to-my','init','n'),
    (19,'j','sqp-to-my','init','y'),(20,'j','sqp-to-my','init','y'),
    (21,'k','my-mig-sq','run','n'),(22,'k','my-mig-sq','run','y'),
    (23,'i','my-mig-sq','run','y'),(24,'i','my-mig-sq','run','n');
    
    Note

    If a syntax error occurs while creating the table, check the parts enclosed in quotation marks (backticks and single quotes).
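    Before starting the migration, you can optionally confirm that all 24 sample rows were loaded. The database name migdb below is the one used in the JDBC URL in the next step.

    # Count the sample rows (expected result: 24)
    mysql -u [Cluster Administrator Account Name] -p -e "SELECT COUNT(*) FROM migdb.mig_test_tb;"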

    4. Migrate data with the Sqoop command

    1. Use the import-all-tables option of the Sqoop CLI to migrate data to Hive.
      The Cloud DB for MySQL private domain and port are used for the JDBC connection information.

      sqoop import-all-tables --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb --username mig_acct -P --hive-import
      
      Note

      If an error occurs, check whether the table has a primary key.
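      If a table has no primary key, import-all-tables cannot determine how to split the work across mappers. One common workaround (a sketch, not part of the original example) is to import that table individually with a single mapper or an explicit --split-by column:

      # Import a single table that lacks a primary key by forcing one mapper (sketch)
      sqoop import --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb \
        --username mig_acct -P --table <table_without_pk> -m 1 --hive-import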

      The migration runs as a job registered with the resource manager.

      20/11/30 16:27:10 INFO mapreduce.Job: Job job_1606446027726_0007 completed successfully
      20/11/30 16:27:10 INFO mapreduce.Job: Counters: 30
      File System Counters
      FILE: Number of bytes read=0
      FILE: Number of bytes written=685116
      FILE: Number of read operations=0
      FILE: Number of large read operations=0
      FILE: Number of write operations=0
      HDFS: Number of bytes read=414
      HDFS: Number of bytes written=520
      HDFS: Number of read operations=16
      HDFS: Number of large read operations=0
      HDFS: Number of write operations=8
      Job Counters
      Launched map tasks=4
      Other local map tasks=4
      Total time spent by all maps in occupied slots (ms)=13269
      Total time spent by all reduces in occupied slots (ms)=0
      Total time spent by all map tasks (ms)=13269
      Total vcore-milliseconds taken by all map tasks=13269
      Total megabyte-milliseconds taken by all map tasks=9049458
      Map-Reduce Framework
      Map input records=24
      Map output records=24
      Input split bytes=414
      Spilled Records=0
      Failed Shuffles=0
      Merged Map outputs=0
      GC time elapsed (ms)=393
      CPU time spent (ms)=4780
      Physical memory (bytes) snapshot=813985792
      Virtual memory (bytes) snapshot=10053660672
      Total committed heap usage (bytes)=341311488
      File Input Format Counters
      Bytes Read=0
      File Output Format Counters
      Bytes Written=520
      20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Transferred 520 bytes in 15.7176 seconds (33.084 bytes/sec)
      20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Retrieved 24 records.
      20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
      20/11/30 16:27:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mig_test_tb` AS t LIMIT 1
      20/11/30 16:27:10 INFO hive.HiveImport: Loading uploaded data into Hive
      20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
      20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
      Logging initialized using configuration in jar:file:/home1/cdp/usr/hdp/2.6.5.0-292/hive/lib/hive-common-1.2.1000.2.6.5.0-292.jar!/hive-log4j.properties
      OK
      Time taken: 1.796 seconds
      Loading data to table default.mig_test_tb
      Table default.mig_test_tb stats: [numFiles=4, numRows=0, totalSize=520, rawDataSize=0]
      OK
      Time taken: 0.418 seconds
      
    2. Check that the migration completed successfully by confirming the OK messages.

      • You can view the tables and data migrated to Hive from the Hive View 2.0 or HUE UI; a quick command-line check is also sketched below. Please refer to Using Hive for more information about using Hive.
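      You can also run a quick check from the command line on the cluster. The table name default.mig_test_tb comes from the import log above; the expected count is 24.

      # Verify the migrated row count from the Hive CLI (expected: 24)
      hive -e "SELECT COUNT(*) FROM default.mig_test_tb;"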
