Transferring CDB MySQL data

    Available in Classic

    Sqoop (SQL to Hadoop) is a tool that supports efficient data transfer between relational databases and Hadoop.
    You can import data stored in an RDBMS into HDFS for analysis, and export the analysis results back to the RDBMS. Imported data can also be delivered to HDFS, Hive, HBase, Accumulo, and other stores.

    This guide describes how to migrate data from NAVER Cloud Platform's Cloud DB for MySQL to Cloud Hadoop using Sqoop.
    Please refer to Using Sqoop for more information about Sqoop and its commands.

    Preparations

    1. Create Object Storage.
    2. Create a Cloud Hadoop cluster.
    3. Create a Cloud DB for MySQL instance.

    Data migration using Sqoop

    The Sqoop client is installed by default on the master node and data node servers (excluding the edge nodes), so you can proceed with the migration without installing it separately.

    Access the server by referring to Connecting to cluster nodes through SSH, and proceed with the following steps in order.
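    For example, connecting to a master node might look like the following. The key file path and node IP are placeholders, and the login account (sshuser here) depends on how your cluster was created.

    # Connect to a Cloud Hadoop master node over SSH (placeholder values)
    ssh -i ~/keys/cluster-key.pem sshuser@<master-node-IP>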

    1. Create a Sqoop account to access Cloud DB for MySQL

    Use the Manage DB user menu in the Cloud DB for MySQL console to enter the following settings and create a Cloud DB for MySQL account.
    The IP addresses of the Cloud Hadoop servers can be viewed in the Ambari web UI.


    DB settings information

    • User_ID: mig_acct
    • Host (IP): IP range of the Cloud Hadoop servers, e.g. 10.10.10.%
    • Password: Set arbitrarily


    2. Set ACG

    Set the ACG so that the Cloud Hadoop master node server 2 can access Cloud DB for MySQL.

    • Add the IP of the Cloud Hadoop master node server 2 to the access source in the Cloud DB ACG settings.
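    Once the account and ACG are configured, you can optionally verify connectivity from the master node before running Sqoop. The domain below matches the example JDBC URL used later in this guide; replace it with your own Cloud DB for MySQL private domain.

    # Check that the master node can reach Cloud DB for MySQL (example domain and port)
    mysql -h test-db.mysql.domain.com -P 3306 -u mig_acct -p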

    3. Create sampling table and data

    Create the following sample table and data in the Cloud DB for MySQL instance you created.

    Note

    MySQL access command

    mysql -u [Cluster Administrator Account Name] -p
    
    CREATE TABLE `mig_test_tb` (
    `col1` int(11) NOT NULL,
    `col2` char(2) NOT NULL,
    `col3` char(15) DEFAULT NULL,
    `col4` char(4) DEFAULT NULL,
    `col5` char(1) DEFAULT NULL,
    PRIMARY KEY (`col1`,`col2`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    insert into mig_test_tb values
    (1,'a','mig-test1','run','y'),(2,'a','mig-test1','init','n'),
    (3,'b','mig-test1','run','y'),(4,'b','mig-test1','run','y'),
    (5,'c','test-mysql','init','n'),(6,'c','test-mysql','run','n'),
    (7,'d','test-mysql','init','n'),(8,'d','test-mysql','run','y'),
    (9,'e','sqoop-test','run','n'),(10,'e','sqoop-test','run','y'),
    (11,'f','sqoop-test','run','n'),(12,'f','sqoop-test','init','y'),
    (13,'g','mysql-sqoop','run','n'),(14,'g','mysql-sqoop','init','y'),
    (15,'h','mysql-sqoop','run','y'),(16,'h','mysql-sqoop','init','n'),
    (17,'i','sqp-to-my','run','n'),(18,'i','sqp-to-my','init','n'),
    (19,'j','sqp-to-my','init','y'),(20,'j','sqp-to-my','init','y'),
    (21,'k','my-mig-sq','run','n'),(22,'k','my-mig-sq','run','y'),
    (23,'i','my-mig-sq','run','y'),(24,'i','my-mig-sq','run','n');
    
    Note

    If a syntax error occurs while creating the table, check the parts enclosed in quotation marks (backticks and single quotes).
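    Before starting the migration, you can optionally confirm that all 24 sample rows were loaded. The database name migdb below is the one used in the JDBC URL in the next step.

    # Count the sample rows (expected result: 24)
    mysql -u [Cluster Administrator Account Name] -p -e "SELECT COUNT(*) FROM migdb.mig_test_tb;"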

    4. Migrate data with the Sqoop command

    1. Use the import-all-tables option of the Sqoop CLI to migrate data to Hive.
      The Cloud DB for MySQL private domain and port are used for the JDBC connection information.

      sqoop import-all-tables --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb --username mig_acct -P --hive-import
      
      Note

      If an error occurs, check whether the table has a primary key.
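      If a table has no primary key, import-all-tables cannot determine how to split the work across mappers. One common workaround (a sketch, not part of the original example) is to import that table individually with a single mapper or an explicit --split-by column:

      # Import a single table that lacks a primary key by forcing one mapper (sketch)
      sqoop import --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb \
        --username mig_acct -P --table <table_without_pk> -m 1 --hive-import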

      The migration runs as a job registered with the resource manager.

      20/11/30 16:27:10 INFO mapreduce.Job: Job job_1606446027726_0007 completed successfully
      20/11/30 16:27:10 INFO mapreduce.Job: Counters: 30
      File System Counters
      FILE: Number of bytes read=0
      FILE: Number of bytes written=685116
      FILE: Number of read operations=0
      FILE: Number of large read operations=0
      FILE: Number of write operations=0
      HDFS: Number of bytes read=414
      HDFS: Number of bytes written=520
      HDFS: Number of read operations=16
      HDFS: Number of large read operations=0
      HDFS: Number of write operations=8
      Job Counters
      Launched map tasks=4
      Other local map tasks=4
      Total time spent by all maps in occupied slots (ms)=13269
      Total time spent by all reduces in occupied slots (ms)=0
      Total time spent by all map tasks (ms)=13269
      Total vcore-milliseconds taken by all map tasks=13269
      Total megabyte-milliseconds taken by all map tasks=9049458
      Map-Reduce Framework
      Map input records=24
      Map output records=24
      Input split bytes=414
      Spilled Records=0
      Failed Shuffles=0
      Merged Map outputs=0
      GC time elapsed (ms)=393
      CPU time spent (ms)=4780
      Physical memory (bytes) snapshot=813985792
      Virtual memory (bytes) snapshot=10053660672
      Total committed heap usage (bytes)=341311488
      File Input Format Counters
      Bytes Read=0
      File Output Format Counters
      Bytes Written=520
      20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Transferred 520 bytes in 15.7176 seconds (33.084 bytes/sec)
      20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Retrieved 24 records.
      20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
      20/11/30 16:27:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mig_test_tb` AS t LIMIT 1
      20/11/30 16:27:10 INFO hive.HiveImport: Loading uploaded data into Hive
      20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
      20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
      Logging initialized using configuration in jar:file:/home1/cdp/usr/hdp/2.6.5.0-292/hive/lib/hive-common-1.2.1000.2.6.5.0-292.jar!/hive-log4j.properties
      OK
      Time taken: 1.796 seconds
      Loading data to table default.mig_test_tb
      Table default.mig_test_tb stats: [numFiles=4, numRows=0, totalSize=520, rawDataSize=0]
      OK
      Time taken: 0.418 seconds
      
    2. Check that the migration completed successfully by confirming the OK messages.

      • You can view the tables and data migrated to Hive from the Hive View 2.0 or HUE UI; a quick command-line check is also sketched below. Please refer to Using Hive for more information about using Hive.
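      You can also run a quick check from the command line on the cluster. The table name default.mig_test_tb comes from the import log above; the expected count is 24.

      # Verify the migrated row count from the Hive CLI (expected: 24)
      hive -e "SELECT COUNT(*) FROM default.mig_test_tb;"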
