Available in VPC
SQL to Hadoop (Sqoop) is a tool that supports efficient data transfer between relational databases and Hadoop.
You can import data stored in an RDBMS into HDFS, Hive, HBase, or Accumulo for analysis; conversely, you can export analysis results back to the RDBMS.
This guide introduces how to migrate data from NAVER Cloud Platform's Cloud DB for MySQL to Cloud Hadoop using Sqoop.
For information on using Sqoop and its commands, see Sqoop user guides.
Preparations
- Create Object Storage.
- For more information on how to create Object Storage, see Object Storage user guides.
- Create a Cloud Hadoop cluster.
- For more information on how to create a Cloud Hadoop cluster, see Getting started with Cloud Hadoop.
- Create Cloud DB for MySQL.
- For more information on how to create Cloud DB for MySQL, see Getting started with Cloud DB for MySQL.
Data migration using Sqoop
The Sqoop client is installed by default on the master node and data node servers, but not on the edge node servers, so you can proceed with the migration without installing it separately.
After accessing the server as described in the Access a cluster node with SSH guide, follow these steps in order:
- 1. Create a Sqoop account to access Cloud DB for MySQL
- 2. Set ACG
- 3. Create a sampling table and data
- 4. Migrate data with the Sqoop commands
1. Create a Sqoop account to access Cloud DB for MySQL
Use the Manage DB user menu in the Cloud DB for MySQL console to create an account for Cloud DB for MySQL by entering the configuration information.
You can view the private IPs of the Cloud Hadoop servers in the Hosts menu of the Ambari UI.
Example: DB configuration information
- User_ID: mig_acct
- Host (IP): IP range of the Cloud Hadoop servers (e.g., 10.10.10.%)
- Password: any password

2. Set ACG
Set ACG so that the Cloud Hadoop Master Node 2 server can access Cloud DB for MySQL.
- Add the Cloud Hadoop Master Node 2 server IP to the access source in the Cloud DB ACG settings.
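After updating the ACG, you can confirm that port 3306 is actually reachable before running Sqoop. The sketch below is a minimal connectivity check run from the Master Node 2 server; the domain is the example value used in this guide, so replace it with your own Cloud DB for MySQL private domain.

```shell
DB_HOST="test-db.mysql.domain.com"   # placeholder: your Cloud DB for MySQL private domain
DB_PORT=3306

# Try to open a TCP connection to the DB port; /dev/tcp is a bash built-in path.
if timeout 3 bash -c "echo > /dev/tcp/${DB_HOST}/${DB_PORT}" 2>/dev/null; then
  RESULT="reachable"
else
  RESULT="not reachable"
fi
echo "${DB_HOST}:${DB_PORT} is ${RESULT}"
```

If the port is reported as not reachable, re-check the ACG access source entry before moving on.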

3. Create a sampling table and data
Enter the example sampling table and data into the Cloud DB for MySQL you created.
MySQL access command
mysql -h [Cloud DB for MySQL private domain] -u [Cluster administrator account name] -p
CREATE TABLE `mig_test_tb` (
`col1` int(11) NOT NULL,
`col2` char(2) NOT NULL,
`col3` char(15) DEFAULT NULL,
`col4` char(4) DEFAULT NULL,
`col5` char(1) DEFAULT NULL,
PRIMARY KEY (`col1`,`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into mig_test_tb values
(1,'a','mig-test1','run','y'),(2,'a','mig-test1','init','n'),
(3,'b','mig-test1','run','y'),(4,'b','mig-test1','run','y'),
(5,'c','test-mysql','init','n'),(6,'c','test-mysql','run','n'),
(7,'d','test-mysql','init','n'),(8,'d','test-mysql','run','y'),
(9,'e','sqoop-test','run','n'),(10,'e','sqoop-test','run','y'),
(11,'f','sqoop-test','run','n'),(12,'f','sqoop-test','init','y'),
(13,'g','mysql-sqoop','run','n'),(14,'g','mysql-sqoop','init','y'),
(15,'h','mysql-sqoop','run','y'),(16,'h','mysql-sqoop','init','n'),
(17,'i','sqp-to-my','run','n'),(18,'i','sqp-to-my','init','n'),
(19,'j','sqp-to-my','init','y'),(20,'j','sqp-to-my','init','y'),
(21,'k','my-mig-sq','run','n'),(22,'k','my-mig-sq','run','y'),
(23,'i','my-mig-sq','run','y'),(24,'i','my-mig-sq','run','n');
If you get a syntax error when creating the table, check the quote characters: identifiers use backticks (`) and values use single quotes ('), and copy-pasting can silently convert them.
4. Migrate data with the Sqoop commands
- Use the import-all-tables option in the Sqoop CLI to migrate data to Hive. For the JDBC information, use the private domain and port of Cloud DB for MySQL.
- Run the sudo su - [user account] command to switch to the user account.

sudo su - test
sqoop import-all-tables --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb --username mig_acct -P --hive-import

Note: If you get an error, check the existence of the primary key.
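The import command above can be parameterized so it is easier to adapt to your environment. This is a sketch using the example values from this guide (host, database, and account names are placeholders to substitute):

```shell
# Assemble the Sqoop import command from your own values.
# These are the example values used throughout this guide.
DB_HOST="test-db.mysql.domain.com"   # Cloud DB for MySQL private domain
DB_PORT=3306
DB_NAME="migdb"
DB_USER="mig_acct"

JDBC_URL="jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
SQOOP_CMD="sqoop import-all-tables --connect ${JDBC_URL} --username ${DB_USER} -P --hive-import"
echo "${SQOOP_CMD}"
```

The -P flag prompts interactively for the password, which avoids leaving it in the shell history.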
The migration task is executed as a job registered in the Resource Manager.
20/11/30 16:27:10 INFO mapreduce.Job: Job job_1606446027726_0007 completed successfully
20/11/30 16:27:10 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=685116
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=414
HDFS: Number of bytes written=520
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=13269
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=13269
Total vcore-milliseconds taken by all map tasks=13269
Total megabyte-milliseconds taken by all map tasks=9049458
Map-Reduce Framework
Map input records=24
Map output records=24
Input split bytes=414
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=393
CPU time spent (ms)=4780
Physical memory (bytes) snapshot=813985792
Virtual memory (bytes) snapshot=10053660672
Total committed heap usage (bytes)=341311488
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=520
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Transferred 520 bytes in 15.7176 seconds (33.084 bytes/sec)
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Retrieved 24 records.
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
20/11/30 16:27:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mig_test_tb` AS t LIMIT 1
20/11/30 16:27:10 INFO hive.HiveImport: Loading uploaded data into Hive
20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
Logging initialized using configuration in jar:file:/home1/cdp/usr/hdp/2.6.5.0-292/hive/lib/hive-common-1.2.1000.2.6.5.0-292.jar!/hive-log4j.properties
OK
Time taken: 1.796 seconds
Loading data to table default.mig_test_tb
Table default.mig_test_tb stats: [numFiles=4, numRows=0, totalSize=520, rawDataSize=0]
OK
Time taken: 0.418 seconds
- Check if the migration task completes normally with the OK message.
- You can view the tables and data migrated to Hive in Hive View 2.0 or the Hue UI. For more information on using Hive, see Hive user guides.
SELECT * FROM mig_test_tb;
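As an alternative to the web UIs, you can run the same verification from the cluster shell with beeline. The sketch below only assembles the command; the HiveServer2 URL is an assumption, so replace it with your cluster's actual address.

```shell
# Sketch: build the command-line verification query.
# The HiveServer2 URL below is an assumption -- use your cluster's address.
HIVE_URL="jdbc:hive2://localhost:10000/default"
QUERY="SELECT COUNT(*) FROM mig_test_tb"
BEELINE_CMD="beeline -u ${HIVE_URL} -e \"${QUERY};\""
echo "${BEELINE_CMD}"
```

With the sample data from this guide, the count should match the 24 records reported in the import log.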
