Transferring CDB MySQL data
Available in VPC
Sqoop (SQL-to-Hadoop) is a tool that supports efficient data transfer between relational databases and Hadoop.
You can import data stored in an RDBMS into HDFS for analysis, and export the analysis results back to the RDBMS. Sqoop can also load data into Hive, HBase, Accumulo, and other Hadoop ecosystem stores.
This guide describes how to migrate data from NAVER Cloud Platform's Cloud DB for MySQL to Cloud Hadoop using Sqoop.
Please refer to Using Sqoop for more information about using Sqoop and its commands.
Preparations
- Create Object Storage.
  - Please refer to the Object Storage Guide for more information about creating Object Storage.
- Create a Cloud Hadoop cluster.
  - Please refer to Getting started with Cloud Hadoop for more information about creating Cloud Hadoop.
- Create a Cloud DB for MySQL.
  - Please refer to Getting started with Cloud DB for MySQL for more information about creating Cloud DB for MySQL.
Data migration using Sqoop
The Sqoop client is installed by default on the master node and data node servers (but not on the edge nodes), so you can proceed with the migration without installing it separately.
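You can confirm that the client is available on the node you are connected to by checking its version (a quick sanity check; the exact version string will vary by cluster image):

```
# Verify that the Sqoop client is installed on this node
sqoop version
```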
Access the server by referring to Connecting to cluster nodes through SSH, and proceed with the following steps in order.
1. Create a Sqoop account to access Cloud DB for MySQL
2. Set the ACG
3. Create a sampling table and data
4. Migrate data with the Sqoop command
1. Create a Sqoop account to access Cloud DB for MySQL
In the Cloud DB for MySQL console, use the Manage DB user menu to enter the settings and create a Cloud DB for MySQL account.
You can check the IP addresses of the Cloud Hadoop servers in the Ambari web UI.
<Example>
DB settings information
- User_ID: mig_acct
- Host (IP): IP range of the Cloud Hadoop servers (e.g., 10.10.10.%)
- Password: set arbitrarily
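For reference, the account the console creates behaves like a MySQL user scoped to that host range. A minimal sketch of the equivalent statements is shown below (the console performs this for you; the migdb database name comes from the JDBC URL used later, and the host range and password are placeholders):

```
# Illustrative only: the console creates this account on your behalf.
# The host range 10.10.10.% and the password are placeholders.
mysql -u [Cluster Administrator Account Name] -p -e "
CREATE USER 'mig_acct'@'10.10.10.%' IDENTIFIED BY 'your_password';
GRANT ALL PRIVILEGES ON migdb.* TO 'mig_acct'@'10.10.10.%';
FLUSH PRIVILEGES;"
```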
2. Set ACG
Set the ACG so that the Cloud Hadoop master node server 2 can access Cloud DB for MySQL.
- Add the IP of the Cloud Hadoop master node server 2 as an access source in the Cloud-DB ACG settings.
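Once the rule is in place, you can verify connectivity from the master node before running Sqoop (the domain below matches the example JDBC URL used later; substitute your DB's private domain):

```
# Check that the MySQL port is reachable from the master node
nc -zv test-db.mysql.domain.com 3306

# Or try logging in with the migration account
mysql -h test-db.mysql.domain.com -P 3306 -u mig_acct -p -e "SELECT 1;"
```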
3. Create sampling table and data
Create the following example table and sample data in the Cloud DB for MySQL you created.
MySQL access command
```
mysql -u [Cluster Administrator Account Name] -p
```
```
CREATE TABLE `mig_test_tb` (
  `col1` int(11) NOT NULL,
  `col2` char(2) NOT NULL,
  `col3` char(15) DEFAULT NULL,
  `col4` char(4) DEFAULT NULL,
  `col5` char(1) DEFAULT NULL,
  PRIMARY KEY (`col1`,`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO mig_test_tb VALUES
(1,'a','mig-test1','run','y'),(2,'a','mig-test1','init','n'),
(3,'b','mig-test1','run','y'),(4,'b','mig-test1','run','y'),
(5,'c','test-mysql','init','n'),(6,'c','test-mysql','run','n'),
(7,'d','test-mysql','init','n'),(8,'d','test-mysql','run','y'),
(9,'e','sqoop-test','run','n'),(10,'e','sqoop-test','run','y'),
(11,'f','sqoop-test','run','n'),(12,'f','sqoop-test','init','y'),
(13,'g','mysql-sqoop','run','n'),(14,'g','mysql-sqoop','init','y'),
(15,'h','mysql-sqoop','run','y'),(16,'h','mysql-sqoop','init','n'),
(17,'i','sqp-to-my','run','n'),(18,'i','sqp-to-my','init','n'),
(19,'j','sqp-to-my','init','y'),(20,'j','sqp-to-my','init','y'),
(21,'k','my-mig-sq','run','n'),(22,'k','my-mig-sq','run','y'),
(23,'i','my-mig-sq','run','y'),(24,'i','my-mig-sq','run','n');
```
If a syntax error occurs while creating the table, check the quotation marks and backticks: copying from a web page can replace them with typographically similar characters.
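You can quickly confirm that the sample data loaded as expected (24 rows), assuming the table was created in the migdb database referenced in the JDBC URL below:

```
# Should return 24
mysql -u [Cluster Administrator Account Name] -p migdb -e "SELECT COUNT(*) FROM mig_test_tb;"
```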
4. Migrate data with the Sqoop command
Use the
import-all-tables
option of Sqoop CLI to migrate data to Hive.
Cloud DB for MySQL's private domain and port is used for thejdbc
information.sqoop import-all-tables --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb --username mig_acct -P --hive-import
If an error occurs, check that every table has a primary key; import-all-tables needs one to split the import across mappers.
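If a table lacks a primary key, one workaround (a sketch based on standard Sqoop options, reusing the example connection details above) is to import that table individually with an explicit split column or a single mapper:

```
# Split on an explicit column instead of the primary key ...
sqoop import --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb \
  --username mig_acct -P --table mig_test_tb --split-by col1 --hive-import

# ... or fall back to a single mapper, which needs no split column
sqoop import --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb \
  --username mig_acct -P --table mig_test_tb -m 1 --hive-import
```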
The migration runs as a MapReduce job registered with the ResourceManager.
```
20/11/30 16:27:10 INFO mapreduce.Job: Job job_1606446027726_0007 completed successfully
20/11/30 16:27:10 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=685116
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=414
HDFS: Number of bytes written=520
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=13269
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=13269
Total vcore-milliseconds taken by all map tasks=13269
Total megabyte-milliseconds taken by all map tasks=9049458
Map-Reduce Framework
Map input records=24
Map output records=24
Input split bytes=414
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=393
CPU time spent (ms)=4780
Physical memory (bytes) snapshot=813985792
Virtual memory (bytes) snapshot=10053660672
Total committed heap usage (bytes)=341311488
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=520
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Transferred 520 bytes in 15.7176 seconds (33.084 bytes/sec)
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Retrieved 24 records.
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
20/11/30 16:27:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mig_test_tb` AS t LIMIT 1
20/11/30 16:27:10 INFO hive.HiveImport: Loading uploaded data into Hive
20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
Logging initialized using configuration in jar:file:/home1/cdp/usr/hdp/2.6.5.0-292/hive/lib/hive-common-1.2.1000.2.6.5.0-292.jar!/hive-log4j.properties
OK
Time taken: 1.796 seconds
Loading data to table default.mig_test_tb
Table default.mig_test_tb stats: [numFiles=4, numRows=0, totalSize=520, rawDataSize=0]
OK
Time taken: 0.418 seconds
```
- Check that the migration completed successfully by confirming the OK message.
- You can view the tables and data migrated to Hive from Hive View 2.0 or the HUE UI. For more information about using Hive, refer to Using Hive.
```
SELECT * FROM mig_test_tb;
```
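You can also run the query from the command line through Beeline (the HiveServer2 URL below is a placeholder; use your cluster's address):

```
# Query the imported table through HiveServer2 (placeholder host)
beeline -u jdbc:hive2://localhost:10000/default -e "SELECT * FROM mig_test_tb LIMIT 5;"
```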