Available in VPC
SQL to Hadoop (Sqoop) is a tool that supports efficient data transfer between relational databases and Hadoop.
You can import data stored in an RDBMS into HDFS, Hive, HBase, or Accumulo for analysis; conversely, you can export analysis results back to the RDBMS.
This guide introduces how to migrate data from NAVER Cloud Platform's Cloud DB for MySQL to Cloud Hadoop using Sqoop.
For information on using Sqoop and its commands, see Sqoop user guides.
Preparations
- Create Object Storage.
- For more information on how to create Object Storage, see Object Storage user guides.
- Create a Cloud Hadoop cluster.
- For more information on how to create a Cloud Hadoop cluster, see Getting started with Cloud Hadoop.
- Create Cloud DB for MySQL.
- For more information on how to create Cloud DB for MySQL, see Getting started with Cloud DB for MySQL.
Data migration using Sqoop
The Sqoop client is installed by default on the master node and data node servers, but not on the edge node servers, so you can proceed with the migration without installing it separately.
After accessing the server as described in the Access a cluster node with SSH guide, follow these steps in order:
- 1. Create a Sqoop account to access Cloud DB for MySQL
- 2. Set ACG
- 3. Create a sampling table and data
- 4. Migrate data with the Sqoop commands
1. Create a Sqoop account to access Cloud DB for MySQL
Use the Manage DB user menu in the Cloud DB for MySQL console to create an account for Cloud DB for MySQL by entering the configuration information.
You can view the private IPs of the Cloud Hadoop servers in the Hosts menu of the Ambari UI.
Example: DB configuration information
- User_ID: mig_acct
- Host (IP): IP range of the Cloud Hadoop servers (e.g., 10.10.10.%)
- Password: any password

2. Set ACG
Set ACG so that the Cloud Hadoop Master Node 2 server can access Cloud DB for MySQL.
- Add the Cloud Hadoop Master Node 2 server IP to the access source in the Cloud DB ACG settings.
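After updating the ACG, you can confirm that port 3306 is actually reachable before running Sqoop. The sketch below is a minimal connectivity check run from the Master Node 2 server; the domain is the example value used in this guide, so replace it with your own Cloud DB for MySQL private domain.

```shell
DB_HOST="test-db.mysql.domain.com"   # placeholder: your Cloud DB for MySQL private domain
DB_PORT=3306

# Try to open a TCP connection to the DB port; /dev/tcp is a bash built-in path.
if timeout 3 bash -c "echo > /dev/tcp/${DB_HOST}/${DB_PORT}" 2>/dev/null; then
  RESULT="reachable"
else
  RESULT="not reachable"
fi
echo "${DB_HOST}:${DB_PORT} is ${RESULT}"
```

If the port is reported as not reachable, re-check the ACG access source entry before moving on.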

3. Create a sampling table and data
Enter the example sampling table and data into the Cloud DB for MySQL you created.
MySQL access command
mysql -h [Cloud DB for MySQL private domain] -u [Cluster administrator account name] -p
CREATE TABLE `mig_test_tb` (
`col1` int(11) NOT NULL,
`col2` char(2) NOT NULL,
`col3` char(15) DEFAULT NULL,
`col4` char(4) DEFAULT NULL,
`col5` char(1) DEFAULT NULL,
PRIMARY KEY (`col1`,`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into mig_test_tb values
(1,'a','mig-test1','run','y'),(2,'a','mig-test1','init','n'),
(3,'b','mig-test1','run','y'),(4,'b','mig-test1','run','y'),
(5,'c','test-mysql','init','n'),(6,'c','test-mysql','run','n'),
(7,'d','test-mysql','init','n'),(8,'d','test-mysql','run','y'),
(9,'e','sqoop-test','run','n'),(10,'e','sqoop-test','run','y'),
(11,'f','sqoop-test','run','n'),(12,'f','sqoop-test','init','y'),
(13,'g','mysql-sqoop','run','n'),(14,'g','mysql-sqoop','init','y'),
(15,'h','mysql-sqoop','run','y'),(16,'h','mysql-sqoop','init','n'),
(17,'i','sqp-to-my','run','n'),(18,'i','sqp-to-my','init','n'),
(19,'j','sqp-to-my','init','y'),(20,'j','sqp-to-my','init','y'),
(21,'k','my-mig-sq','run','n'),(22,'k','my-mig-sq','run','y'),
(23,'i','my-mig-sq','run','y'),(24,'i','my-mig-sq','run','n');
If you get a syntax error when creating the table, check the quote characters: identifiers use backticks (`) and values use single quotes ('), and copy-pasting can silently convert them.
4. Migrate data with the Sqoop commands
- Use the import-all-tables option in the Sqoop CLI to migrate data to Hive. For the JDBC information, use the private domain and port of Cloud DB for MySQL.
- Run the sudo su - [user account] command to switch to the user account.

sudo su - test
sqoop import-all-tables --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb --username mig_acct -P --hive-import

Note: If you get an error, check the existence of the primary key.
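The import command above can be parameterized so it is easier to adapt to your environment. This is a sketch using the example values from this guide (host, database, and account names are placeholders to substitute):

```shell
# Assemble the Sqoop import command from your own values.
# These are the example values used throughout this guide.
DB_HOST="test-db.mysql.domain.com"   # Cloud DB for MySQL private domain
DB_PORT=3306
DB_NAME="migdb"
DB_USER="mig_acct"

JDBC_URL="jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
SQOOP_CMD="sqoop import-all-tables --connect ${JDBC_URL} --username ${DB_USER} -P --hive-import"
echo "${SQOOP_CMD}"
```

The -P flag prompts interactively for the password, which avoids leaving it in the shell history.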
The migration task is executed as a job registered in the Resource Manager.
20/11/30 16:27:10 INFO mapreduce.Job: Job job_1606446027726_0007 completed successfully
20/11/30 16:27:10 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=685116
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=414
HDFS: Number of bytes written=520
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=13269
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=13269
Total vcore-milliseconds taken by all map tasks=13269
Total megabyte-milliseconds taken by all map tasks=9049458
Map-Reduce Framework
Map input records=24
Map output records=24
Input split bytes=414
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=393
CPU time spent (ms)=4780
Physical memory (bytes) snapshot=813985792
Virtual memory (bytes) snapshot=10053660672
Total committed heap usage (bytes)=341311488
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=520
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Transferred 520 bytes in 15.7176 seconds (33.084 bytes/sec)
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Retrieved 24 records.
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
20/11/30 16:27:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mig_test_tb` AS t LIMIT 1
20/11/30 16:27:10 INFO hive.HiveImport: Loading uploaded data into Hive
20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
Logging initialized using configuration in jar:file:/home1/cdp/usr/hdp/2.6.5.0-292/hive/lib/hive-common-1.2.1000.2.6.5.0-292.jar!/hive-log4j.properties
OK
Time taken: 1.796 seconds
Loading data to table default.mig_test_tb
Table default.mig_test_tb stats: [numFiles=4, numRows=0, totalSize=520, rawDataSize=0]
OK
Time taken: 0.418 seconds
- Check if the migration task completes normally with the OK message.
- You can view the tables and data migrated to Hive in Hive View 2.0 or the Hue UI. For more information on using Hive, see Hive user guides.
SELECT * FROM mig_test_tb;
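As an alternative to the web UIs, you can run the same verification from the cluster shell with beeline. The sketch below only assembles the command; the HiveServer2 URL is an assumption, so replace it with your cluster's actual address.

```shell
# Sketch: build the command-line verification query.
# The HiveServer2 URL below is an assumption -- use your cluster's address.
HIVE_URL="jdbc:hive2://localhost:10000/default"
QUERY="SELECT COUNT(*) FROM mig_test_tb"
BEELINE_CMD="beeline -u ${HIVE_URL} -e \"${QUERY};\""
echo "${BEELINE_CMD}"
```

With the sample data from this guide, the count should match the 24 records reported in the import log.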
