Transferring CDB MySQL data
Available in Classic
Sqoop (SQL-to-Hadoop) is a tool that supports efficient data transfer between relational databases and Hadoop.
You can import data stored in an RDBMS into HDFS for analysis, and export the analysis results back to the RDBMS. Data can also be sent to HDFS, Hive, HBase, Accumulo, etc.
This guide introduces the method for migrating the data of NAVER Cloud Platform's Cloud DB for MySQL to Cloud Hadoop using Sqoop.
Please refer to Using Sqoop for more information about using Sqoop and its commands.
Preparations
- Create Object Storage.
- Please refer to the Object Storage Guide for more information about creating Object Storage.
- Create a Cloud Hadoop cluster.
- Please refer to Getting started with Cloud Hadoop for more information about creating a Cloud Hadoop cluster.
- Create a Cloud DB for MySQL.
- Please refer to Getting started with Cloud DB for MySQL for more information about creating a Cloud DB for MySQL instance.
Data migration using Sqoop
The Sqoop client is installed by default on the master node and data node servers (excluding the edge nodes), so you can proceed with the migration without installing it separately.
Access the server by referring to Connecting to cluster nodes through SSH, and proceed with the following steps in order.
- 1. Create a Sqoop account to access Cloud DB for MySQL
- 2. Set ACG
- 3. Create sampling table and data
- 4. Migrate data with the Sqoop command
1. Create a Sqoop account to access Cloud DB for MySQL
In the Cloud DB for MySQL console, use the Manage DB user menu and enter the settings below to create a Cloud DB for MySQL account.
The IP addresses of Cloud Hadoop servers can be viewed from the Ambari web UI.
DB settings information
- User_ID: mig_acct
- Host (IP): IP range of the Cloud Hadoop server (e.g., 10.10.10.%)
- Password: set arbitrarily
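Once the account is created, you can check from a Cloud Hadoop node that it can reach the database. A minimal sketch that composes the connection command, assuming a hypothetical private domain (test-db.mysql.domain.com) — substitute your own values:

```shell
# Hypothetical values: replace with your Cloud DB for MySQL private
# domain and the mig_acct account created in the console.
DB_HOST="test-db.mysql.domain.com"
DB_USER="mig_acct"

# Connection command to run on a Cloud Hadoop node;
# -p prompts for the password interactively.
CONNECT_CMD="mysql -h ${DB_HOST} -u ${DB_USER} -p"
echo "${CONNECT_CMD}"
```

If the connection is refused, re-check the Host (IP) range set for the account and the ACG rule added in the next step.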
2. Set ACG
Set the ACG so that the Cloud Hadoop master node server 2 can access the Cloud DB for MySQL.
- Add the Cloud Hadoop master node server 2's IP to the access source in the Cloud-DB ACG settings.
3. Create sampling table and data
Enter the example sampling table and data into the Cloud DB for MySQL instance you created.
MySQL access command
mysql -u [Cluster Administrator Account Name] -p
CREATE TABLE `mig_test_tb` (
`col1` int(11) NOT NULL,
`col2` char(2) NOT NULL,
`col3` char(15) DEFAULT NULL,
`col4` char(4) DEFAULT NULL,
`col5` char(1) DEFAULT NULL,
PRIMARY KEY (`col1`,`col2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into mig_test_tb values
(1,'a','mig-test1','run','y'),(2,'a','mig-test1','init','n'),
(3,'b','mig-test1','run','y'),(4,'b','mig-test1','run','y'),
(5,'c','test-mysql','init','n'),(6,'c','test-mysql','run','n'),
(7,'d','test-mysql','init','n'),(8,'d','test-mysql','run','y'),
(9,'e','sqoop-test','run','n'),(10,'e','sqoop-test','run','y'),
(11,'f','sqoop-test','run','n'),(12,'f','sqoop-test','init','y'),
(13,'g','mysql-sqoop','run','n'),(14,'g','mysql-sqoop','init','y'),
(15,'h','mysql-sqoop','run','y'),(16,'h','mysql-sqoop','init','n'),
(17,'i','sqp-to-my','run','n'),(18,'i','sqp-to-my','init','n'),
(19,'j','sqp-to-my','init','y'),(20,'j','sqp-to-my','init','y'),
(21,'k','my-mig-sq','run','n'),(22,'k','my-mig-sq','run','y'),
(23,'i','my-mig-sq','run','y'),(24,'i','my-mig-sq','run','n');
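The INSERT statement above adds 24 rows, so a quick count can confirm the sample data loaded before migrating. A sketch that composes the check (the database name migdb matches the JDBC URL used in step 4 and is otherwise an assumption):

```shell
# Row-count check for the sample table; the INSERT above adds
# 24 rows, so COUNT(*) should return 24.
CHECK_SQL="SELECT COUNT(*) FROM mig_test_tb;"
echo "mysql -u mig_acct -p migdb -e \"${CHECK_SQL}\""
```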
If a syntax error occurs while creating the table, check the sections enclosed in quotation marks ('').
4. Migrate data with the Sqoop command
Use the import-all-tables option of the Sqoop CLI to migrate data to Hive. The Cloud DB for MySQL private domain and port are used for the JDBC connection information.
sqoop import-all-tables --connect jdbc:mysql://test-db.mysql.domain.com:3306/migdb --username mig_acct -P --hive-import
Note: If an error occurs, check whether a primary key exists on each table.
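As the note above suggests, import-all-tables expects each table to have a primary key so that rows can be split across mappers. For a table without one, a single-table import with a single mapper is a common workaround; a sketch composing such a command, assuming the same hypothetical domain and the migdb database:

```shell
# Hypothetical JDBC URL; replace host, port, and database with your own.
JDBC_URL="jdbc:mysql://test-db.mysql.domain.com:3306/migdb"

# Single-table import with one mapper (-m 1): no split column is
# needed, so tables without a primary key can still be imported.
SQOOP_CMD="sqoop import --connect ${JDBC_URL} --username mig_acct -P --table mig_test_tb --hive-import -m 1"
echo "${SQOOP_CMD}"
```

With -m 1 the import runs serially, so this trades parallelism for not needing a split column.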
The migration runs as a job registered with the resource manager.
20/11/30 16:27:10 INFO mapreduce.Job: Job job_1606446027726_0007 completed successfully
20/11/30 16:27:10 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=685116
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=414
		HDFS: Number of bytes written=520
		HDFS: Number of read operations=16
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=8
	Job Counters
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=13269
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=13269
		Total vcore-milliseconds taken by all map tasks=13269
		Total megabyte-milliseconds taken by all map tasks=9049458
	Map-Reduce Framework
		Map input records=24
		Map output records=24
		Input split bytes=414
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=393
		CPU time spent (ms)=4780
		Physical memory (bytes) snapshot=813985792
		Virtual memory (bytes) snapshot=10053660672
		Total committed heap usage (bytes)=341311488
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=520
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Transferred 520 bytes in 15.7176 seconds (33.084 bytes/sec)
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Retrieved 24 records.
20/11/30 16:27:10 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
20/11/30 16:27:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mig_test_tb` AS t LIMIT 1
20/11/30 16:27:10 INFO hive.HiveImport: Loading uploaded data into Hive
20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
20/11/30 16:27:10 WARN conf.HiveConf: HiveConf of name hive.server2.enable.doAs.property does not exist
Logging initialized using configuration in jar:file:/home1/cdp/usr/hdp/2.6.5.0-292/hive/lib/hive-common-1.2.1000.2.6.5.0-292.jar!/hive-log4j.properties
OK
Time taken: 1.796 seconds
Loading data to table default.mig_test_tb
Table default.mig_test_tb stats: [numFiles=4, numRows=0, totalSize=520, rawDataSize=0]
OK
Time taken: 0.418 seconds
Check that the migration has completed successfully with the OK message.
- You can view the tables and data migrated to Hive from Hive Viewer 2.0 or the HUE UI. Please refer to Using Hive for more information about using Hive.
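Beyond the UI, the import can also be spot-checked from the Hive CLI on the cluster. A sketch that composes a count query against the imported table (the default database matches the log output above):

```shell
# Row-count query against the table Sqoop created in Hive; the log
# above shows 24 records were retrieved from MySQL.
HIVE_QL="SELECT COUNT(*) FROM default.mig_test_tb;"
echo "hive -e \"${HIVE_QL}\""
```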