Transfer data from CDB MongoDB to Hive


Available in VPC

This guide introduces how to migrate data from Cloud DB for MongoDB to Hive using NAVER Cloud Platform Object Storage.

You can migrate data from MongoDB to Hive through the following two steps:

  1. Import data into NAVER Cloud Platform's Cloud DB for MongoDB.
  2. Export the data from Cloud DB for MongoDB to Object Storage, then load it into a Hive external table in Cloud Hadoop.


Preliminary tasks

  1. Create Object Storage.

  2. Create a Cloud Hadoop cluster.

  3. Create Cloud DB for MongoDB and application server.

  4. Check the created MongoDB's private domain, port, user name, and user password.

Import data to MongoDB

The following describes how to access the application server and then connect to the MongoDB server to import data.

  1. See Getting started with Cloud DB for MongoDB to access the application server and install MongoDB.

  2. Run the following commands in order to install the mongoimport tool and the MongoDB shell.

    # wget https://repo.mongodb.org/yum/redhat/6/mongodb-org/4.2/x86_64/RPMS/mongodb-org-tools-4.2.17-1.el6.x86_64.rpm
    # rpm -ivh mongodb-org-tools-4.2.17-1.el6.x86_64.rpm
    # wget https://repo.mongodb.org/yum/redhat/6/mongodb-org/4.2/x86_64/RPMS/mongodb-org-shell-4.2.17-1.el6.x86_64.rpm
    # rpm -ivh mongodb-org-shell-4.2.17-1.el6.x86_64.rpm
    
  3. Use the wget command to download the data to import.

    # wget http://www.barchartmarketdata.com/data-samples/mstf.csv
    
  4. Run the following command to upload the downloaded data to MongoDB.

    # mongoimport mstf.csv --type csv --headerline -d marketdata -c minibars -h <private domain>:<port> -u <username> -p <password> --authenticationDatabase admin
    
    Note

    You can also prepare data by creating DBs and collections directly in MongoDB.
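To make the --headerline option above concrete, here is a self-contained sketch with a made-up two-line CSV (sample.csv and its values are hypothetical; the actual import in step 4 uses mstf.csv):

```shell
# Hypothetical two-line CSV: with --headerline, mongoimport takes the first
# line as field names and turns each following line into one document.
cat > sample.csv <<'EOF'
Symbol,Timestamp,Day,Open,High,Low,Close,Volume
MSTF,2021-10-01 09:30,1,281.78,282.11,281.52,281.90,104300
EOF
# The data row above would map to a document along the lines of:
#   { Symbol: "MSTF", Timestamp: "2021-10-01 09:30", Day: 1, Open: 281.78, ... }
head -n 1 sample.csv   # the header row that supplies the field names
```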

Export data from MongoDB to Object Storage

The following describes how to export data uploaded to MongoDB to Object Storage.

  1. Run the following command to install AWS CLI.

    • Because NAVER Cloud Platform's Object Storage is compatible with AWS S3, you can use the AWS CLI without changes.
    # curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    # unzip awscliv2.zip
    # ./aws/install
    
    --Confirm installation
    # aws --version
    
  2. Run the following command to export data.

    • mongoexport outputs JSON by default, so specify --type=csv together with a field list to produce a CSV file.
    # mongoexport --host=<private domain>:<port> --collection=minibars --db=marketdata --type=csv --fields=_id,Symbol,Timestamp,Day,Open,High,Low,Close,Volume --out=marketdata.csv -u <username> -p <password> --authenticationDatabase admin
    
  3. Check the access key ID and secret key by referring to the Ncloud API guide.

  4. Set up aws configure using the retrieved authentication key information.

    # aws configure
    AWS Access Key ID [None]: Enter the access key ID.
    AWS Secret Access Key [None]: Enter the secret key.
    Default region name [None]:
    Default output format [None]:
    
  5. Run the following commands in order to check the bucket list, upload data, and then check if it was uploaded successfully.

    # aws --endpoint-url=http://kr.object.ncloudstorage.com s3 ls
    --Bucket list
    2021-10-16 18:49:28 cdbbucket
    2021-09-29 12:20:58 ex-bucket
    2021-10-05 15:24:46 example-5
    2021-10-06 10:59:15 example-6
    
    # aws --endpoint-url=http://kr.object.ncloudstorage.com s3 cp marketdata.csv s3://<Bucket name to upload data to>/
    --Upload result
    upload: ./marketdata.csv to s3://ex-bucket/marketdata.csv
    
    # aws --endpoint-url=http://kr.object.ncloudstorage.com s3 ls s3://ex-bucket/
    --Upload check result
    2021-10-19 11:05:12   16261296 marketdata.csv
    
    Note

    You can also check the result in the NAVER Cloud Platform console's VPC environment by navigating to Services > Storage > Object Storage > Bucket Management.

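The interactive aws configure prompt in step 4 can also be replaced by writing the AWS CLI credentials file directly; a minimal sketch (the two key values are placeholders to substitute):

```shell
# Non-interactive alternative to `aws configure`: the AWS CLI reads credentials
# from this INI file by default. Both values below are placeholders.
mkdir -p ~/.aws
cat > ~/.aws/credentials <<'EOF'
[default]
aws_access_key_id = <access key ID>
aws_secret_access_key = <secret key>
EOF
```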

Import data uploaded to Object Storage with Hive

The following describes how to load data uploaded to Object Storage using Hive External Table.

  1. Run the following command to create an External Table for Hive to import the data in Object Storage.

    • For location, enter the bucket location where the data is uploaded.
    CREATE EXTERNAL TABLE IF NOT EXISTS `marketdata` (
        id STRUCT<oid:STRING, bsontype:INT>,
        Symbol STRING,
        `Timestamp` STRING,
        Day INT,
        Open DOUBLE,
        High DOUBLE,
        Low DOUBLE,
        Close DOUBLE,
        Volume INT)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LOCATION 's3a://<Bucket name where the data was uploaded>/';
    
  2. Run the following query to check that the external table is correctly mapped to the data.

    SELECT * FROM marketdata LIMIT 10;
    
    Note

    Use the following commands when you need to edit a CSV file.

    • Delete quotation marks in the text fields
    find . -name "<file name>.csv" -exec perl -pi -e 's/"//g' {} \;
    
    • Delete the first row (column names)
    sed -i -e '1d' "<file name>.csv"