Transferring data from CDB MongoDB to Hive
Available in VPC
This guide introduces how to migrate data from Cloud DB for MongoDB to Hive using NAVER Cloud Platform Object Storage.
You can migrate data from MongoDB to Hive in the following two stages.
- Import data into NAVER Cloud Platform's CDB MongoDB
- Export the data from CDB MongoDB to Object Storage, then load it into a Cloud Hadoop Hive external table
Preparations
Create an Object Storage bucket.
- For more information on creating Object Storage, see the Object Storage guide.
Create a Cloud Hadoop cluster.
- For more information on creating a Cloud Hadoop cluster, see Getting Started with Cloud Hadoop.
Create a Cloud DB for MongoDB instance and an application server.
- For more information on creating MongoDB and application servers, see the Getting started with Cloud DB for MongoDB guide.
Check the created MongoDB's private domain, port, user name, and user password.
- For more information, see the Use Cloud DB for MongoDB guide.
Import data to MongoDB
The following describes how to access the application server and then connect to the MongoDB server to import data.
See Getting started with Cloud DB for MongoDB to access the application server and install MongoDB.
Run the following commands in order to download and install the packages that provide mongoimport and the MongoDB shell.
# wget https://repo.mongodb.org/yum/redhat/6/mongodb-org/4.2/x86_64/RPMS/mongodb-org-tools-4.2.17-1.el6.x86_64.rpm
# rpm -ivh mongodb-org-tools-4.2.17-1.el6.x86_64.rpm
# wget https://repo.mongodb.org/yum/redhat/6/mongodb-org/4.2/x86_64/RPMS/mongodb-org-shell-4.2.17-1.el6.x86_64.rpm
# rpm -ivh mongodb-org-shell-4.2.17-1.el6.x86_64.rpm
Use the wget command to download the data to import.
# wget http://www.barchartmarketdata.com/data-samples/mstf.csv
Run the following command to upload the downloaded data to MongoDB.
# mongoimport mstf.csv --type csv --headerline -d marketdata -c minibars -h <private domain>:<port> -u <username> -p <password> --authenticationDatabase admin
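Because --headerline turns the first row of the file into field names, it can help to look at that row and count the remaining rows before importing. The following is a minimal sketch of such a check; the function name csv_summary is hypothetical and not part of any MongoDB tool, and it assumes the file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: summarize a CSV file before passing it to mongoimport.
# Assumes the file ends with a trailing newline.
csv_summary() {
    file="$1"
    # With --headerline, the first line supplies the field names.
    header=$(head -n 1 "$file")
    # Every remaining line becomes one document in the collection.
    rows=$(tail -n +2 "$file" | wc -l | tr -d ' ')
    printf 'fields: %s\n' "$header"
    printf 'documents: %s\n' "$rows"
}
```

For example, running csv_summary mstf.csv after the download prints the column names and the number of documents to expect in the minibars collection.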
Note: You can also prepare data by creating DBs and collections directly in MongoDB.
Export data from MongoDB to Object Storage
The following describes how to export data uploaded to MongoDB to Object Storage.
Run the following command to install AWS CLI.
- Because NAVER Cloud Platform Object Storage is compatible with AWS S3, you can use the AWS CLI as is.
# curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
# unzip awscliv2.zip
# ./aws/install
--Confirm installation
# aws --version
Run the following command to export data.
# mongoexport --host=<private domain>:<port> --collection=minibars --db=marketdata --out=marketdata.csv -u <username> -p <password> --authenticationDatabase admin
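Note that mongoexport writes JSON unless --type=csv is given, and CSV output requires an explicit --fields list. The following sketch prints a CSV variant of the export command. The field list is an assumption that mirrors the column order of the Hive table used later in this guide, so adjust it to the field names in your collection. The function name is hypothetical, and the password option is left out of the printed command.

```shell
#!/bin/sh
# Assumed field list (mirrors the Hive table columns); adjust as needed.
FIELDS="_id,Symbol,Timestamp,Day,Open,High,Low,Close,Volume"

# Hypothetical helper: print a mongoexport command that produces CSV.
# Usage: mongoexport_csv_cmd <private domain>:<port> <username>
mongoexport_csv_cmd() {
    host="$1"
    user="$2"
    # --type=csv requires --fields; --noHeaderLine omits the column-name row.
    # Append -p <password> as in the original command before running it.
    printf 'mongoexport --host=%s --db=marketdata --collection=minibars ' "$host"
    printf -- '--type=csv --fields=%s --noHeaderLine ' "$FIELDS"
    printf -- '--out=marketdata.csv -u %s --authenticationDatabase admin\n' "$user"
}
```

With --noHeaderLine the exported file contains data rows only, so no column-name row has to be removed before Hive reads it.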
Log in to the NAVER Cloud Platform portal, and then click My Page > Manage authentication key to check the access key ID and secret key.
Set up aws configure using the retrieved authentication key information.
# aws configure
AWS Access Key ID [None]: enter the access key ID
AWS Secret Access Key [None]: enter the secret key
Default region name [None]:
Default output format [None]:
Run the following commands in order to check the bucket list, upload data, and then check if it was uploaded successfully.
# aws --endpoint-url=http://kr.object.ncloudstorage.com s3 ls
--Bucket list
2021-10-16 18:49:28 cdbbucket
2021-09-29 12:20:58 ex-bucket
2021-10-05 15:24:46 example-5
2021-10-06 10:59:15 example-6
# aws --endpoint-url=http://kr.object.ncloudstorage.com s3 cp marketdata.csv s3://<Bucket name to upload data to>/
--Upload result
upload: ./marketdata.csv to s3://ex-bucket/marketdata.csv
# aws --endpoint-url=http://kr.object.ncloudstorage.com s3 ls s3://ex-bucket/
--Upload check result
2021-10-19 11:05:12 16261296 marketdata.csv
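Every command above repeats the same --endpoint-url option. As a convenience, the option can be wrapped in a small shell function. The following is a sketch only; ncp_s3 and ncp_upload are hypothetical names, and the endpoint value is the one used in the commands above.

```shell
#!/bin/sh
# Object Storage endpoint, as used in the commands above.
NCP_S3_ENDPOINT="http://kr.object.ncloudstorage.com"

# Hypothetical wrapper: run any "aws s3" subcommand against Object Storage.
ncp_s3() {
    aws --endpoint-url="$NCP_S3_ENDPOINT" s3 "$@"
}

# Hypothetical helper: upload a file to a bucket, then list the uploaded
# object to confirm that it arrived. The listing runs only if the copy succeeds.
ncp_upload() {
    file="$1"
    bucket="$2"
    ncp_s3 cp "$file" "s3://$bucket/" &&
        ncp_s3 ls "s3://$bucket/$(basename "$file")"
}
```

With these defined, ncp_s3 ls lists the buckets and ncp_upload marketdata.csv ex-bucket performs the upload and the check in one step.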
Note: You can also check the result in the NAVER Cloud Platform console's VPC environment by clicking the Services > Storage > Object Storage > Bucket Management menus in order.
Import data uploaded to Object Storage with Hive
The following describes how to load data uploaded to Object Storage using Hive External Table.
Run the following command to create an External Table for Hive to import the data in Object Storage.
- For location, enter the bucket location where the data is uploaded.
CREATE EXTERNAL TABLE IF NOT EXISTS `marketdata` (
  id STRUCT<oid:STRING, bsontype:INT>,
  Symbol STRING,
  `Timestamp` STRING,
  Day INT,
  Open DOUBLE,
  High DOUBLE,
  Low DOUBLE,
  Close DOUBLE,
  Volume INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://<Bucket name where the data was uploaded>/';
Run the following command to check whether the external table and data have been connected.
SELECT * FROM marketdata LIMIT 10;
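The table definition can also be generated from the bucket name and saved to a file, so that it can be rerun for different buckets. The following is a minimal sketch; the function name marketdata_ddl is hypothetical. It additionally sets the Hive table property skip.header.line.count, which tells Hive how many leading rows of each file to ignore. The default of 0 leaves the behavior unchanged, and passing 1 is an option when the uploaded file still contains a column-name row.

```shell
#!/bin/sh
# Hypothetical helper: print the external-table DDL for a given bucket.
# Usage: marketdata_ddl <bucket name> [number of header rows to skip]
marketdata_ddl() {
    bucket="$1"
    skip="${2:-0}"   # 0 = read every row; 1 = ignore a column-name row
    # Backticks are escaped so the shell prints them instead of running them.
    cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS \`marketdata\` (
  id STRUCT<oid:STRING, bsontype:INT>,
  Symbol STRING,
  \`Timestamp\` STRING,
  Day INT,
  Open DOUBLE,
  High DOUBLE,
  Low DOUBLE,
  Close DOUBLE,
  Volume INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3a://${bucket}/'
TBLPROPERTIES ('skip.header.line.count'='${skip}');
EOF
}
```

For example, marketdata_ddl ex-bucket 1 > marketdata.hql writes the statement to a file, which can then be run with hive -f marketdata.hql.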
Note: Use the following commands when you need to edit a CSV file.
- Delete quotation marks in the text field
find . -name "<file name>.csv" -exec perl -pi -e 's/"//g' {} \;
- Delete the first row (column name)
sed -e '1d' <file name>.csv
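Both clean-up steps can be combined into a single pass that writes to a new file and leaves the original untouched. The following is a sketch under that assumption; the function name clean_csv is hypothetical. Note that removing every double quote is only safe when no quoted field contains a comma.

```shell
#!/bin/sh
# Hypothetical helper: write a cleaned copy of a CSV file.
# Usage: clean_csv <input file> <output file>
clean_csv() {
    src="$1"
    dst="$2"
    # 1d      drops the first row (the column names).
    # s/"//g  removes every double quote from the remaining rows.
    sed -e '1d' -e 's/"//g' "$src" > "$dst"
}
```

The cleaned copy can then be uploaded to Object Storage in place of the original file.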