Transfer data from CDB MongoDB to Hive


Available in VPC

This guide introduces how to migrate data from Cloud DB for MongoDB to Hive using NAVER Cloud Platform Object Storage.

You can migrate data from MongoDB to Hive through the following two steps:

  1. Import data into NAVER Cloud Platform's Cloud DB for MongoDB.
  2. Export the data from Cloud DB for MongoDB to Object Storage, then load it into a Hive external table in Cloud Hadoop.


Preliminary tasks

  1. Create Object Storage.

  2. Create a Cloud Hadoop cluster.

  3. Create Cloud DB for MongoDB and application server.

  4. Check the created MongoDB's private domain, port, user name, and user password.

Import data to MongoDB

The following describes how to access the application server and then connect to the MongoDB server to import data.

  1. See Getting started with Cloud DB for MongoDB to access the application server and install MongoDB.

  2. Run the following commands in order to install the mongoimport tool and the MongoDB shell.

    # wget https://repo.mongodb.org/yum/redhat/6/mongodb-org/4.2/x86_64/RPMS/mongodb-org-tools-4.2.17-1.el6.x86_64.rpm
    # rpm -ivh mongodb-org-tools-4.2.17-1.el6.x86_64.rpm
    # wget https://repo.mongodb.org/yum/redhat/6/mongodb-org/4.2/x86_64/RPMS/mongodb-org-shell-4.2.17-1.el6.x86_64.rpm
    # rpm -ivh mongodb-org-shell-4.2.17-1.el6.x86_64.rpm
    
  3. Use the wget command to download the data to import.

    # wget http://www.barchartmarketdata.com/data-samples/mstf.csv
    
  4. Run the following command to upload the downloaded data to MongoDB.

    # mongoimport mstf.csv --type csv --headerline -d marketdata -c minibars -h <private domain>:<port> -u <username> -p <password> --authenticationDatabase admin
    
    Note

    You can also prepare data by creating DBs and collections directly in MongoDB.
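To make the --headerline option above concrete, here is a self-contained sketch with a made-up two-line CSV (sample.csv and its values are hypothetical; the actual import in step 4 uses mstf.csv):

```shell
# Hypothetical two-line CSV: with --headerline, mongoimport takes the first
# line as field names and turns each following line into one document.
cat > sample.csv <<'EOF'
Symbol,Timestamp,Day,Open,High,Low,Close,Volume
MSTF,2021-10-01 09:30,1,281.78,282.11,281.52,281.90,104300
EOF
# The data row above would map to a document along the lines of:
#   { Symbol: "MSTF", Timestamp: "2021-10-01 09:30", Day: 1, Open: 281.78, ... }
head -n 1 sample.csv   # the header row that supplies the field names
```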

Export data from MongoDB to Object Storage

The following describes how to export data uploaded to MongoDB to Object Storage.

  1. Run the following command to install AWS CLI.

    • Because NAVER Cloud Platform's Object Storage is compatible with AWS S3, you can use the AWS CLI without changes.
    # curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    # unzip awscliv2.zip
    # ./aws/install
    
    --Confirm installation
    # aws --version
    
  2. Run the following command to export data.

    • mongoexport outputs JSON by default, so specify --type=csv together with a field list to produce a CSV file.
    # mongoexport --host=<private domain>:<port> --collection=minibars --db=marketdata --type=csv --fields=_id,Symbol,Timestamp,Day,Open,High,Low,Close,Volume --out=marketdata.csv -u <username> -p <password> --authenticationDatabase admin
    
  3. Check the access key ID and secret key by referring to the Ncloud API guide.

  4. Set up aws configure using the retrieved authentication key information.

    # aws configure
    AWS Access Key ID [None]: Enter the access key ID.
    AWS Secret Access Key [None]: Enter the secret key.
    Default region name [None]:
    Default output format [None]:
    
  5. Run the following commands in order to check the bucket list, upload data, and then check if it was uploaded successfully.

    # aws --endpoint-url=http://kr.object.ncloudstorage.com s3 ls
    --Bucket list
    2021-10-16 18:49:28 cdbbucket
    2021-09-29 12:20:58 ex-bucket
    2021-10-05 15:24:46 example-5
    2021-10-06 10:59:15 example-6
    
    # aws --endpoint-url=http://kr.object.ncloudstorage.com s3 cp marketdata.csv s3://<Bucket name to upload data to>/
    --Upload result
    upload: ./marketdata.csv to s3://ex-bucket/marketdata.csv
    
    # aws --endpoint-url=http://kr.object.ncloudstorage.com s3 ls s3://ex-bucket/
    --Upload check result
    2021-10-19 11:05:12   16261296 marketdata.csv
    
    Note

    You can also check the result in the NAVER Cloud Platform console's VPC environment by navigating to Services > Storage > Object Storage > Bucket Management.

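The interactive aws configure prompt in step 4 can also be replaced by writing the AWS CLI credentials file directly; a minimal sketch (the two key values are placeholders to substitute):

```shell
# Non-interactive alternative to `aws configure`: the AWS CLI reads credentials
# from this INI file by default. Both values below are placeholders.
mkdir -p ~/.aws
cat > ~/.aws/credentials <<'EOF'
[default]
aws_access_key_id = <access key ID>
aws_secret_access_key = <secret key>
EOF
```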

Import data uploaded to Object Storage with Hive

The following describes how to load data uploaded to Object Storage using Hive External Table.

  1. Run the following command to create an External Table for Hive to import the data in Object Storage.

    • For location, enter the bucket location where the data is uploaded.
    CREATE EXTERNAL TABLE IF NOT EXISTS `marketdata` (
        id STRUCT<oid:STRING, bsontype:INT>,
        Symbol STRING,
        `Timestamp` STRING,
        Day INT,
        Open DOUBLE,
        High DOUBLE,
        Low DOUBLE,
        Close DOUBLE,
        Volume INT)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LOCATION 's3a://<Bucket name where the data was uploaded>/';
    
  2. Run the following query to check that the external table is correctly mapped to the data.

    SELECT * FROM marketdata LIMIT 10;
    
    Note

    Use the following commands when you need to edit a CSV file.

    • Delete quotation marks in the text fields
    find . -name "<file name>.csv" -exec perl -pi -e 's/"//g' {} \;
    
    • Delete the first row (column names)
    sed -i -e '1d' "<file name>.csv"