Transferring data from CDB MongoDB to Hive

    Available in VPC

    This guide introduces how to migrate data from Cloud DB for MongoDB to Hive using NAVER Cloud Platform Object Storage.

    You can migrate data from MongoDB to Hive through the following two steps.

    1. Import data to NAVER Cloud Platform's CDB MongoDB
    2. Export the data from CDB MongoDB to Object Storage, and then import it into Cloud Hadoop Hive through an External Table


    Preparations

    1. Create an Object Storage bucket.

    2. Create a Cloud Hadoop cluster.

    3. Create a Cloud DB for MongoDB instance and an application server.

    4. Check the created MongoDB's private domain, port, username, and password. You can verify them later with the command in the note below.
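
      Note

      Once the MongoDB shell has been installed on the application server (see the next section), you can verify the connection information with a command like the following. The private domain, port, and account values are placeholders for your own environment.

      # mongo --host <private domain> --port <port> -u <username> -p <password> --authenticationDatabase admin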

    Import data to MongoDB

    The following describes how to access the application server, connect to the MongoDB server, and import data.

    1. See Getting started with Cloud DB for MongoDB to access the application server and install MongoDB.

    2. Run the following commands in order to download and install mongoimport and the MongoDB shell.

      # wget https://repo.mongodb.org/yum/redhat/6/mongodb-org/4.2/x86_64/RPMS/mongodb-org-tools-4.2.17-1.el6.x86_64.rpm
      # rpm -ivh mongodb-org-tools-4.2.17-1.el6.x86_64.rpm
      # wget https://repo.mongodb.org/yum/redhat/6/mongodb-org/4.2/x86_64/RPMS/mongodb-org-shell-4.2.17-1.el6.x86_64.rpm
      # rpm -ivh mongodb-org-shell-4.2.17-1.el6.x86_64.rpm
      
    3. Use the wget command to download the data to import.

      # wget http://www.barchartmarketdata.com/data-samples/mstf.csv
      
    4. Run the following command to upload the downloaded data to MongoDB.

      # mongoimport mstf.csv --type csv --headerline -d marketdata -c minibars -h <private domain>:<port> -u <username> -p <password> --authenticationDatabase admin
      
      Note

      You can also prepare data by creating databases and collections directly in MongoDB, as sketched in the example below.
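
      The following mongo shell session is a minimal sketch of this. The document fields mirror the minibars schema used in this guide, and the values are illustrative only.

      # mongo --host <private domain>:<port> -u <username> -p <password> --authenticationDatabase admin
      > use marketdata
      > db.minibars.insertOne({Symbol: "MSTF", Timestamp: "2021-10-19 09:30:00", Day: 19, Open: 1.0, High: 1.2, Low: 0.9, Close: 1.1, Volume: 100})
      > db.minibars.countDocuments({})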

    Export data from MongoDB to Object Storage

    The following describes how to export data uploaded to MongoDB to Object Storage.

    1. Run the following commands to install the AWS CLI.

      • Because NAVER Cloud Platform Object Storage is compatible with Amazon S3, you can use the AWS CLI without modification.
      # curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
      # unzip awscliv2.zip
      # ./aws/install
      
      --Confirm installation
      # aws --version
      
    2. Run the following command to export data.

      • mongoexport writes JSON by default, so specify --type=csv and list the fields to export; the fields below match the Hive table schema used later in this guide.
      # mongoexport --host=<private domain>:<port> --collection=minibars --db=marketdata --type=csv --fields=_id,Symbol,Timestamp,Day,Open,High,Low,Close,Volume --out=marketdata.csv -u <username> -p <password> --authenticationDatabase admin
      
    3. Log in to the NAVER Cloud Platform portal, and then click the My Page > Manage authentication key menus in order to check the access key ID and secret key.

    4. Set up aws configure using the retrieved authentication key information.

      # aws configure
      AWS Access Key ID [None]: enter the access key ID
      AWS Secret Access Key [None]: enter the secret key
      Default region name [None]:
      Default output format [None]:
      
    5. Run the following commands in order to check the bucket list, upload data, and then check if it was uploaded successfully.

      # aws --endpoint-url=http://kr.object.ncloudstorage.com s3 ls
      --Bucket list
      2021-10-16 18:49:28 cdbbucket
      2021-09-29 12:20:58 ex-bucket
      2021-10-05 15:24:46 example-5
      2021-10-06 10:59:15 example-6
      
      # aws --endpoint-url=http://kr.object.ncloudstorage.com s3 cp marketdata.csv s3://<Bucket name to upload data to>/
      --Upload result
      upload: ./marketdata.csv to s3://ex-bucket/marketdata.csv
      
      # aws --endpoint-url=http://kr.object.ncloudstorage.com s3 ls s3://ex-bucket/
      --Upload check result
      2021-10-19 11:05:12   16261296 marketdata.csv
      
      Note

      You can also check the result in the NAVER Cloud Platform console's VPC environment by clicking Services > Storage > Object Storage > Bucket Management menus in order.


    Import data uploaded to Object Storage with Hive

    The following describes how to load the data uploaded to Object Storage using a Hive External Table.

    1. Run the following command to create an External Table so that Hive can import the data in Object Storage.

      • For LOCATION, enter the bucket path where the data was uploaded.
      CREATE EXTERNAL TABLE IF NOT EXISTS `marketdata` (
          id STRUCT<oid:STRING, bsontype:INT>,
          Symbol STRING,
          `Timestamp` STRING,
          Day INT,
          Open DOUBLE,
          High DOUBLE,
          Low DOUBLE,
          Close DOUBLE,
          Volume INT)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LOCATION 's3a://<Bucket name where the data was uploaded>/';
      
    2. Run the following command to check whether the external table and data have been connected.

      SELECT * FROM marketdata LIMIT 10;
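
      If the table is mapped correctly, standard HiveQL queries also run against the data in Object Storage. For example, the following illustrative queries count the rows and aggregate prices by symbol.

      SELECT COUNT(*) FROM marketdata;
      SELECT Symbol, MAX(High), MIN(Low) FROM marketdata GROUP BY Symbol;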
      
      Note

      Use the following commands when you need to edit a CSV file.

      • Delete quotation marks in the text fields
      find . -name <file name>.csv -exec perl -pi -e 's/"//g' {} \;
      
      • Delete the first row (column names)
      sed -i -e '1d' <file name>.csv
      
