Copying HDFS data to Object Storage

    Available in VPC

    This guide explains how to create Object Storage, link it with HDFS, and copy HDFS data to Object Storage.

    Create Object Storage

    You must create Object Storage before you can link HDFS data.

    Select the Object Storage service from the NAVER Cloud Platform console, and create a bucket. For more information about how to create Object Storage, see Object Storage overview.
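
    If you prefer the command line, a bucket can also be created through the S3-compatible CLI covered later in this guide; a minimal sketch, assuming the AWS CLI is already configured as described below and my-new-bucket is a placeholder name:

    $ aws --endpoint-url=https://kr.object.private.ncloudstorage.com s3 mb s3://my-new-bucket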

    Create API authentication key

    To link Object Storage, you must create an API authentication key.

    The following describes how to create an API authentication key.

    1. Log in to the NAVER Cloud Platform portal.
    2. Click the My Page > Manage account > Manage authentication key menu.
    3. Click the [Create new API authentication key] button.
    4. Check the information for the created API authentication key.
      • The access key ID and secret key are used when linking HDFS data.

    Copy files to HDFS

    Once the bucket and API authentication key have been created, use the CLI provided by Data Forest in the VM to configure a development environment.

    After the development environment is configured, you can pass the Object Storage endpoint and authentication key to a Hadoop command, as shown in the example below, and copy the data with cp.

    $ hadoop fs -Dfs.s3a.endpoint=http://kr.object.private.ncloudstorage.com -Dfs.s3a.access.key={ACCESS_KEY_ID} -Dfs.s3a.secret.key={SECRET_KEY} -Dfs.s3a.connection.ssl.enabled=false -cp hdfs://koya/user/{USERNAME}/ExampleFile s3a://{BUCKET_NAME}
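
    For larger datasets, the same endpoint and key settings can be passed to Hadoop's distcp tool, which runs the copy as a distributed job; a minimal sketch, where ExampleDir is a placeholder directory:

    $ hadoop distcp -Dfs.s3a.endpoint=http://kr.object.private.ncloudstorage.com -Dfs.s3a.access.key={ACCESS_KEY_ID} -Dfs.s3a.secret.key={SECRET_KEY} -Dfs.s3a.connection.ssl.enabled=false hdfs://koya/user/{USERNAME}/ExampleDir s3a://{BUCKET_NAME}/ExampleDir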
    

    Copy files to Object Storage with AWS CLI

    NAVER Cloud Platform's Object Storage provides an S3-compatible API, so it can be used with the AWS CLI's S3 commands.

    Note

    Refer to the Object Storage CLI Guide for how to configure the CLI and use its commands.

    1. Set authentication information

    Access the VM and set the authentication information using the AWS CLI, as shown below.

    $ aws configure
    AWS Access Key ID [****************leLy]: ACCESS_KEY_ID
    AWS Secret Access Key [None]: SECRET_KEY
    Default region name [None]: [Enter]
    Default output format [None]: [Enter]
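
    Instead of the interactive prompt, the same credentials can be supplied through the environment variables the AWS CLI reads; a minimal sketch:

    $ export AWS_ACCESS_KEY_ID={ACCESS_KEY_ID}
    $ export AWS_SECRET_ACCESS_KEY={SECRET_KEY}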
    

    2. Check my bucket information

    Once the authentication information is set, use the CLI to list the buckets you have created.

    $ aws --endpoint-url=https://kr.object.private.ncloudstorage.com s3 ls
    2020-06-24 11:09:41 bucket-1
    2020-07-14 18:00:17 bucket-3
    2020-09-17 19:37:36 bucket-4
    2020-09-17 20:23:39 bucket-6
    
    Note
    • The --endpoint-url option is required when using the CLI.
    • For VPC environments, Object Storage's endpoint-url address is kr.object.private.ncloudstorage.com.
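
    The same ls command also lists the objects inside a specific bucket; a minimal sketch, using one of the buckets above:

    $ aws --endpoint-url=https://kr.object.private.ncloudstorage.com s3 ls s3://bucket-1/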

    3. Copy single file

    Use the S3 cp command to upload a specific file to a specific bucket.

    $ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 cp SOURCE_FILE s3://DEST_BUCKET/FILE_NAME
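
    The cp command works in both directions; swapping the arguments downloads an object from the bucket back to the VM (a sketch using the same placeholder names):

    $ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 cp s3://DEST_BUCKET/FILE_NAME LOCAL_FILE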
    

    4. Copy bulk files

    Use the S3 sync command to sync the content between a directory and a bucket, or between two buckets.

    $ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 sync SOURCE_DIR s3://DEST_BUCKET/
    
    Caution

    Be careful when using the --delete option: it deletes files or objects in the destination that are not present in the source.
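
    To preview what sync would transfer or delete without changing anything, add the AWS CLI's --dryrun option; a minimal sketch:

    $ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 sync --dryrun --delete SOURCE_DIR s3://DEST_BUCKET/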

    To upload a directory and the files under it to Object Storage at once, use the S3 cp command's --recursive option.

    $ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 cp --recursive SOURCE_DIR s3://DEST_BUCKET/
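
    To copy only part of a directory tree, the --recursive copy can be combined with the AWS CLI's --exclude and --include filters; a sketch that uploads only .log files (the patterns are placeholders):

    $ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 cp --recursive SOURCE_DIR s3://DEST_BUCKET/ --exclude "*" --include "*.log"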
    
