Copying HDFS data to Object Storage
Available in VPC
This guide explains how to create Object Storage, link it with HDFS, and copy HDFS data to Object Storage.
Create Object Storage
You must have Object Storage created first in order to link HDFS data.
Select the Object Storage service from the NAVER Cloud Platform console, and create a bucket. For more information about how to create Object Storage, see Object Storage overview.
Create API authentication key
To link Object Storage, you must create an API authentication key.
The following describes how to create an API authentication key.
- Log in to the NAVER Cloud Platform portal.
- Click the My Page > Manage account > Manage authentication key menu.
- Click the [Create new API authentication key] button.
- Check the information for the created API authentication key.
- The access key ID and secret key are used when linking HDFS data.
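To avoid pasting the key pair into every command, you can keep the two values in environment variables. This is a minimal sketch; the variable names below are illustrative placeholders, not names required by any NAVER Cloud Platform tool.

```shell
# Store the authentication key pair in environment variables
# (placeholder names and values; substitute your actual keys).
export NCP_ACCESS_KEY_ID="ACCESS_KEY_ID"
export NCP_SECRET_KEY="SECRET_KEY"

# Confirm the variables are set without printing the secret itself.
[ -n "$NCP_ACCESS_KEY_ID" ] && echo "access key id set"
[ -n "$NCP_SECRET_KEY" ] && echo "secret key set"
```

Later commands can then reference the variables instead of embedding the raw keys inline.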
Copy files to HDFS
Once the bucket and API authentication key are created, use the CLI provided by Data Forest in the VM to configure a development environment.
After the development environment configuration is completed, set the Object Storage access address and authentication key in a Hadoop command, as shown in the example below, to copy data with the cp command.
$ hadoop fs -Dfs.s3a.endpoint=http://kr.object.private.ncloudstorage.com -Dfs.s3a.access.key={ACCESS_KEY_ID} -Dfs.s3a.secret.key={SECRET_KEY} -Dfs.s3a.connection.ssl.enabled=false -cp hdfs://koya/user/{USERNAME}/ExampleFile s3a://{BUCKET_NAME}
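For copying large directories, Hadoop's distcp tool accepts the same fs.s3a.* settings as the fs -cp command above. The sketch below only assembles and prints the invocation so the flags can be inspected without cluster access; the source path, bucket, and credentials are placeholders, and the printed command must be run on a node with access to the cluster.

```shell
# Assemble a distcp invocation with the same s3a settings as the fs -cp
# example (ACCESS_KEY_ID, SECRET_KEY, USERNAME, and BUCKET_NAME are
# placeholders). The command is printed rather than executed here.
ENDPOINT="http://kr.object.private.ncloudstorage.com"
SRC="hdfs://koya/user/USERNAME/ExampleDir"
DST="s3a://BUCKET_NAME/"

CMD="hadoop distcp \
-Dfs.s3a.endpoint=${ENDPOINT} \
-Dfs.s3a.access.key=ACCESS_KEY_ID \
-Dfs.s3a.secret.key=SECRET_KEY \
-Dfs.s3a.connection.ssl.enabled=false \
${SRC} ${DST}"

echo "$CMD"
```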
Copy files to Object Storage with AWS CLI
NAVER Cloud Platform's Object Storage is compatible with the AWS S3 CLI.
Refer to the Object Storage CLI Guide for how to configure the CLI and use its commands.
1. Set authentication information
Access the VM, and set the authentication information as shown below using AWS CLI commands.
$ aws configure
AWS Access Key ID [****************leLy]: ACCESS_KEY_ID
AWS Secret Access Key [None]: SECRET_KEY
Default region name [None]: [Enter]
Default output format [None]: [Enter]
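The same credentials can also be supplied non-interactively, which is convenient in scripts. The sketch below writes the standard AWS credentials file format to a temporary directory so nothing under the home directory is touched; the values are placeholders, and in practice the interactive aws configure session above produces the equivalent file.

```shell
# Write the credentials file format that `aws configure` produces
# (placeholder values; a temp dir is used instead of ~/.aws for safety).
CONF_DIR="$(mktemp -d)"
cat > "${CONF_DIR}/credentials" <<'EOF'
[default]
aws_access_key_id = ACCESS_KEY_ID
aws_secret_access_key = SECRET_KEY
EOF

# AWS_SHARED_CREDENTIALS_FILE is the standard variable the AWS CLI
# reads to locate a credentials file outside the default ~/.aws path.
export AWS_SHARED_CREDENTIALS_FILE="${CONF_DIR}/credentials"
grep -c '=' "${CONF_DIR}/credentials"
```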
2. Check my bucket information
Once the authentication information is set, use the CLI to view the list of created buckets.
$ aws --endpoint-url=https://kr.object.private.ncloudstorage.com s3 ls
2020-06-24 11:09:41 bucket-1
2020-07-14 18:00:17 bucket-3
2020-09-17 19:37:36 bucket-4
2020-09-17 20:23:39 bucket-6
- The --endpoint-url option is required when using the CLI.
- For VPC environments, Object Storage's endpoint-url address is kr.object.private.ncloudstorage.com.
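If a script needs only the bucket names, the date and time columns in the s3 ls output can be stripped with awk. The sketch below runs the parsing against a sample of the listing shown above, so it can be checked without bucket access; in practice the listing would come from the aws s3 ls command.

```shell
# Sample of the `aws s3 ls` output shown above, inlined so the parsing
# can be verified locally (in practice, pipe the real command instead).
LISTING="2020-06-24 11:09:41 bucket-1
2020-07-14 18:00:17 bucket-3"

# Each line is "DATE TIME NAME"; the third field is the bucket name.
BUCKETS="$(printf '%s\n' "$LISTING" | awk '{print $3}')"
echo "$BUCKETS"
```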
3. Copy single file
Use the S3 cp command to upload a specific file to a specific bucket.
$ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 cp SOURCE_FILE s3://DEST_BUCKET/FILE_NAME
4. Copy bulk files
Use the S3 sync command to sync the content between a directory and bucket, or two buckets.
$ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 sync SOURCE_DIR s3://DEST_BUCKET/
Please be careful, since using the --delete option may delete files or objects from the destination that are not present in the source.
To upload a directory and the files under it to Object Storage at once, use S3 cp's --recursive option.
$ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 cp --recursive SOURCE_DIR s3://DEST_BUCKET/
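Because sync with --delete can remove destination objects, the AWS CLI's --dryrun flag is useful for previewing what a sync would do before running it for real. The sketch below only assembles and prints such a command with placeholder names; run the printed command against a real bucket to see the preview.

```shell
# Assemble a preview sync command (--dryrun is a real AWS CLI flag that
# prints the planned actions without performing them). SOURCE_DIR and
# DEST_BUCKET are placeholders.
ENDPOINT="https://kr.object.private.ncloudstorage.com"
CMD="aws --endpoint-url=${ENDPOINT} s3 sync --dryrun --delete SOURCE_DIR s3://DEST_BUCKET/"
echo "$CMD"
```

Once the dry-run output looks correct, drop --dryrun to perform the sync.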