Copying HDFS data to Object Storage
Available in VPC
This guide explains how to create Object Storage, link it with HDFS, and copy HDFS data to Object Storage.
Create Object Storage
You must have Object Storage created first in order to link HDFS data.
Select the Object Storage service from the NAVER Cloud Platform console, and create a bucket. For more information about how to create Object Storage, see Object Storage overview.
Create API authentication key
To link Object Storage, you must create an API authentication key.
The following describes how to create an API authentication key.
- Log in to the NAVER Cloud Platform portal.
- Click the My Page > Manage account > Manage authentication key menu.
- Click the [Create new API authentication key] button.
- Check the information for the created API authentication key.
- The access key ID and secret key are used when linking HDFS data.
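To avoid pasting the key pair into every command, you can keep the two values in environment variables. This is a minimal sketch; the variable names below are illustrative placeholders, not names required by any NAVER Cloud Platform tool.

```shell
# Store the authentication key pair in environment variables
# (placeholder names and values; substitute your actual keys).
export NCP_ACCESS_KEY_ID="ACCESS_KEY_ID"
export NCP_SECRET_KEY="SECRET_KEY"

# Confirm the variables are set without printing the secret itself.
[ -n "$NCP_ACCESS_KEY_ID" ] && echo "access key id set"
[ -n "$NCP_SECRET_KEY" ] && echo "secret key set"
```

Later commands can then reference the variables instead of embedding the raw keys inline.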
Copy files to HDFS
Once the bucket and API authentication key are created, use the CLI provided by Data Forest in the VM to configure a development environment.
After the development environment configuration is completed, set the Object Storage access address and authentication key in a Hadoop command, as shown in the example below, to copy data with the cp command.
$ hadoop fs -Dfs.s3a.endpoint=http://kr.object.private.ncloudstorage.com -Dfs.s3a.access.key={ACCESS_KEY_ID} -Dfs.s3a.secret.key={SECRET_KEY} -Dfs.s3a.connection.ssl.enabled=false -cp hdfs://koya/user/{USERNAME}/ExampleFile s3a://{BUCKET_NAME}
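For copying large directories, Hadoop's distcp tool accepts the same fs.s3a.* settings as the fs -cp command above. The sketch below only assembles and prints the invocation so the flags can be inspected without cluster access; the source path, bucket, and credentials are placeholders, and the printed command must be run on a node with access to the cluster.

```shell
# Assemble a distcp invocation with the same s3a settings as the fs -cp
# example (ACCESS_KEY_ID, SECRET_KEY, USERNAME, and BUCKET_NAME are
# placeholders). The command is printed rather than executed here.
ENDPOINT="http://kr.object.private.ncloudstorage.com"
SRC="hdfs://koya/user/USERNAME/ExampleDir"
DST="s3a://BUCKET_NAME/"

CMD="hadoop distcp \
-Dfs.s3a.endpoint=${ENDPOINT} \
-Dfs.s3a.access.key=ACCESS_KEY_ID \
-Dfs.s3a.secret.key=SECRET_KEY \
-Dfs.s3a.connection.ssl.enabled=false \
${SRC} ${DST}"

echo "$CMD"
```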
Copy files to Object Storage with AWS CLI
NAVER Cloud Platform's Object Storage is compatible with the AWS S3 CLI.
Refer to the Object Storage CLI Guide for how to configure the CLI and use its commands.
1. Set authentication information
Access the VM, and set the authentication information as shown below using AWS CLI commands.
$ aws configure
AWS Access Key ID [****************leLy]: ACCESS_KEY_ID
AWS Secret Access Key [None]: SECRET_KEY
Default region name [None]: [Enter]
Default output format [None]: [Enter]
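The same credentials can also be supplied non-interactively, which is convenient in scripts. The sketch below writes the standard AWS credentials file format to a temporary directory so nothing under the home directory is touched; the values are placeholders, and in practice the interactive aws configure session above produces the equivalent file.

```shell
# Write the credentials file format that `aws configure` produces
# (placeholder values; a temp dir is used instead of ~/.aws for safety).
CONF_DIR="$(mktemp -d)"
cat > "${CONF_DIR}/credentials" <<'EOF'
[default]
aws_access_key_id = ACCESS_KEY_ID
aws_secret_access_key = SECRET_KEY
EOF

# AWS_SHARED_CREDENTIALS_FILE is the standard variable the AWS CLI
# reads to locate a credentials file outside the default ~/.aws path.
export AWS_SHARED_CREDENTIALS_FILE="${CONF_DIR}/credentials"
grep -c '=' "${CONF_DIR}/credentials"
```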
2. Check my bucket information
Once the authentication information is set, use the CLI to view the list of created buckets.
$ aws --endpoint-url=https://kr.object.private.ncloudstorage.com s3 ls
2020-06-24 11:09:41 bucket-1
2020-07-14 18:00:17 bucket-3
2020-09-17 19:37:36 bucket-4
2020-09-17 20:23:39 bucket-6
- The --endpoint-url option is required when using the CLI.
- For VPC environments, Object Storage's endpoint-url address is kr.object.private.ncloudstorage.com.
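If a script needs only the bucket names, the date and time columns in the s3 ls output can be stripped with awk. The sketch below runs the parsing against a sample of the listing shown above, so it can be checked without bucket access; in practice the listing would come from the aws s3 ls command.

```shell
# Sample of the `aws s3 ls` output shown above, inlined so the parsing
# can be verified locally (in practice, pipe the real command instead).
LISTING="2020-06-24 11:09:41 bucket-1
2020-07-14 18:00:17 bucket-3"

# Each line is "DATE TIME NAME"; the third field is the bucket name.
BUCKETS="$(printf '%s\n' "$LISTING" | awk '{print $3}')"
echo "$BUCKETS"
```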
3. Copy single file
Use the S3 cp command to upload a specific file to a specific bucket.
$ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 cp SOURCE_FILE s3://DEST_BUCKET/FILE_NAME
4. Copy bulk files
Use the S3 sync command to sync the content between a directory and bucket, or two buckets.
$ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 sync SOURCE_DIR s3://DEST_BUCKET/
Please be careful, since using the --delete option may delete files or objects from the destination that are not present in the source.
To upload a directory and the files under it to Object Storage at once, use S3 cp's --recursive option.
$ aws --endpoint-url=http://kr.object.private.ncloudstorage.com s3 cp --recursive SOURCE_DIR s3://DEST_BUCKET/
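Because sync with --delete can remove destination objects, the AWS CLI's --dryrun flag is useful for previewing what a sync would do before running it for real. The sketch below only assembles and prints such a command with placeholder names; run the printed command against a real bucket to see the preview.

```shell
# Assemble a preview sync command (--dryrun is a real AWS CLI flag that
# prints the planned actions without performing them). SOURCE_DIR and
# DEST_BUCKET are placeholders.
ENDPOINT="https://kr.object.private.ncloudstorage.com"
CMD="aws --endpoint-url=${ENDPOINT} s3 sync --dryrun --delete SOURCE_DIR s3://DEST_BUCKET/"
echo "$CMD"
```

Once the dry-run output looks correct, drop --dryrun to perform the sync.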