Object Storage and Sub Account integrations


Available in VPC

This chapter describes how to integrate and use Cloud Hadoop with the Object Storage and Sub Account services of NAVER Cloud Platform.

Bigdata Service

  • Cloud Hadoop: lets you easily deploy open-source frameworks such as Hadoop, HBase, Spark, Hive, and Presto.
    For more information on Cloud Hadoop, see the Cloud Hadoop overview guide.

  • Object Storage: lets you store data that requires secure storage, as well as large volumes of data.
    You can also use the provided APIs to back up and restore server data.
    For more information on Object Storage, see the Object Storage overview guide.

  • Sub Account: lets you register users as sub accounts and grant them permissions for specific services.
    Registered internal users can use the services for which they are authorized in the same manner as the main account.
    For more information on Sub Account, see the Sub Account user guide.

Integrate and use Cloud Hadoop, Object Storage, and Sub Account

Create and add policies for Sub Account


To create and add policies for Sub Account, follow these steps:

  1. Create a sub account in the Sub Account service from the NAVER Cloud Platform console.

    • An API key is required to access Object Storage. When creating the sub account, select API Access under Access Type.
  2. In the Sub Accounts menu, click the login ID of the sub account you created, and then add the required policies in the [Policies] tab.

    • Add the NCP_OBJECT_STORAGE_MANAGER policy for Object Storage.
    • Add the NCP_VPC_SERVER_MANAGER policy for ACG change management.
    • Add the NCP_VPC_CLOUD_HADOOP_MANAGER policy for Cloud Hadoop.
  3. In the [Access Key] tab, click [Add] to create the API authentication key for accessing Object Storage, and then confirm it.

  4. Set the access URL of the sub account login page in Sub Account > Dashboard.

  5. Access the sub account login page URL, and then log in to the sub account as follows:

    • The sub account ID is the Login ID set when creating the sub account.
    • The password is the Login password set when creating the sub account.
  • You can only use the services allowed by the policies set for the account.
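The API key pair created in the [Access Key] tab is what S3-compatible clients and Hadoop's s3a connector use to reach Object Storage. As a rough illustration, the sketch below renders the standard s3a properties as a core-site.xml fragment; the endpoint shown is an assumption, so use the one displayed in your Object Storage console.

```python
# Sketch: render the Hadoop s3a properties that let a cluster reach
# Object Storage with the sub account's API key.
# The endpoint below is an assumption -- check your console for the real one.
import xml.etree.ElementTree as ET

def s3a_properties(access_key: str, secret_key: str,
                   endpoint: str = "kr.object.ncloudstorage.com") -> str:
    """Return a core-site.xml fragment for the s3a connector."""
    conf = ET.Element("configuration")
    props = {
        "fs.s3a.access.key": access_key,   # API access key from the [Access Key] tab
        "fs.s3a.secret.key": secret_key,   # matching secret key
        "fs.s3a.endpoint": endpoint,       # Object Storage endpoint (assumed)
    }
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")

print(s3a_properties("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY"))
```

The same key pair also works with generic S3 tools pointed at the Object Storage endpoint.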

Create Object Storage bucket

Create the ncp-bigdata bucket to store data in Object Storage. You can check the created bucket in Bucket Management.

Create Cloud Hadoop cluster

Create a Cloud Hadoop cluster.
For more information about creating a Cloud Hadoop cluster, see the Getting started with Cloud Hadoop guide.

Upload sample data onto Object Storage

  1. Download the synthetic data set created for this example, and decompress the downloaded file.
  2. Click the ncp-bigdata bucket in Object Storage > Bucket Management, and upload the decompressed file.

View data

You can view sample data using Web UIs, including Ambari, Hue, and Zeppelin.
For more information about how to connect to the Web UI, see the UI access and password settings by service guide.

  1. When you have finished creating your Cloud Hadoop cluster, add the ports required for Ambari, Hue, and Zeppelin access to your ACG.
  2. Access Hue (port 8443), and then create a mart database with a Hive query:
CREATE DATABASE mart;

  3. Create an orders2 table as follows:
USE mart;
CREATE EXTERNAL TABLE `orders2` (
    `order_id` BIGINT,
    `order_number` BIGINT,
    `days_since_prior_order` DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3a://ncp-bigdata/';

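The ROW FORMAT DELIMITED clause maps each comma-delimited line stored in the bucket onto the three typed columns. Purely as an illustration of that mapping, the sketch below parses one such line in Python; `parse_order` is a hypothetical helper, and Hive of course performs this parsing itself.

```python
# Sketch: how ROW FORMAT DELIMITED / FIELDS TERMINATED BY ',' maps one
# CSV line from the bucket onto the orders2 columns. parse_order is a
# hypothetical helper for illustration only -- Hive does this internally.
def parse_order(line: str) -> dict:
    order_id, order_number, days = line.rstrip("\n").split(",")
    return {
        "order_id": int(order_id),              # BIGINT
        "order_number": int(order_number),      # BIGINT
        "days_since_prior_order": float(days),  # DOUBLE
    }

row = parse_order("1001,3,7.0")
print(row)  # {'order_id': 1001, 'order_number': 3, 'days_since_prior_order': 7.0}
```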

  4. View the data in the orders2 table:
SELECT * FROM orders2;

  5. Access Zeppelin (port 8443).

  6. Click [Notebook] > Create new note to create a new Zeppelin notebook.

  7. Set the default interpreter to jdbc, and then click the [Create] button.

  8. View the orders2 table registered in the Hive metastore:

%jdbc(hive)
SELECT order_number, AVG(days_since_prior_order) AS avg_days_since_prior_order
FROM mart.orders2
WHERE order_number IS NOT NULL
GROUP BY order_number
ORDER BY order_number;
  • This performs a simple analysis of the sample CSV file uploaded earlier.
  • It checks the assumption that customers with more orders have shorter intervals between reorders.

<orders2 table>

  • order_id: order ID
  • order_number: the number of orders placed by the user
  • days_since_prior_order: days since the previous order

In the orders2 table, days_since_prior_order records how many days have passed since the user's previous order, capped at 30 days, and order_number is the number of orders placed by the user.
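The aggregation that the Zeppelin query performs (average days_since_prior_order per order_number) can be sketched locally to make the hypothesis check concrete. The rows below are a fabricated toy sample, not the actual data set.

```python
# Sketch: the same aggregation as the Zeppelin query, run locally on a
# fabricated toy sample (not the real data set) for illustration.
from collections import defaultdict

# (order_number, days_since_prior_order) -- made-up example rows
rows = [
    (1, 30.0), (1, 28.0),
    (2, 21.0), (2, 19.0),
    (5, 10.0), (5, 8.0),
]

def avg_days_by_order_number(rows):
    """Average days_since_prior_order grouped by order_number."""
    sums = defaultdict(lambda: [0.0, 0])
    for order_number, days in rows:
        if order_number is None:   # mirrors "order_number IS NOT NULL"
            continue
        sums[order_number][0] += days
        sums[order_number][1] += 1
    return {k: total / count for k, (total, count) in sorted(sums.items())}

print(avg_days_by_order_number(rows))  # {1: 29.0, 2: 20.0, 5: 9.0}
```

On this toy sample, the averages fall as the order count rises, which is the pattern the chart in the next step is meant to reveal on the real data.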

  9. In Zeppelin, view the Hive table (orders2) created from Hue, and then visualize it as a chart.

The chart suggests that customers with a higher number of orders tend to wait less time before their next order.

Note
  • Object Storage performs worse than HDFS, but it has the advantage that the same data remains available even after the cluster is shut down; you can simply create a new cluster when needed.
  • Since the Hive metastore is not separated from the cluster, the Hive DDL must be executed again on a new cluster, and the Zeppelin notebook must also be imported again.