Available in VPC
Once you've reviewed the Cloud Hadoop [specifications], [quickstart], and [glossary], you're ready to start using the service. Your first step is creating a Cloud Hadoop cluster. You can create and manage Cloud Hadoop clusters from the NAVER Cloud Platform console.
The following summarizes what you can learn from the start guide:
Preliminary task
- Create Object Storage
Before creating a cluster, create an Object Storage bucket for storing and retrieving data. For more information, see the Object Storage guide.
- Create VPC and subnet
Create a VPC and subnets by clicking the menu icon and then Services > Networking > VPC on the NAVER Cloud Platform console. For more information, see the VPC user guide. At least 1 VPC is required regardless of the number of clusters, and multiple clusters can share the same VPC. In a private VPC environment, VPCs can be created only in the KR-2 Region.
When creating Cloud Hadoop, you can create and use public and private subnets according to each node's purpose. In the VPC environment, edge nodes and master nodes can be placed in either public or private subnets, while worker nodes can be placed only in private subnets. The number of edge nodes is fixed at 1 and the number of master nodes is fixed at 2. The example in this guide uses 1 private subnet and 2 public subnets, as shown in the following table.
| VPC | SUBNET 1 (PRIVATE) | SUBNET 2 (PUBLIC) | SUBNET 3 (PUBLIC) |
|---|---|---|---|
| 172.16.0.0/16 | 172.16.0.0/24 | 172.16.1.0/28 | 172.16.2.0/28 |
- Select node type
Choose the node server types in advance, based on your expected usage.
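Before entering subnets in the console, you may want to sanity-check the CIDR plan from the table in Create VPC and subnet. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `check_subnet_plan` is hypothetical and not part of any NAVER Cloud Platform tool. It only checks that each subnet lies inside the VPC and that no two subnets overlap; it does not check public/private placement of node types.

```python
from __future__ import annotations

import ipaddress


def check_subnet_plan(vpc_cidr: str, subnet_cidrs: list[str]) -> list[str]:
    """Return human-readable problems in a VPC/subnet CIDR plan (empty list if none)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems: list[str] = []

    # Each subnet must fall entirely within the VPC's address range.
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is outside VPC {vpc}")

    # No two subnets may share any addresses.
    for i, a in enumerate(subnets):
        for b in subnets[i + 1:]:
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b}")

    return problems


if __name__ == "__main__":
    # The example plan from the table: 1 private subnet and 2 public subnets.
    plan = ["172.16.0.0/24", "172.16.1.0/28", "172.16.2.0/28"]
    print(check_subnet_plan("172.16.0.0/16", plan))  # → []
```

An empty list means the plan is internally consistent; the console still performs its own validation when you create the subnets.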
Create cluster
To use NAVER Cloud Platform's Cloud Hadoop, you must create a cluster first.
To create a Cloud Hadoop cluster:
- Access the NAVER Cloud Platform console.
- Click Region & Platform in the upper-right corner of the console interface.
- Select the region and platform you are using, and click [Apply].
- Click the menu icon in the upper-left corner of the console interface.
- Click Services > Big Data & Analytics > Cloud Hadoop in order.
- Click the [Create cluster] button.
- When the Create cluster page appears, proceed with the following steps in order:
1. Set cluster
Specify the cluster settings information and then click the [Next] button.
- Cluster version: Currently, Cloud Hadoop versions 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, and 2.3 are provided. For more information on cluster versions, see Cloud Hadoop release notes.
- Cluster type: There are currently four cluster types: Core Hadoop, Presto, HBase, and Spark. Select the type whose pre-installed components match your needs. To add other services later, use the Add service feature in Ambari, the cluster management tool.
- Cluster add-on: In addition to the basic type, you can select additional components to install as options.
- Use catalog of Data Catalog: Use the Data Catalog service's catalog as the Hive metastore for Cloud Hadoop.
- Kerberos authentication configuration: Select this option to configure a Secure Hadoop cluster using Kerberos. The realm is the authentication management domain. The Key Distribution Center (KDC) is configured with the following values:
- Realm: The KDC's realm information. Realm names can only use capital letters.
- KDC manager account password: The password of the KDC admin account.
- VPC: Select the VPC created in the preliminary task.
- Cluster admin account: Set the admin account used to access the Ambari, Hue, and Zeppelin management consoles.
- Cluster admin account password: Enter the cluster admin account's password.
- ACG settings: Cloud Hadoop automatically creates an ACG whenever you create a cluster. To control network access, select the automatically created ACG and modify its rules. For more information about setting up ACGs, see the Firewall settings (ACG) guide.
In Cloud Hadoop 1.3, the Ranger UI account (ID/password) is set to admin/admin.
In Cloud Hadoop 1.4 or later, the Ranger UI account (ID/password) is set to admin/{password you entered}.
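Two rules from the cluster settings step can be expressed as small helper functions: realm names may only use capital letters, and the Ranger UI password depends on the cluster version. The sketch below is hypothetical (the function names are not part of any NAVER Cloud Platform tool) and assumes a strict reading of the realm rule, meaning letters A-Z only with no digits, dots, or other characters; the console's actual validation may differ. It also assumes `entered_password` is the password you typed during cluster setup, since the guide does not name the exact field.

```python
from __future__ import annotations

import re

# Assumption: "can only use capital letters" is read strictly as A-Z only.
# If the console also accepts characters such as dots or digits, widen this pattern.
REALM_PATTERN = re.compile(r"[A-Z]+")


def is_valid_realm(realm: str) -> bool:
    """Return True if the Kerberos realm name consists solely of capital letters A-Z."""
    return REALM_PATTERN.fullmatch(realm) is not None


def parse_version(version: str) -> tuple[int, int]:
    """Turn a Cloud Hadoop version string such as "2.3" into a comparable tuple (2, 3)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def ranger_ui_credentials(cluster_version: str, entered_password: str) -> tuple[str, str]:
    """Return the Ranger UI (ID, password) pair for a given Cloud Hadoop version.

    entered_password: the password entered during cluster setup (assumed field).
    """
    version = parse_version(cluster_version)
    if version < (1, 3):
        raise ValueError(f"Cloud Hadoop {cluster_version} is not covered by this guide")
    if version == (1, 3):
        return ("admin", "admin")       # Cloud Hadoop 1.3
    return ("admin", entered_password)  # Cloud Hadoop 1.4 or later


if __name__ == "__main__":
    print(is_valid_realm("EXAMPLE"), is_valid_realm("Example"))
    print(ranger_ui_credentials("1.3", "my-password"))
    print(ranger_ui_credentials("2.3", "my-password"))
```

Versions are compared as integer tuples rather than strings so that, for example, a two-digit minor version would still sort correctly.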
2. Set storage and server
After specifying the storage and node server settings information, click the [Next] button.
- Object Storage buckets: A Cloud Hadoop cluster can read and write data in the Object Storage bucket created in the preliminary task. Select that bucket when creating the cluster. Locked buckets cannot be integrated with Cloud Hadoop, so keep this in mind when creating an Object Storage bucket.
- Bootstrap script: Available in Cloud Hadoop 1.6 or later. The bootstrap script feature runs a shell script uploaded to the Object Storage bucket integrated with Cloud Hadoop when the cluster is created. To enable it, select the check box and enter the file name of the uploaded shell script. The script is not executed if the file does not exist in the Object Storage bucket or if the file name is incorrect. Bootstrap execution logs are written to the same path as the shell script in the Object Storage bucket.
- Support high availability: Cloud Hadoop provides redundancy for HDFS NameNode, YARN Resource Manager, Oozie Server, and HiveServer by default. Because this is a minimum requirement, it cannot be deselected.
- Edge node server type: Select the server type to be used for the edge node. For the specifications of servers that can be used as edge nodes, see Supported server specifications by cluster node.
- Edge node subnet: Select a subnet where the edge node will be located.
- Number of edge nodes: The number of edge nodes is fixed at 1.
- Master node subnet: Select a subnet where the master node will be located.
- Master node server type: Select the server type to be used for the master node. For the specifications of servers that can be used as master nodes, see Supported server specifications by cluster node.
- Number of master nodes: Because Cloud Hadoop provides high availability as a minimum requirement, the number of master nodes is fixed at 2.
- Master node storage type: Select the storage type. You can select between SSD and HDD. You cannot change the storage type after you create a cluster.
- Master node storage capacity: Select the storage capacity. You can select from 100 GB to 2000 GB in 10 GB increments; 4000 GB and 6000 GB are also available.
- Worker node subnet: Select the subnet to place the worker node.
- Worker node server type: Select the server type to be used for worker nodes. For the specifications of servers that can be used as worker nodes, see Supported server specifications by cluster node.
- Number of worker nodes: You can select 2 to 8 worker nodes. Worker nodes can be added or deleted even after the cluster is created.
- Worker node storage type: Select the storage type. You can select between SSD and HDD. You cannot change the storage type after you create a cluster.
- Worker node storage capacity: Select the storage capacity. You can select from 100 GB to 2000 GB in 10 GB increments; 4000 GB and 6000 GB are also available.
- Pricing plan: The pricing plan you selected when you created your account applies. For more pricing information, see Pricing information.
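The numeric and version rules in the storage and server settings can be collected into a single pre-flight check. The sketch below encodes only what this guide states (bootstrap scripts from Cloud Hadoop 1.6, 2 to 8 worker nodes at creation, and storage of 100 GB to 2000 GB in 10 GB steps or exactly 4000 GB or 6000 GB). The function names are hypothetical, and the console remains the authoritative validator.

```python
from __future__ import annotations

LARGE_CAPACITIES_GB = {4000, 6000}


def is_selectable_capacity(capacity_gb: int) -> bool:
    """100-2000 GB in 10 GB steps, or exactly 4000 or 6000 GB."""
    if capacity_gb in LARGE_CAPACITIES_GB:
        return True
    return 100 <= capacity_gb <= 2000 and capacity_gb % 10 == 0


def supports_bootstrap(cluster_version: str) -> bool:
    """Bootstrap scripts are available in Cloud Hadoop 1.6 or later."""
    major, minor = (int(part) for part in cluster_version.split(".")[:2])
    return (major, minor) >= (1, 6)


def validate_storage_and_server(
    cluster_version: str,
    use_bootstrap: bool,
    worker_count: int,
    master_capacity_gb: int,
    worker_capacity_gb: int,
) -> list[str]:
    """Collect rule violations for the storage and server settings in this guide."""
    errors: list[str] = []
    if use_bootstrap and not supports_bootstrap(cluster_version):
        errors.append("bootstrap scripts require Cloud Hadoop 1.6 or later")
    # Applies at creation time; workers can be added or removed afterwards.
    if not 2 <= worker_count <= 8:
        errors.append("worker node count must be between 2 and 8")
    for role, capacity in (("master", master_capacity_gb), ("worker", worker_capacity_gb)):
        if not is_selectable_capacity(capacity):
            errors.append(f"{role} node storage of {capacity} GB is not selectable")
    return errors


if __name__ == "__main__":
    print(validate_storage_and_server("2.3", True, 2, 100, 6000))
    print(validate_storage_and_server("1.5", True, 9, 105, 2010))
```

Collecting all violations instead of stopping at the first one lets you fix every setting in a single pass.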
If you have configured network ACL rules separately in the VPC service, Cloud Hadoop cluster creation may not work properly.
Cluster creation may fail if your inbound/outbound rules include any of the following:
- A deny rule for 0.0.0.0/0 on ports 1-65535
- A deny rule that overlaps the IP address range of the subnet where you want to create the Cloud Hadoop cluster
- A deny rule that overlaps the IP address range of Cloud Hadoop's default ACG
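To review existing network ACL rules against the three conditions above before creating a cluster, you could use a check like the following minimal sketch. `AclRule` is a hypothetical, simplified rule model (not an NCP API type), and this guide does not list Cloud Hadoop's default ACG ranges, so `default_acg_ranges` must be supplied by you. The check also ignores rule ordering and priority, so it may flag deny rules that a higher-priority allow rule already overrides.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class AclRule:
    """Simplified network ACL rule (hypothetical model, not an NCP API type)."""
    action: str      # "allow" or "deny"
    cidr: str        # source range (inbound) or destination range (outbound)
    port_from: int
    port_to: int


def risky_deny_rules(
    rules: list[AclRule],
    cluster_subnets: list[str],
    default_acg_ranges: list[str],
) -> list[tuple[AclRule, str]]:
    """Flag deny rules matching the conditions that can make cluster creation fail."""
    watched = [ipaddress.ip_network(c) for c in [*cluster_subnets, *default_acg_ranges]]
    flagged: list[tuple[AclRule, str]] = []
    for rule in rules:
        if rule.action != "deny":
            continue
        net = ipaddress.ip_network(rule.cidr)
        # Prefix length 0 (e.g. 0.0.0.0/0) covering the full port range 1-65535.
        if net.prefixlen == 0 and rule.port_from <= 1 and rule.port_to >= 65535:
            flagged.append((rule, "denies all addresses on ports 1-65535"))
        elif any(net.overlaps(w) for w in watched):
            flagged.append((rule, "overlaps a cluster subnet or default ACG range"))
    return flagged


if __name__ == "__main__":
    rules = [
        AclRule("deny", "0.0.0.0/0", 1, 65535),
        AclRule("deny", "172.16.1.0/28", 22, 22),
        AclRule("deny", "10.0.0.0/8", 1, 65535),
    ]
    subnets = ["172.16.0.0/24", "172.16.1.0/28", "172.16.2.0/28"]
    for rule, reason in risky_deny_rules(rules, subnets, []):
        print(rule.cidr, "->", reason)
```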
3. Set authentication key
Set the SSH authentication key required to access the nodes directly.
Select an existing authentication key or create a new one, and then click the [Next] button.
- To create a new authentication key, select Create new authentication key, enter the authentication key name, and click the [Create and save authentication key] button.
The authentication key is required to verify the admin password. Keep the saved PEM file in a safe location on your PC.
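If you connect to the nodes from Linux or macOS, OpenSSH ignores a private key file that other users can read, so it is common practice to restrict the saved PEM file's permissions to owner read-only. The following minimal sketch does that with the standard library; it assumes a POSIX system, and `secure_pem` is a hypothetical helper name.

```python
import os
import stat
import tempfile
from pathlib import Path


def secure_pem(path: str) -> int:
    """Make a downloaded PEM key readable only by its owner; return the new permission bits."""
    key = Path(path)
    if not key.is_file():
        raise FileNotFoundError(path)
    os.chmod(key, stat.S_IRUSR)  # 0o400: owner read-only, no access for group/others
    return stat.S_IMODE(key.stat().st_mode)


if __name__ == "__main__":
    # Demo on a throwaway file; point this at your real key file instead.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo-key.pem"
        demo.write_text("placeholder, not a real key\n")
        print(oct(secure_pem(str(demo))))
```

The shell equivalent is `chmod 400` on the key file.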
4. Final confirmation
Check the details and click the [Create] button.
- Cloud Hadoop automatically creates an ACG whenever you create a cluster. To control network access, select the automatically created ACG and modify its rules. For more information on ACG settings, see Firewall settings (ACG).
- It takes approximately 30 to 50 minutes to create a cluster. When the cluster has been created and is running, Running appears in the Status column of the cluster list.
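Because creation takes roughly 30 to 50 minutes, automation scripts typically poll the cluster status until it reads Running. The sketch below shows only the polling logic. `get_status` is a hypothetical callable that you would implement yourself (for example, by wrapping whatever status lookup you use), and "Creating" in the demo is a made-up interim status. The default timeout of 60 minutes is an assumption chosen to exceed the typical creation time.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 60 * 60,
    interval_s: float = 60,
) -> str:
    """Poll get_status() until it returns "Running" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "Running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster not Running after {timeout_s} s (last status: {status!r})")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for a real status lookup; "Creating" is a hypothetical interim status.
    fake_statuses = iter(["Creating", "Creating", "Running"])
    print(wait_until_running(lambda: next(fake_statuses), interval_s=0))
```

Using `time.monotonic()` keeps the deadline correct even if the system clock changes while waiting.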
Delete cluster
To delete a Cloud Hadoop cluster:
- Click the menu icon and navigate to Services > Big Data & Analytics > Cloud Hadoop in the VPC environment of the NAVER Cloud Platform console.
- Select the cluster to delete from the cluster list, and then click the [Delete] button.
- Enter the cluster name in the pop-up window to confirm deletion, and then click the [Yes] button.
It takes several minutes to delete a cluster. Once deleted, the cluster disappears from the cluster list.
When you delete a Cloud Hadoop cluster, all data stored in the nodes' local file systems and in HDFS is also deleted. Back up any files you need beforehand, for example by copying them to the Object Storage bucket.
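`hadoop distcp` is Hadoop's standard tool for bulk copies between file systems, so it is one way to copy HDFS data to Object Storage before deleting a cluster. The sketch below only builds the command. It assumes that the cluster reaches its integrated Object Storage bucket through the `s3a://` scheme (verify this against the Cloud Hadoop Object Storage integration documentation), and the HDFS path and bucket name in the demo are hypothetical.

```python
import shlex


def distcp_backup_command(hdfs_path: str, bucket: str, prefix: str) -> list[str]:
    """Build a `hadoop distcp` command that copies an HDFS path into an Object Storage bucket.

    Assumption: the cluster reaches its integrated bucket through the s3a:// scheme.
    """
    destination = f"s3a://{bucket}/{prefix.strip('/')}"
    return ["hadoop", "distcp", hdfs_path, destination]


if __name__ == "__main__":
    # Hypothetical path and bucket names; replace with your own.
    cmd = distcp_backup_command("hdfs:///user/example/data", "example-bucket", "backup/data")
    print(shlex.join(cmd))
    # To actually run it on a cluster node: subprocess.run(cmd, check=True)
```

Returning the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.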
Delete Object Storage file or bucket
Select the file to delete from the Object Storage console and click [Edit] > Delete.
For more information on deleting Object Storage files or buckets, see Object Storage user guide.
Deleted Object Storage files or buckets cannot be restored. Consider carefully before proceeding.