Getting started with Cloud Hadoop

    The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.

    Available in VPC

    If you have checked the supported environments and required specifications for Cloud Hadoop and reviewed the usage scenarios and terms, you are ready to start using Cloud Hadoop. The first step is to create a Cloud Hadoop cluster. Clusters are created and managed from the NAVER Cloud Platform console.
    The following summarizes what this getting started guide covers.

    Preparations

    1. Object Storage creation
      Before creating a cluster, create an Object Storage bucket for storing and retrieving data. For more information, see the Object Storage guide.

    2. VPC, subnet creation
      Create a VPC and subnets in Networking > VPC on the NAVER Cloud Platform console. For more information, see the VPC user guide. At least 1 VPC is required regardless of the number of clusters, and multiple clusters can share the same VPC. In a private VPC environment, you can create a VPC only in the KR-2 Region.
      When creating a Cloud Hadoop cluster, you can use public and private subnets according to each node's purpose. In the VPC environment, edge nodes and master nodes can be placed in either public or private subnets, while worker nodes can be placed only in private subnets. The number of edge nodes is fixed at 1 and the number of master nodes at 2. In this guide, 1 private subnet and 2 public subnets were created.

    VPC             SUBNET 1 (PRIVATE)   SUBNET 2 (PUBLIC)   SUBNET 3 (PUBLIC)
    172.16.0.0/16   172.16.0.0/24        172.16.1.0/28       172.16.2.0/28

    3. Selecting node type
      Select a node type in advance based on your expected usage.
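
    The subnet layout above can be sanity-checked before it is created in the console. The following is a minimal sketch using Python's standard ipaddress module and the CIDR blocks from the example table. The subnet labels and the check_subnet_plan function are hypothetical names introduced for illustration and are not part of any NAVER Cloud Platform tool.

```python
import ipaddress

# CIDR blocks taken from the example table in this guide.
VPC_CIDR = "172.16.0.0/16"
SUBNET_CIDRS = {
    "subnet-1-private": "172.16.0.0/24",  # hypothetical label
    "subnet-2-public": "172.16.1.0/28",   # hypothetical label
    "subnet-3-public": "172.16.2.0/28",   # hypothetical label
}


def check_subnet_plan(vpc_cidr, subnet_cidrs):
    """Return a list of problems in the plan (an empty list means none found)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []

    # Every subnet must lie inside the VPC address range.
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC range {vpc}")

    # No two subnets may share addresses.
    names = sorted(nets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    for problem in check_subnet_plan(VPC_CIDR, SUBNET_CIDRS):
        print(problem)
```

    An empty result only means that the address ranges are consistent with each other. It does not check Region availability or any other console-side constraint.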

    Create cluster

    To use NAVER Cloud Platform's Cloud Hadoop, you must create a cluster first.

    The following describes how to create a Cloud Hadoop cluster.

    1. Access the NAVER Cloud Platform console.
    2. Click VPC from the Platform menu to switch to the VPC environment.
    3. Click Services > Big Data & Analytics > Cloud Hadoop in order.
    4. Click the [Create cluster] button.
    5. When the Create cluster page appears, proceed with the following steps in order.

    1. Set cluster

    Specify the cluster settings information, and then click the [Next] button.

    • Cluster version: Cloud Hadoop versions 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, and 2.1 are currently provided. For more information on cluster versions, see the Cloud Hadoop release notes.
    • Cluster type: four cluster types are currently provided: Core Hadoop, Presto, HBase, and Spark. Select the type that installs the components you need. If you need additional services, you can add them with the Add Service feature in Ambari, the cluster management tool.
    • Cluster add-on: in addition to the components of the selected type, you can select optional components to install.
    • Using catalog of data catalog service: when selected, a catalog of the Data Catalog service serves as the Hive Metastore for Cloud Hadoop.
    • Kerberos authentication configuration: select this to configure a secure Hadoop cluster using Kerberos. A realm is an authentication management domain. The Kerberos Key Distribution Center (KDC) is configured with the values set below.
      • Realm: the KDC's realm information (realm names can contain only capital letters)
      • KDC admin account password: the password of the KDC admin account
    • VPC: select the VPC created in preparations.
    • Cluster admin account: set the cluster admin account used to access the Ambari, Hue, and Zeppelin management consoles.
    • Cluster admin account password: enter the cluster admin account's password.
    • ACG settings: a Cloud Hadoop ACG is created automatically whenever you create a cluster. To adjust network access rules, select the automatically created ACG and edit its rules. For more information on ACG settings, see the Firewall settings (ACG) guide.
    Caution

    The Ranger UI account (ID/password) in Cloud Hadoop version 1.3 is set to admin/admin.
    The Ranger UI account (ID/password) in Cloud Hadoop version 1.4 or later is set to admin/{password entered by the user}.
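
    The version rule in the caution above can be written as a small lookup. The following is a minimal sketch in Python. The function names are hypothetical, and user_password stands for the password entered by the user when creating the cluster.

```python
def parse_version(version):
    """Turn a version string such as "1.4" into a comparable tuple (1, 4)."""
    return tuple(int(part) for part in version.split("."))


def ranger_ui_credentials(cluster_version, user_password):
    """Return the (ID, password) pair for the Ranger UI as described in this guide."""
    if parse_version(cluster_version) < (1, 4):
        # Cloud Hadoop 1.3: fixed default account.
        return ("admin", "admin")
    # Cloud Hadoop 1.4 or later: the password entered by the user.
    return ("admin", user_password)
```

    Comparing tuples of integers rather than raw strings keeps the ordering correct if a version such as 1.10 is ever introduced.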

    2. Set storage and server

    Specify the storage and node server settings information, and then click the [Next] button.

    • Object Storage buckets: Cloud Hadoop can read and write data in the Object Storage bucket created in preparations. When creating a cluster, select that bucket. Locked buckets cannot be connected to Cloud Hadoop, so keep this in mind when creating an Object Storage bucket.
    • Bootstrap script: this feature runs a shell script uploaded to the Object Storage bucket connected to Cloud Hadoop when the cluster is created. To enable it, select the check box and enter the file name of the uploaded shell script. The script is not executed if the file does not exist in the Object Storage bucket or if the file name is entered incorrectly. Bootstrap execution logs are saved in the same path as the shell script in the Object Storage bucket.
    • Support high availability: by default, Cloud Hadoop provides redundancy for the HDFS NameNode, YARN Resource Manager, Oozie Server, and HiveServer. Because this is the minimum required specification, it can't be deselected.
    • Edge node server type: select the server type to be used for the edge node. For specifications of servers that can be used as edge nodes, see Supported server specifications by cluster node.
    • Edge node subnet: select a subnet in which to locate the edge node.
    • Number of edge nodes: the number of edge nodes is fixed at 1.
    • Master node subnet: select the subnet in which to place the master nodes.
    • Master node server type: select the server type to be used for the master node. For specifications of servers that can be used as master nodes, see Supported server specifications by cluster node.
    • Number of master nodes: since Cloud Hadoop provides high availability as minimum specifications, the number of master nodes is fixed at 2.
    • Master node storage type: select the storage type. You can choose between SSD and HDD. The storage type can't be changed after the cluster is created.
    • Master node storage capacity: select the storage capacity. You can select from a minimum of 100 GB up to 2000 GB (in 10 GB units). 4000 GB and 6000 GB can also be selected.
    • Worker node subnet: select the subnet to place the worker node.
    • Worker node server type: select the server type to be used for worker nodes. For specifications of servers that can be used as worker nodes, see Supported server specifications by cluster node.
    • Number of worker nodes: the number of worker nodes can be between 2 and 8. Worker nodes can be added or deleted even after the cluster is created.
    • Worker node storage type: select the storage type. You can choose between SSD and HDD. The storage type can't be changed after the cluster is created.
    • Worker node storage capacity: select the storage capacity. You can select from 100 GB up to 6 TB, and specify it in 10 GB increments.
    • Pricing plan: the pricing plan selected at account creation is applied. For more information on pricing, see Pricing information.
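
    The numeric limits in the list above can be checked before they are entered in the console. The following is a minimal sketch in Python. The function names are hypothetical, sizes are assumed to be whole numbers of GB, and the worker node limit of 6 TB is assumed to mean 6000 GB.

```python
# Node counts fixed by the service, as stated in this guide.
EDGE_NODE_COUNT = 1
MASTER_NODE_COUNT = 2


def valid_master_storage_gb(size_gb):
    """100 to 2000 GB in 10 GB units, or exactly 4000 GB or 6000 GB."""
    if size_gb in (4000, 6000):
        return True
    return 100 <= size_gb <= 2000 and size_gb % 10 == 0


def valid_worker_storage_gb(size_gb):
    """100 GB to 6 TB in 10 GB increments (6 TB taken as 6000 GB: an assumption)."""
    return 100 <= size_gb <= 6000 and size_gb % 10 == 0


def valid_worker_count(count):
    """Between 2 and 8 worker nodes at creation time."""
    return 2 <= count <= 8
```

    For example, 3000 GB is accepted for a worker node but rejected for a master node, because master node storage above 2000 GB is limited to the two fixed sizes.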
    Caution

    If you set network ACL rules for the VPC service separately, Cloud Hadoop cluster creation may not work properly.
    Cluster creation may fail if the inbound/outbound rules meet any of the following conditions.

    1. There is a deny rule for 0.0.0.0/0 on ports 1-65535
    2. There is a deny rule that overlaps the IP range of the subnet on which you want to create the Cloud Hadoop cluster
    3. There is a deny rule that overlaps the range of Cloud Hadoop's default ACG
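
    The three conditions above can be checked against a rule list before a cluster is created. The following is a minimal sketch in Python. The AclRule structure and the find_risky_deny_rules function are hypothetical and do not reflect the format of any NAVER Cloud Platform API. The range of the default ACG is not given in this guide, so it is passed in as an optional parameter and treated as a list of CIDR blocks, which is an assumption.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    """Hypothetical, simplified view of one network ACL rule."""
    action: str      # "allow" or "deny"
    cidr: str        # for example "0.0.0.0/0"
    port_start: int
    port_end: int


def find_risky_deny_rules(rules, subnet_cidr, acg_cidrs=()):
    """Return (rule, reasons) pairs for deny rules that match the conditions."""
    subnet = ipaddress.ip_network(subnet_cidr)
    acg_nets = [ipaddress.ip_network(cidr) for cidr in acg_cidrs]
    risky = []
    for rule in rules:
        if rule.action != "deny":
            continue
        net = ipaddress.ip_network(rule.cidr)
        reasons = []
        # Condition 1: deny rule for every address on every port.
        if net.prefixlen == 0 and rule.port_start <= 1 and rule.port_end >= 65535:
            reasons.append("denies 0.0.0.0/0 on ports 1-65535")
        # Condition 2: deny rule overlapping the cluster subnet.
        if net.overlaps(subnet):
            reasons.append("overlaps the cluster subnet")
        # Condition 3: deny rule overlapping the default ACG range (assumed CIDRs).
        if any(net.overlaps(acg) for acg in acg_nets):
            reasons.append("overlaps the default ACG range")
        if reasons:
            risky.append((rule, reasons))
    return risky
```

    The check is deliberately conservative: a deny rule for 0.0.0.0/0 on a single port is still reported, because its address range overlaps every subnet.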

    3. Set authentication key

    Set the SSH authentication key required for connecting directly to the node.
    Select an authentication key you have or create a new one and click [Next].

    • To create a new authentication key, select Create new authentication key, enter the authentication key name, and then click the [Create and save authentication key] button.
    Note

    The authentication key is required to get the admin password. Keep the saved PEM file in a safe location on your PC.

    4. Final confirmation

    After checking the request details, click the [Create] button.

    Note
    • Cloud Hadoop ACG is automatically created whenever you create a cluster. If you want to set up a network ACL, then you can edit the rule by selecting the ACG that was created automatically. For more information on ACG settings, see Firewall settings (ACG).
    • It takes about 30 to 50 minutes for a cluster to be created. Once the cluster is created and starts running, you can see Running displayed in the Status column of the cluster list.
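
    Because creation takes about 30 to 50 minutes, a script that depends on the cluster may need to wait for the Running status. The following is a minimal polling sketch in Python. The get_status callable is hypothetical: it stands for whatever mechanism you use to read the value shown in the Status column, and this guide does not describe one. The default timeout and interval are also assumptions.

```python
import time


def wait_until_running(get_status, timeout_s=3600, interval_s=60,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "Running" or the timeout expires.

    Returns True if the cluster reached "Running", False on timeout.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Running":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

    A timeout somewhat above the documented 50 minutes leaves room for slower runs while still ending the wait if creation has stalled.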

    Delete cluster

    The following describes how to delete a Cloud Hadoop cluster.

    1. In the VPC environment on the NAVER Cloud Platform console, click Services > Big Data & Analytics > Cloud Hadoop in order.
    2. Select the cluster to delete from the cluster list, and then click the [Delete] button.
    3. Enter the cluster name in the pop-up window to confirm deletion, and then click the [Yes] button.
    Note

    It takes several minutes to delete a cluster. Once the cluster is deleted, the cluster disappears from the cluster list.

    Caution

    If you delete a Cloud Hadoop cluster, then the data saved in the node's local file system or HDFS will all be deleted as well. Back up the necessary files, for example, by copying them into the Object Storage bucket.
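
    One way to make such a backup is Hadoop's DistCp tool, run from a cluster node. The following minimal Python sketch only builds the command line and does not run it. The function name, the example path, and the bucket name are hypothetical, and the s3a:// scheme for addressing the Object Storage bucket is an assumption that should be verified against the Object Storage guide.

```python
import shlex


def build_backup_command(hdfs_path, bucket, prefix=""):
    """Build a `hadoop distcp` command that copies an HDFS path into a bucket."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be an absolute path")
    source = "hdfs://" + hdfs_path                   # default NameNode of the cluster
    target = f"s3a://{bucket}/{prefix.strip('/')}"   # s3a scheme is an assumption
    return ["hadoop", "distcp", source, target]


if __name__ == "__main__":
    command = build_backup_command("/user/example/data", "example-bucket", "backup/data")
    print(shlex.join(command))
```

    Files on a node's local file system are not covered by this command and need to be copied separately.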

    Delete Object Storage bucket and file

    Select the file to delete from the Object Storage console and click [Edit] > Delete.
    For more information on deleting files or buckets, see Object Storage user guide.

    Caution

    Deleted Object Storage files or buckets cannot be restored. Consider carefully before proceeding.

