Cloud Hadoop overview

    Available in Classic

    Cloud Hadoop is a fully managed cloud analytics service that lets you use open source frameworks such as Apache Hadoop, HBase, Spark, Hive, and Presto to process big data quickly and easily. You can access the servers directly through a terminal and manage the cluster yourself with the cluster management features provided through Ambari.
    With the Cloud Hadoop service on NAVER Cloud Platform, you can set up the initial infrastructure easily. Two master nodes are provided to secure the stability and availability of services and tasks, and nodes can be added or removed at any time for flexible scalability. In addition, the range of supported frameworks and server types lets you analyze large-scale data, and you can manage and monitor clusters through a web UI.

    Cloud Hadoop features

    • User convenience

      • Cloud Hadoop creates clusters automatically, reducing the burden of infrastructure management.
      • Various open source frameworks are installed, configured, and optimized for you, so a system is ready for analysis at any time.
    • Cost efficiency

      • You pay only for what you use, from the time the cluster is started until it is terminated.
      • Cloud Hadoop stores large-scale data at low cost by using NAVER Cloud Platform's Object Storage as its data storage.
    • Flexible scalability and stability

      • You can easily increase or reduce the number of instances used for data analysis whenever you need to.
      • Two master nodes are provided for higher stability and availability of services and tasks.
    • Various frameworks supported

      • Hadoop: framework for distributed processing of large-scale data sets across clusters of computers using a simple programming model
      • HBase: distributed, scalable storage for large-scale data
      • Spark: unified analytics engine for processing large-scale data
      • Hive: data warehouse software for reading, writing, and managing large-scale data sets in distributed storage using SQL
      • Presto: distributed SQL query engine for big data
    • Web UI provided for management and monitoring

      • A UI is provided for managing the information and status of the Cloud Hadoop cluster.
      • Root access permissions are provided for clusters, giving you complete control over them and allowing you to check or edit the framework settings.

    Cloud Hadoop user guide

    NAVER Cloud Platform provides various resources and guides to help customers understand Cloud Hadoop better. If you are a developer or marketer who needs detailed information while considering Cloud Hadoop for your company or establishing data-related policies, make good use of the resources below.

    Check the FAQs first.

    Q. What are the benefits of using Cloud Hadoop?
    A. With Cloud Hadoop, you can freely use clusters that come with open source components already built in. You can also access the servers directly through a terminal. It is an installation-type cluster service in which you manage the clusters yourself, using the cluster management features provided through Ambari.

    Q. Which cluster node types are provided by Cloud Hadoop?
    A. A Cloud Hadoop cluster is a set of nodes configured for distributed storage and analysis of data. A cluster contains three types of nodes, depending on their purpose.

    • Edge node: gateway node for external connections
    • Master node: admin node that monitors worker nodes. Two master nodes are created to support high availability, and the number of master nodes cannot be changed
    • Worker node: node that receives commands from the master nodes and performs the actual tasks, such as data analysis. You can create 2 to 8 worker nodes initially, and nodes can be added or deleted dynamically afterward

    Q. How is the Cloud Hadoop service configured?
    A. Cloud Hadoop is a service for building and managing clusters easily and conveniently. You can install open source frameworks for processing large amounts of data, such as Apache Hadoop, HBase, Spark, Hive, and Presto, on the cluster, and then build and operate a system for processing large-scale data. For the configuration of the Cloud Hadoop service, see the following configuration diagram (architecture):

    [Figure: Cloud Hadoop service configuration diagram]

    Q. The error "network error: connection timed out" occurs when I connect over SSH with PuTTY.
    A. If you have allowed SSH access (port 22) in the ACG but the SSH connection still fails, SSH access (port 22) is probably blocked in the Network ACL (NACL). Allow SSH access (port 22) in the NACL.
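
    Before changing ACG or NACL rules, it can help to confirm how the connection fails. The following is a minimal sketch, not an official NAVER Cloud Platform tool: probe_port is a hypothetical helper name, and the reading of the results is a general TCP heuristic. A "timed out" result usually means packets are being dropped somewhere on the path (consistent with a blocked rule), while "refused" means the host was reached but nothing is listening on the port.

```python
import socket


def probe_port(host: str, port: int = 22, timeout: float = 3.0) -> str:
    """Try a TCP connection and report how it ended.

    Returns "open", "timed out", "refused", or "unreachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: typical when a firewall rule silently drops packets.
        return "timed out"
    except ConnectionRefusedError:
        # The host answered, but no service accepts connections on this port.
        return "refused"
    except OSError:
        # DNS failure, no route to host, and similar errors.
        return "unreachable"
```

    For example, probe_port("<edge node public IP>", 22) can be run from the machine where PuTTY fails; the IP address is whatever your cluster's edge node was assigned.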

    Q. What is the bandwidth of the NCP server?
    A. The basic bandwidth of the NCP server is around 1 Gbps (1 Gbit/sec).

    Q. When reading data on the NCP server, overall traffic is too high. What should I do when network traffic usage is too high?
    A.

    • You can distribute data and traffic by adding several worker nodes.
    • You can separate storage resources from computing resources by keeping data in Object Storage, and use the computing resources of Cloud Hadoop to read and write that data, which reduces network traffic usage.

    Q. In the Cloud Hadoop Ambari Metric service, how does the Maintenance mode operation status differ from the general operation status?
    A. The Maintenance mode feature provided by the Ambari web UI can be set per service or per host.

    • While Maintenance mode is set, no notifications are sent.
    • If Maintenance mode is set per host (server), that host is excluded from batch jobs, such as service restarts, when they are run.
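
    Maintenance mode can also be toggled outside the web UI through Ambari's REST API. The sketch below only builds the request and does not send it. The endpoint path, the X-Requested-By header, and the maintenance_state field follow the Ambari REST API as commonly documented, so verify them against the Ambari version of your cluster. maintenance_request is a hypothetical helper, and the base URL, cluster name, host name, and credentials are placeholders.

```python
import base64
import json
import urllib.request


def maintenance_request(ambari_url, cluster, host, user, password, state="ON"):
    """Build (but do not send) a PUT request setting a host's maintenance state."""
    body = {
        "RequestInfo": {"context": f"Turn {state} Maintenance Mode"},
        "Body": {"Hosts": {"maintenance_state": state}},  # "ON" or "OFF"
    }
    url = f"{ambari_url}/api/v1/clusters/{cluster}/hosts/{host}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari expects this header on requests that modify state.
            "X-Requested-By": "ambari",
        },
    )
```

    Passing the returned object to urllib.request.urlopen would send it. Keeping the build step separate makes it possible to inspect the URL and body before anything changes on the cluster.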

    Q. When I run show tables in the Hive interpreter in Hue, the list of views does not appear.
    A. Running show tables returns only the list of regular tables. Run show views to check the list of views.

    Q. When I access Hive with an account other than hive and run a Hive query, a Permission denied error occurs.
    A. There are two solutions to this problem.

    • Add the relevant account to the Yarn Queue ACL. Log in to the Ambari web UI > select Yarn Queue Manager > select default (yarn queue), and add the account to both Users of Administer Queue and Users of Submit Applications.
    • If you use the hive account, you can run queries without adding an account.

    Q. When I run hadoop fsck / to check the file system, an error occurs.
    A. HDFS fsck must be run with the hdfs account. Log in as sshuser, switch to the hdfs account with sudo su - hdfs, and then run the command.
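
    When the check has to be scripted rather than typed interactively, the account switch can be folded into a single command. This is a minimal sketch under two assumptions: that sshuser is allowed to run sudo -u hdfs (used here in place of the interactive sudo su - hdfs), and that hdfs fsck is used as the current form of hadoop fsck. fsck_command is a hypothetical helper that only builds the argument list.

```python
import shlex


def fsck_command(path: str = "/", run_as: str = "hdfs") -> list:
    """Build the argument list that runs HDFS fsck as the given account."""
    return ["sudo", "-u", run_as, "hdfs", "fsck", path]


# Render the command as a shell-safe string for logging or copy and paste.
command_line = shlex.join(fsck_command("/"))
```

    The list form can be passed directly to subprocess.run on the edge node, which avoids shell quoting problems for paths that contain spaces.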

    Q. When integrating Object Storage (S3) through Hive, a communication error occurs with S3.
    A. Check the Object Storage address for each Cloud Hadoop Region. Even if a server is in a public subnet, a master server that has not been assigned a public IP can communicate only with the private domain of Object Storage.

    Note

    kr.object.ncloudstorage.com

    Q. I intend to migrate data through an Object Storage bucket. Can I connect several Hadoop clusters to a single Object Storage bucket?
    A. A bucket that was designated when creating a Cloud Hadoop cluster cannot be selected when creating another Cloud Hadoop cluster. To migrate data, use the following method.

    1. Create a new bucket in Object Storage and upload the data to it.
    2. When creating the new Cloud Hadoop cluster, select the new bucket that contains the uploaded data.

    Q. If I want to delete the Cloud Hadoop cluster I am currently using but keep using its data as is, what should I do?
    A. You can use the data as they are even if you delete the Cloud Hadoop cluster through the following methods.

