Using Kudu

    Available in VPC

    Kudu is a columnar storage engine built for the Hadoop ecosystem, so Spark, MapReduce, and other Hadoop ecosystem projects can process and analyze its data natively. Like a general-purpose DBMS, it provides a primary key, which enables millisecond-level random access. It is also optimized for large-scale sequential reads, so it bridges the gap between HBase and Parquet. Because it supports both OLAP and OLTP queries, Kudu can simplify the architecture of big data analysis systems.

    Features of Kudu – Impala Integration

    Kudu's API provides simple CRUD operations. You can specify filter conditions when retrieving data, but because Kudu offers only simple data retrieval, you need a separate query engine to run complex queries that use GROUP BY or JOIN. Impala is commonly used with Kudu to handle such queries.

    • CREATE/ALTER/DROP TABLE: Impala supports creating, altering, and dropping Kudu tables. These tables follow the same internal/external model as any other Impala table, allowing flexible data processing and querying.
    • INSERT: Impala inserts data into Kudu tables with the same syntax it uses for tables stored in HDFS or HBase.
    • UPDATE/DELETE: Impala supports the UPDATE and DELETE SQL commands for data stored in Kudu tables, either row by row or as a batch. In addition to simple statements, you can specify complex joins in the FROM clause of a query.
    • Flexible Partitioning: similar to table partitioning in Hive, Kudu lets you split a table by hash or range into a predefined number of tablets, so that write and query operations are distributed evenly across the cluster.
    • Parallel Scan: to achieve optimal performance on commodity hardware, the Kudu client used by Impala parallelizes scans across multiple tablets.
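
    The statements listed above can be sketched as a short script that builds Impala SQL for a Kudu table. This is a minimal sketch, not a definitive setup: the table name `events`, its columns, the partition count of 4, and the helper `create_kudu_table_sql` are all hypothetical, and the script only prints the statements rather than running them against a cluster.

```python
def create_kudu_table_sql(table, columns, primary_key, hash_partitions):
    """Build an Impala CREATE TABLE statement for a hash-partitioned Kudu table."""
    column_defs = ", ".join(f"{name} {col_type}" for name, col_type in columns)
    key_cols = ", ".join(primary_key)
    return (
        f"CREATE TABLE {table} ({column_defs}, PRIMARY KEY ({key_cols})) "
        f"PARTITION BY HASH ({key_cols}) PARTITIONS {hash_partitions} "
        "STORED AS KUDU"
    )


# Hypothetical table and values, for illustration only.
statements = [
    create_kudu_table_sql(
        "events", [("id", "BIGINT"), ("name", "STRING")], ["id"], 4
    ),
    "INSERT INTO events VALUES (1, 'created')",
    "UPDATE events SET name = 'updated' WHERE id = 1",
    "DELETE FROM events WHERE id = 1",
    "DROP TABLE events",
]

# Print each statement; in practice they would be sent to Impala by a client.
for sql in statements:
    print(sql + ";")
```

    The printed statements could then be pasted into an Impala client such as impala-shell. The hash partition count is fixed when the table is created, so it is worth choosing it with the expected cluster size in mind.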

    Kudu architecture

    There are four major components that make up the Kudu service: Table, Tablet, Tablet Server, and Master Server.

    • Table
      A table is where Kudu stores data. It has a schema and a primary key, and it is split by primary key into segments called tablets.

    • Tablet
      A tablet is a partition of a single table, similar to a partition in a relational database. Each tablet is replicated across multiple tablet servers, and one replica is elected leader through the Raft consensus algorithm. Any replica can serve reads, while writes require consensus among the tablet servers that host the tablet's replicas.

    • Tablet Server
      A tablet server stores tablets and serves them to clients. For a given tablet, one tablet server hosts the leader replica and the others host follower replicas.

    • Master Server
      The master server tracks and manages all tablets, tablet servers, catalog tables, and other cluster metadata. At any given time, a single master acts as the leader; if the current leader fails, a new leader is elected through the Raft consensus algorithm. The master also coordinates metadata operations for clients, and clients can access catalog tables through the master using the client API.
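
    The relationship between tables, tablets, and replicas can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not Kudu's implementation: CRC32 stands in for Kudu's internal hash function, and the tablet count, replication factor, and function names are hypothetical. It shows how a primary key maps to one tablet, and how many replica acknowledgements form a Raft majority.

```python
import zlib

NUM_TABLETS = 4  # hypothetical number of hash partitions
REPLICAS = 3     # hypothetical replication factor


def tablet_for_key(primary_key: str, num_tablets: int = NUM_TABLETS) -> int:
    """Map a primary key to a tablet index (CRC32 is a stand-in for Kudu's hash)."""
    return zlib.crc32(primary_key.encode("utf-8")) % num_tablets


def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def write_committed(acks: int, replicas: int = REPLICAS) -> bool:
    """Treat a write as committed once a majority of replicas acknowledge it."""
    return acks >= majority(replicas)


# Each key always lands on the same tablet, which spreads rows across tablets.
for key in ["user-1", "user-2", "user-3"]:
    print(key, "-> tablet", tablet_for_key(key))
print("majority of", REPLICAS, "replicas:", majority(REPLICAS))
```

    Under this model, a tablet with three replicas keeps accepting writes while one replica is unavailable, because the remaining two still form a majority.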

    chadoop-31-001.jpg

    Using Kudu

    This section describes how to use Kudu.

    Create cluster

    From the NAVER Cloud Platform console, create a Cloud Hadoop cluster. For more information on creating clusters, see Create cluster.

    Note

    Starting with Cloud Hadoop 1.9, you can use clusters with Kudu v1.16.0 installed.
    chadoop-30-002_ko.jpg

    Check Kudu service in Ambari UI

    In the Ambari UI, you can check the Kudu service as shown below. You can also start and stop each component of the service from this page.

    chadoop-31-002.jpg

    • Summary: check the hosts where the components are installed
    • Configs: change the configuration of the Kudu service
    • Quick Links: Kudu Master WEB-UI, Kudu Tserver WEB-UI
      • Accessing these links requires tunneling, so access them through the web UI link provided by the console. For more details, see Access Kudu WEB UI.

    Access WEB UI

    You can access the Kudu WEB UI through [View by application] on the Cloud Hadoop console. For more information, see View by application.
    chadoop-31-003-kudu_en

    You can see the overall status of Kudu services on the Kudu WEB UI page.
    chadoop-31-004.jpg

