Using Kudu

release/20240425
Available in VPC

Kudu is a columnar storage engine for the Hadoop platform. It is designed specifically for the Hadoop ecosystem, so Spark, MapReduce, and other Hadoop ecosystem projects can process and analyze its data natively. Like a general-purpose DBMS, it provides a primary key that enables millisecond-level random access, and it is also optimized for large sequential scans, which lets it bridge the gap between HBase and Parquet. Because it supports both OLAP and OLTP queries, Kudu can simplify the architecture of big data analysis systems.

Features of Kudu – Impala Integration

The Kudu API provides simple CRUD operations. You can specify filter conditions when retrieving data, but because Kudu offers only basic data retrieval, complex queries that use GROUP BY or JOIN need a separate query processing engine. Impala is commonly used with Kudu for this purpose.

CREATE/ALTER/DROP TABLE: Impala supports creating, altering, and dropping Kudu tables. These tables follow the same internal/external model as any other table in Impala, allowing flexible data processing and querying.
INSERT: data is inserted into Kudu tables through Impala with the same syntax used for Impala tables backed by HDFS or HBase.
UPDATE/DELETE: Impala supports UPDATE and DELETE SQL commands on data stored in Kudu tables, either row by row or as a batch. The rows to modify can also be selected with complex joins in the FROM clause of the query.
Flexible Partitioning: similar to table partitioning in Hive, Kudu lets you pre-split a table by hash or range into a predefined number of tablets, so that write and query operations are distributed evenly across the cluster.
Parallel Scan: to achieve optimal performance on commodity hardware, the Kudu client used by Impala parallelizes scans across multiple tablets.
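
The operations listed above can be sketched as Impala SQL statements. The following is a minimal illustration only: the table name user_events, its columns, and the run_all helper are hypothetical, and the cursor is assumed to be any DB-API style cursor connected to Impala. The statements are kept as plain Python strings so the example does not depend on a particular client library.

```python
# Impala SQL statements for a Kudu-backed table, kept as plain strings.
# The table name `user_events` and its columns are hypothetical.

CREATE_TABLE = """
CREATE TABLE user_events (
  user_id BIGINT,
  event_time BIGINT,
  action STRING,
  PRIMARY KEY (user_id, event_time)
)
PARTITION BY HASH (user_id) PARTITIONS 4
STORED AS KUDU
"""

# Row-level changes use ordinary SQL syntax.
INSERT_ROW = "INSERT INTO user_events VALUES (1, 1714000000, 'login')"
UPDATE_ROW = (
    "UPDATE user_events SET action = 'logout' "
    "WHERE user_id = 1 AND event_time = 1714000000"
)
DELETE_ROWS = "DELETE FROM user_events WHERE user_id = 1"

# Aggregation is handled by Impala, not by the Kudu API itself.
AGGREGATE = "SELECT action, COUNT(*) AS cnt FROM user_events GROUP BY action"

STATEMENTS = [CREATE_TABLE, INSERT_ROW, UPDATE_ROW, DELETE_ROWS, AGGREGATE]


def run_all(cursor, statements=STATEMENTS):
    """Execute each statement on a DB-API style cursor (hypothetical helper)."""
    for sql in statements:
        cursor.execute(sql.strip())
```

The hash partition column must be part of the primary key, which is why user_id appears in both clauses.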

Kudu architecture

There are four major components that make up the Kudu service: Table, Tablet, Tablet Server, and Master Server.

Table
A table is where Kudu data is stored. It has a schema and a primary key, and it is split by primary key into segments called tablets.
Tablet
A tablet is a segment of a single table, similar to a partition in a relational database. Each tablet is replicated on multiple tablet servers, and one of the replicas is elected leader through the Raft consensus algorithm. Any replica can serve reads, while writes require consensus among the tablet servers that host the tablet.
Tablet Server
A tablet server stores tablets and serves them to clients. For a given tablet, one tablet server acts as the leader and the others hold follower replicas.
Master Server
The master server tracks and manages all tablets, tablet servers, the catalog table, and other metadata related to the cluster. At any given time, a single master acts as the leader. If a problem occurs with the current leader, a new leader is elected using the Raft consensus algorithm. The master also coordinates metadata operations for clients. Clients access the catalog table through the master by using the client API.
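
The relationship between tables, tablets, and replicas can be sketched with a small model. This is a simplified illustration under stated assumptions, not Kudu's implementation: the Tablet class, the tablet_for function, the server names, and the use of CRC32 as the hash are all hypothetical, and leader election is not modeled (a real cluster would elect a new leader through Raft when the current one is unreachable).

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Tablet:
    """One segment of a table, replicated on several tablet servers."""
    tablet_id: int
    replicas: list   # names of the tablet servers holding a replica
    leader: str      # the replica currently acting as leader
    rows: dict = field(default_factory=dict)

    def write(self, key, value, reachable):
        """Accept a write only if the leader and a majority of replicas are reachable."""
        acks = [server for server in self.replicas if server in reachable]
        majority = len(self.replicas) // 2 + 1
        if self.leader not in reachable or len(acks) < majority:
            return False
        self.rows[key] = value
        return True


def tablet_for(key, num_tablets):
    """Map a primary key to a tablet index with a stable hash.

    CRC32 is an arbitrary stand-in for Kudu's own hash function.
    """
    return zlib.crc32(str(key).encode()) % num_tablets


# A table split into four tablets, each replicated on three tablet servers.
servers = ["ts1", "ts2", "ts3"]
tablets = [Tablet(i, servers, leader=servers[i % 3]) for i in range(4)]

# Route a row to its tablet by primary key, then write with all servers up.
target = tablets[tablet_for(42, len(tablets))]
accepted = target.write(42, "row", reachable={"ts1", "ts2", "ts3"})
```

With three replicas, a write needs two acknowledgements, so the model keeps accepting writes while one follower is down and rejects them once only a single server remains reachable.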

Using Kudu

This section describes how to use Kudu.

Create cluster

In the NAVER Cloud Platform console, create a Cloud Hadoop cluster. For more information on creating clusters, see Create cluster.

Note

Starting with Cloud Hadoop 1.9, you can use clusters with Kudu v1.16.0 installed.

Check Kudu service in Ambari UI

In the Ambari UI, you can view Kudu services as follows. You can start and stop each component of the service from this page.

Summary: check the hosts where the components are installed
Configs: change the configuration of the Kudu service
Quick Links: Kudu Master WEB-UI, Kudu Tserver WEB-UI
- Accessing these links directly requires tunneling. Use the web UI link provided by the console instead. For more details, see Access Kudu WEB UI.

Access WEB UI

You can access the Kudu WEB UI through [View by application] on the Cloud Hadoop console. For more information, see View by application.

You can see the overall status of Kudu services on the Kudu WEB UI page.
