Using Kafka

    Available in VPC

    Apache Kafka is a distributed messaging system that provides high performance and stability. It is used to process large amounts of data in real time.

    Caution

    To use the Kafka app, you must first create a Zookeeper app of the ZOOKEEPER-3.4.13 type.

    Note

    Refer to the official Kafka documentation for more information about the Kafka app.

    Check Kafka app details

    Once the app has been created, you can view its details. If the Status shown in the app's details is Stable, the app is running normally.
    The following describes how to check the app details.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
    2. Select the account that owns the app.
    3. Click the app whose details you want to view.
    4. View the app details.
    • Quick links
      • AppMaster: URL where the container logs can be viewed. When an app is created, it is submitted to a YARN queue, and YARN provides a web UI where each app's details can be viewed.
      • kafka-manager: Kafka Manager's URL
      • Connecting String
        • zookeeper.connect: address of the Zookeeper ensemble as specified when creating the Kafka app
      • Component
        • broker: stores and delivers messages between producers, which publish messages, and consumers, which consume them.
        • kafka-manager: provides broker monitoring, topic management, and partition reassignment.

    Configure the usage environment for the Kafka app

    To use or develop against a running Kafka app, you first need to configure its usage environment. The following explains how to configure the Kafka app's usage environment, using the Dev app as an example.

    1. To check the Kafka app's package and settings, run /home/forest/get-app-env.sh {appname} {install directory}.

      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ mkdir kafka
      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ /home/forest/get-app-env.sh kafka ./kafka/
      
      [/home/forest/get-app-env.sh] Apptype: KAFKA-2.4.0
      [/home/forest/get-app-env.sh] Download install-client script for KAFKA-2.4.0
      [/home/forest/get-app-env.sh] Install client on ./kafka/
      current kafka: .yarn/services/kafka/components/v1
      
      --2021-05-12 17:00:14--  http://dist.kr.df.naverncp.com/repos/release/kafka/kafka_2.12-2.4.0.tgz
      Resolving dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)... 10.213.208.69
      Connecting to dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)|10.213.208.69|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 62283588 (59M) [application/octet-stream]
      Saving to: ‘./kafka//kafka_2.12-2.4.0.tgz’
      
      100%[============================================>] 62,283,588  --.-K/s   in 0.1s
      
      2021-05-12 17:00:14 (459 MB/s) - ‘./kafka//kafka_2.12-2.4.0.tgz’ saved [62283588/62283588]
      
      Kafka Client has been installed on ./kafka//kafka_2.12-2.4.0
      
    2. Check producer.properties and consumer.properties in the installation path as follows.

      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ cat ./kafka/kafka_2.12-2.4.0/config/producer.properties
      #Generated by Apache Slider
      #Wed May 12 16:48:59 KST 2021
      bootstrap.servers=broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092
      compression.type=none
      
      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ cat ./kafka/kafka_2.12-2.4.0/config/consumer.properties
      #Generated by Apache Slider
      #Wed May 12 16:48:59 KST 2021
      bootstrap.servers=broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092
      group.id=test-consumer-group
      
    3. Create a topic called "test" using kafka-topics.sh.
      Run kafka-topics.sh --bootstrap-server {bootstrap servers} --create --topic {topic name} --partitions {number of partitions}.

      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ kafka/kafka_2.12-2.4.0/bin/kafka-topics.sh --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --create --topic test --partitions 6
      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ kafka/kafka_2.12-2.4.0/bin/kafka-topics.sh --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --describe --topic test
      Topic: test     PartitionCount: 6       ReplicationFactor: 3    Configs: retention.bytes=50000000000
              Topic: test     Partition: 0    Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
              Topic: test     Partition: 1    Leader: 0       Replicas: 0,2,1 Isr: 0,2,1
              Topic: test     Partition: 2    Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
              Topic: test     Partition: 3    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
              Topic: test     Partition: 4    Leader: 0       Replicas: 0,1,2 Isr: 0,1,2
              Topic: test     Partition: 5    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
      
    4. The following is an example of producing and consuming messages with kafka-console-producer.sh and kafka-console-consumer.sh. Note that the consumer does not print the messages in production order: with six partitions, Kafka guarantees ordering only within each partition. The properties files generated in step 2 can also be passed to these tools, as sketched after this procedure.

      [test01@shell-0.dev-1.test01.kr.df.naverncp.com ~][df]$ ./kafka/kafka_2.12-2.4.0/bin/kafka-console-producer.sh --broker-list broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --topic test
      >test1
      >test2
      >test3
      >test4                   
      >>^C
      [test01@shell-0.dev-1.test01.kr.df.naverncp.com ~][df]$ ./kafka/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --topic test --from-beginning 
      test2
      test4
      test3
      test1
      
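    The properties files generated in step 2 can be supplied to the console tools through their --producer.config and --consumer.config options, so that settings such as compression.type and group.id are applied. A minimal sketch, assuming the same installation path and broker addresses as above:

      # Produce to "test", applying the settings in the generated producer.properties
      ./kafka/kafka_2.12-2.4.0/bin/kafka-console-producer.sh \
          --broker-list broker-0.kafka.test01.kr.df.naverncp.com:9092 \
          --producer.config ./kafka/kafka_2.12-2.4.0/config/producer.properties \
          --topic test

      # Consume from "test" as the group.id defined in consumer.properties;
      # --from-beginning only applies when the group has no committed offsets
      ./kafka/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh \
          --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com:9092 \
          --consumer.config ./kafka/kafka_2.12-2.4.0/config/consumer.properties \
          --topic test --from-beginning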

    Use Kafka Manager

    You can use Kafka Manager to monitor brokers, manage topics, and reassign partitions.

    The following describes how to use Kafka Manager.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
    2. Select the account that owns the app, and then click the app.
    3. From the app's details, connect to the kafka-manager URL under Quick links.
    4. When the login page appears, enter your Data Forest account name and password to log in.
    5. Click the [Add cluster] button.
    6. Add a Kafka cluster.
      • Cluster Name: the app name
      • Cluster Zookeeper Hosts: the zookeeper.connect value from the app details
    7. Check the cluster information.

    Change the number of Kafka brokers

    The following describes how to change the number of brokers.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
    2. Select an account, select the app, and then click the [Flex] button.
    3. When the Flex change window appears, modify the number of brokers, and then click the [Modify] button.
    Note
    • If you add topics after increasing the number of brokers, their partitions are distributed across the new brokers as well. However, the partitions of existing topics are not reassigned automatically; you must reassign them manually in kafka-manager.
    • When reducing the number of brokers, the brokers with the largest COMPONENT_ID are stopped first. If you have 10 brokers, they are excluded in the following order: broker-9, broker-8, broker-7, and so on.
    Caution

    You can prevent failures and data loss by reassigning the partitions on the brokers to be excluded to other brokers in advance, as sketched below.
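
    Besides kafka-manager, the kafka-reassign-partitions.sh tool bundled with the installed client can move partitions off the brokers to be excluded. A hedged sketch; the topics.json contents, broker IDs, and file names are illustrative and must match your own topics and the brokers that will remain:

      # topics.json (illustrative): {"version": 1, "topics": [{"topic": "test"}]}

      # Generate a plan that places all partitions on the remaining brokers 0,1,2
      ./kafka/kafka_2.12-2.4.0/bin/kafka-reassign-partitions.sh \
          --zookeeper {zookeeper.connect} \
          --topics-to-move-json-file topics.json \
          --broker-list "0,1,2" --generate

      # Save the proposed assignment to reassign.json, then execute and verify it
      ./kafka/kafka_2.12-2.4.0/bin/kafka-reassign-partitions.sh \
          --zookeeper {zookeeper.connect} \
          --reassignment-json-file reassign.json --execute

      ./kafka/kafka_2.12-2.4.0/bin/kafka-reassign-partitions.sh \
          --zookeeper {zookeeper.connect} \
          --reassignment-json-file reassign.json --verify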

    Cautions for using Kafka

    Local disk capacity and operation method

    Data is saved on the local disks of each node in the Data Forest cluster, and the disk capacity per node is limited to about 300 GB.
    It is recommended to keep the amount of data held by each broker as small as possible. In other words, it is more stable to handle one topic per Kafka app than to handle multiple topics in one Kafka app. Also, if it is difficult to predict how much the data for a certain topic will grow, we recommend dividing it into fine-grained partitions from the beginning and adjusting the number of brokers as the situation changes. One way to bound the data each broker holds is to limit topic retention, as sketched below.
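
    A minimal sketch of lowering a topic's retention with kafka-configs.sh from the installed client; the topic name and retention value are illustrative (retention.bytes applies per partition):

      # Cap each partition of "test" at roughly 10 GB; older segments are deleted
      ./kafka/kafka_2.12-2.4.0/bin/kafka-configs.sh \
          --zookeeper {zookeeper.connect} \
          --entity-type topics --entity-name test \
          --alter --add-config retention.bytes=10000000000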

    Caution

    If a large capacity is needed, use the longlived_localdisk queue (for service), or prepare a dedicated queue composed of large-volume nodes to run the Kafka app.

    Node failure

    The Kafka app runs only one broker per physical node. Therefore, with the default replication factor of 3, no data is lost even if one or two nodes fail. The replication factor can also be set explicitly when creating a topic, as sketched below.
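
    A minimal sketch; the topic name test2 is illustrative, and --replication-factor 3 keeps a copy of each partition on three different brokers, and therefore on three different nodes:

      ./kafka/kafka_2.12-2.4.0/bin/kafka-topics.sh \
          --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com:9092 \
          --create --topic test2 --partitions 6 --replication-factor 3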

    Data retention

    The Kafka app's data is saved in Data Forest's local file system and is not deleted when the user stops the app or the app terminates due to other problems. The data is deleted only when the user destroys the stopped app.

    When a stopped app is started again, it runs on the nodes it previously ran on, and the existing data is recovered. However, the previously used node's resources may be occupied by another task, or the node may have been excluded from service due to a failure. In such cases, the app gives up trying to run on the previous node after one hour and runs on a different node instead, so the data held by a broker that does not run during that time may be lost.

    Clean up Zookeeper nodes

    When you destroy the app, it is recommended to delete the Zookeeper path used by the Kafka app (the chroot path in zookeeper.connect). Problems may occur if you later create an app with the same name without having deleted this path. A sketch of deleting the path is shown below.
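
    The path can be removed with the zookeeper-shell.sh bundled with the installed client. A hedged sketch; the {zookeeper host:port} and {kafka app path} placeholders come from your app's zookeeper.connect value, and the deleteall command requires the ZooKeeper 3.5+ client that Kafka 2.4.0 ships with:

      # Connect to the ensemble (the host:port part of zookeeper.connect) ...
      ./kafka/kafka_2.12-2.4.0/bin/zookeeper-shell.sh {zookeeper host:port}
      # ... then recursively delete the chroot path used by the Kafka app
      deleteall /{kafka app path}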

