Using Kafka
Available in VPC
Apache Kafka is a distributed messaging system that provides high performance and stability. It is used to process large amounts of data in real time.
To use the Kafka app, first create a ZooKeeper app of the ZOOKEEPER-3.4.13 type.
Refer to the official Kafka documentation for more information about Kafka itself.
Check Kafka app details
When the app creation is completed, you can view the app's details. If the Status shown in the details is Stable, the app is running normally.
The following describes how to check the app details.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
- Select the account that owns the app.
- Click the app whose details you want to view.
- View the app details.
- Quick links
- AppMaster: URL where the container logs can be viewed. When an app is created, it is submitted to a YARN queue, and YARN provides a web UI where each app's details can be viewed.
- kafka-manager: Kafka Manager's URL
- Connecting String
- zookeeper.connect: address of the Zookeeper ensemble as specified when creating the Kafka app
- Component
- broker: stores messages and delivers them between producers, which publish messages, and consumers, which consume them.
- kafka-manager: provides broker monitoring, topic management, and partition reassignment.
Configure usage environment for Kafka app
To use or develop with a running Kafka app, you must first configure the app's usage environment. The following explains how to configure the Kafka app's usage environment, using the Dev app as an example.
To check the Kafka app's package and settings, run `/home/forest/get-app-env.sh {app name} {install directory}`.

```shell
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ mkdir kafka
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ /home/forest/get-app-env.sh kafka ./kafka/
[/home/forest/get-app-env.sh] Apptype: KAFKA-2.4.0
[/home/forest/get-app-env.sh] Download install-client script for KAFKA-2.4.0
[/home/forest/get-app-env.sh] Install client on ./kafka/
current kafka: .yarn/services/kafka/components/v1
--2021-05-12 17:00:14--  http://dist.kr.df.naverncp.com/repos/release/kafka/kafka_2.12-2.4.0.tgz
Resolving dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)... 10.213.208.69
Connecting to dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)|10.213.208.69|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 62283588 (59M) [application/octet-stream]
Saving to: ‘./kafka//kafka_2.12-2.4.0.tgz’

100%[============================================>] 62,283,588  --.-K/s   in 0.1s

2021-05-12 17:00:14 (459 MB/s) - ‘./kafka//kafka_2.12-2.4.0.tgz’ saved [62283588/62283588]

Kafka Client has been installed on ./kafka//kafka_2.12-2.4.0
```
Check `producer.properties` and `consumer.properties` in the installation path as follows.

```shell
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ cat ./kafka/kafka_2.12-2.4.0/config/producer.properties
#Generated by Apache Slider
#Wed May 12 16:48:59 KST 2021
bootstrap.servers=broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092
compression.type=none
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ cat ./kafka/kafka_2.12-2.4.0/config/consumer.properties
#Generated by Apache Slider
#Wed May 12 16:48:59 KST 2021
bootstrap.servers=broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092
group.id=test-consumer-group
```
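The `bootstrap.servers` values above escape colons as `\:` in the Java Properties style. If you need the broker list in another tool, a small sketch like the following can extract it (the `parse_bootstrap_servers` helper and the inlined file contents are illustrative, not part of the Kafka client):

```python
def parse_bootstrap_servers(properties_text):
    """Extract the bootstrap.servers list from Java-style .properties text.

    Properties files escape ':' as '\\:'; this skips comments, finds the
    bootstrap.servers key, and returns the broker addresses as a list.
    """
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and blank/invalid lines
        key, _, value = line.partition("=")
        if key.strip() == "bootstrap.servers":
            # Undo the Properties-file escaping of ':' before splitting
            return value.replace("\\:", ":").split(",")
    return []

# Contents modeled on the generated producer.properties shown above
example = """#Generated by Apache Slider
bootstrap.servers=broker-0.kafka.test01.kr.df.naverncp.com\\:9092,broker-1.kafka.test01.kr.df.naverncp.com\\:9092
compression.type=none"""

print(parse_bootstrap_servers(example))
```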
Create a topic called "test" using kafka-topics.sh. Run `kafka-topics.sh --bootstrap-server {bootstrap servers} --create --topic {topic name} --partitions {number of partitions}`.

```shell
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ kafka/kafka_2.12-2.4.0/bin/kafka-topics.sh --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --create --topic test --partitions 6
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ kafka/kafka_2.12-2.4.0/bin/kafka-topics.sh --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --describe --topic test
Topic: test	PartitionCount: 6	ReplicationFactor: 3	Configs: retention.bytes=50000000000
	Topic: test	Partition: 0	Leader: 1	Replicas: 1,0,2	Isr: 1,0,2
	Topic: test	Partition: 1	Leader: 0	Replicas: 0,2,1	Isr: 0,2,1
	Topic: test	Partition: 2	Leader: 2	Replicas: 2,1,0	Isr: 2,1,0
	Topic: test	Partition: 3	Leader: 1	Replicas: 1,2,0	Isr: 1,2,0
	Topic: test	Partition: 4	Leader: 0	Replicas: 0,1,2	Isr: 0,1,2
	Topic: test	Partition: 5	Leader: 2	Replicas: 2,0,1	Isr: 2,0,1
```
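The `--describe` output can also be checked programmatically, for example to confirm that every partition's ISR (in-sync replica set) is fully caught up with its replica set. The parser below is a sketch matching the column layout of the output above; the `under_replicated_partitions` helper and the sample text are illustrative:

```python
def under_replicated_partitions(describe_output):
    """Return partition numbers whose ISR is smaller than the replica set,
    parsed from `kafka-topics.sh --describe`-style output."""
    lagging = []
    for line in describe_output.splitlines():
        tokens = line.split()
        if "Partition:" not in tokens:
            continue  # skip the topic-level summary line
        # Columns come in "Key: value" pairs; fold them into a dict
        fields = dict(zip(tokens[::2], tokens[1::2]))
        replicas = fields["Replicas:"].split(",")
        isr = fields["Isr:"].split(",")
        if len(isr) < len(replicas):
            lagging.append(int(fields["Partition:"]))
    return lagging

# Sample modeled on the describe output above; partition 1's ISR is
# deliberately missing a replica to show detection
sample = """Topic: test PartitionCount: 6 ReplicationFactor: 3 Configs: retention.bytes=50000000000
Topic: test Partition: 0 Leader: 1 Replicas: 1,0,2 Isr: 1,0,2
Topic: test Partition: 1 Leader: 0 Replicas: 0,2,1 Isr: 0,2"""

print(under_replicated_partitions(sample))
```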
This is an example of producing and consuming messages using kafka-console-producer.sh and kafka-console-consumer.sh.

```shell
[test01@shell-0.dev-1.test01.kr.df.naverncp.com ~][df]$ ./kafka/kafka_2.12-2.4.0/bin/kafka-console-producer.sh --broker-list broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --topic test
>test1
>test2
>test3
>test4
>^C
[test01@shell-0.dev-1.test01.kr.df.naverncp.com ~][df]$ ./kafka/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --topic test --from-beginning
test2
test4
test3
test1
```
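Note that the messages come back out of order (`test2 test4 test3 test1`): Kafka only guarantees ordering within a single partition, and the console producer spread the four keyless messages across the topic's partitions. A minimal simulation of that behavior (round-robin assignment over 3 partitions is an illustrative simplification of the producer's partitioner, not Kafka's exact algorithm):

```python
from collections import defaultdict

def simulate(messages, num_partitions):
    """Assign keyless messages to partitions round-robin, then read the
    partitions back one at a time, as a consumer draining each partition would."""
    partitions = defaultdict(list)
    for i, msg in enumerate(messages):
        partitions[i % num_partitions].append(msg)
    consumed = []
    for p in sorted(partitions):
        consumed.extend(partitions[p])  # per-partition order is preserved
    return consumed

sent = ["test1", "test2", "test3", "test4"]
# Global order differs from send order, but each partition's order is kept
print(simulate(sent, num_partitions=3))
```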
Use Kafka Manager
You can use Kafka Manager to monitor brokers, manage topics, and reassign partitions.
The following describes how to use Kafka Manager.
- From the NAVER Cloud Platform console, click the Products & Services > Big Data & Analytics > Data Forest > App menus, in that order.
- Select the account that owns the app, and then click the app.
- From the app's details, connect to the kafka-manager URL under Quick links.
- When the login page appears, enter your Data Forest account name and password to log in.
- Click the [Add cluster] button.
- Add a Kafka cluster.
- Cluster Name: the app name
- Cluster Zookeeper Hosts: the zookeeper.connect value from the app's details
- Check the cluster information.
Change the number of Kafka brokers
It is recommended to keep the amount of data held by each broker as small as possible. In other words, it is more stable to handle one topic per Kafka app rather than multiple topics in one Kafka app. Also, if it is difficult to predict how much a topic's data will grow, we recommend dividing the partitions finely from the beginning and adjusting the number of brokers as needed.
The following describes how to change the number of brokers.
- From the NAVER Cloud Platform console, click the Products & Services > Big Data & Analytics > Data Forest > App menus, in that order.
- Select an account, select the app, and then click the [Flex] button.
- When the Flex change window appears, modify the number of brokers, and then click the [Modify] button.
- If you add topics after increasing the number of brokers, their partitions will be distributed across the added brokers as well. However, the partitions of existing topics are not reassigned automatically; you must reassign them manually in kafka-manager.
- When reducing the number of brokers, the brokers with the largest COMPONENT_ID values are stopped first. If there are 10 brokers, they are excluded in the following order: broker-9, broker-8, broker-7, and so on.
You can prevent failure occurrence or data loss by reassigning the partitions of the brokers to be excluded to other brokers in advance.
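Before shrinking the app, the partitions on the brokers to be removed can be moved elsewhere. The sketch below builds the JSON input accepted by Kafka's `kafka-reassign-partitions.sh` tool, replacing any replica on a to-be-removed broker with a surviving one. The `build_reassignment` helper and its naive fill policy are illustrative, not Kafka's own assignment algorithm:

```python
import json
from itertools import cycle

def build_reassignment(topic, assignments, brokers_to_remove):
    """assignments: {partition: [replica broker ids]}.
    Returns kafka-reassign-partitions.sh JSON in which replicas on removed
    brokers are replaced by surviving brokers, keeping the replica count."""
    survivors = sorted({b for reps in assignments.values() for b in reps}
                       - set(brokers_to_remove))
    if len(survivors) < max(len(r) for r in assignments.values()):
        raise ValueError("not enough surviving brokers for the replication factor")
    filler = cycle(survivors)  # round-robin source of replacement brokers
    plan = []
    for partition, replicas in sorted(assignments.items()):
        new_replicas = [b for b in replicas if b not in brokers_to_remove]
        while len(new_replicas) < len(replicas):
            candidate = next(filler)
            if candidate not in new_replicas:
                new_replicas.append(candidate)
        plan.append({"topic": topic, "partition": partition,
                     "replicas": new_replicas})
    return json.dumps({"version": 1, "partitions": plan})

# Example: drain broker 3 before reducing a 4-broker app to 3 brokers
print(build_reassignment("test", {0: [3, 0, 1], 1: [0, 1, 2]}, [3]))
```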
Cautions for using Kafka
Local disk capacity and operation method
Data is saved on the local disks of each node in the Data Forest cluster, and the disk capacity per node is limited to about 300 GB.
If a large capacity is needed, then use the longlived_localdisk queue (for service), or prepare a dedicated queue composed of large-volume nodes to run the Kafka app.
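To gauge how many brokers a topic needs under the roughly 300 GB per-node limit, a quick back-of-the-envelope calculation helps. The `min_brokers` helper and the 80% headroom figure below are illustrative assumptions, not platform-defined values:

```python
def min_brokers(retention_bytes_per_partition, partitions, replication_factor,
                disk_per_node_bytes=300 * 10**9, headroom=0.8):
    """Smallest broker count whose combined disk can hold the topic.
    Each broker runs on its own node, so usable space scales with broker
    count; headroom leaves slack for logs and other data (assumed value)."""
    total = retention_bytes_per_partition * partitions * replication_factor
    usable_per_node = disk_per_node_bytes * headroom
    brokers = -(-total // usable_per_node)  # ceiling division
    return int(max(brokers, replication_factor))  # need at least RF brokers

# Example: retention.bytes=50000000000 (50 GB) per partition as in the
# describe output earlier, 6 partitions, replication factor 3
print(min_brokers(50 * 10**9, 6, 3))
```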
Node failure
The Kafka app runs only one broker per physical node. Therefore, with the default replication factor of 3, no data is lost even if one or two nodes fail.
Data retention
The Kafka app's data is saved in Data Forest's local file system and is not deleted when the user stops the app or the app terminates due to other problems. The data is deleted only when the user destroys the stopped app.
When a stopped app is started again, it runs on the nodes it previously ran on, and the existing data is recovered. However, a previously used node's resources may be occupied by another task, or the node may have been excluded from service due to a failure. In such cases, the app gives up trying to run on the previous node after one hour and runs on a different node instead, so the data of any broker that did not run during that time may be lost.
Clean up Zookeeper nodes
When you destroy the app, it is recommended to delete the Zookeeper path used by the Kafka app (the path specified in zookeeper.connect). Problems may occur if you create another app with the same name without deleting this path first.