Using Kafka

    Available in VPC

    Apache Kafka is a distributed messaging system that provides high performance and stability. It is used to process large amounts of data in real time.

    Caution

    To use the Kafka app, you must first create a Zookeeper app of the ZOOKEEPER-3.4.13 type.

    Note

    Refer to the official Kafka documentation for more information about the Kafka app.

    Check Kafka app details

    Once the app has been created, you can view its details. If the Status shown in the app's details is Stable, the app is running normally.
    The following describes how to check the app details.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
    2. Select the account that owns the app.
    3. Click the app whose details you want to view.
    4. View the app details.
    • Quick links
      • AppMaster: URL where the container logs can be viewed. When an app is created, it is submitted to a YARN queue, and YARN provides a web UI where each app's details can be viewed.
      • kafka-manager: Kafka Manager's URL
      • Connecting String
        • zookeeper.connect: address of the Zookeeper ensemble as specified when creating the Kafka app
      • Component
        • broker: stores and delivers messages between producers, which publish messages, and consumers, which consume them.
        • kafka-manager: provides broker monitoring, topic management, and partition reassignment.

    Configure the usage environment for the Kafka app

    To use or develop against a running Kafka app, you first need to configure its usage environment. The following explains how to configure the Kafka app's usage environment, using the Dev app as an example.

    1. To check the Kafka app's package and settings, run /home/forest/get-app-env.sh {appname} {install directory}.

      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ mkdir kafka
      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ /home/forest/get-app-env.sh kafka ./kafka/
      
      [/home/forest/get-app-env.sh] Apptype: KAFKA-2.4.0
      [/home/forest/get-app-env.sh] Download install-client script for KAFKA-2.4.0
      [/home/forest/get-app-env.sh] Install client on ./kafka/
      current kafka: .yarn/services/kafka/components/v1
      
      --2021-05-12 17:00:14--  http://dist.kr.df.naverncp.com/repos/release/kafka/kafka_2.12-2.4.0.tgz
      Resolving dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)... 10.213.208.69
      Connecting to dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)|10.213.208.69|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 62283588 (59M) [application/octet-stream]
      Saving to: ‘./kafka//kafka_2.12-2.4.0.tgz’
      
      100%[============================================>] 62,283,588  --.-K/s   in 0.1s
      
      2021-05-12 17:00:14 (459 MB/s) - ‘./kafka//kafka_2.12-2.4.0.tgz’ saved [62283588/62283588]
      
      Kafka Client has been installed on ./kafka//kafka_2.12-2.4.0
      
    2. Check producer.properties and consumer.properties in the installation path as follows.

      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ cat ./kafka/kafka_2.12-2.4.0/config/producer.properties
      #Generated by Apache Slider
      #Wed May 12 16:48:59 KST 2021
      bootstrap.servers=broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092
      compression.type=none
      
      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ cat ./kafka/kafka_2.12-2.4.0/config/consumer.properties
      #Generated by Apache Slider
      #Wed May 12 16:48:59 KST 2021
      bootstrap.servers=broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092
      group.id=test-consumer-group
      
    3. Create a topic called "test" using kafka-topics.sh.
      Run kafka-topics.sh --bootstrap-server {bootstrap servers} --create --topic {topic name} --partitions {number of partitions}.

      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ kafka/kafka_2.12-2.4.0/bin/kafka-topics.sh --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --create --topic test --partitions 6
      [test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ kafka/kafka_2.12-2.4.0/bin/kafka-topics.sh --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --describe --topic test
      Topic: test     PartitionCount: 6       ReplicationFactor: 3    Configs: retention.bytes=50000000000
              Topic: test     Partition: 0    Leader: 1       Replicas: 1,0,2 Isr: 1,0,2
              Topic: test     Partition: 1    Leader: 0       Replicas: 0,2,1 Isr: 0,2,1
              Topic: test     Partition: 2    Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
              Topic: test     Partition: 3    Leader: 1       Replicas: 1,2,0 Isr: 1,2,0
              Topic: test     Partition: 4    Leader: 0       Replicas: 0,1,2 Isr: 0,1,2
              Topic: test     Partition: 5    Leader: 2       Replicas: 2,0,1 Isr: 2,0,1
      
    4. The following is an example of producing and consuming messages with kafka-console-producer.sh and kafka-console-consumer.sh. Note that the consumer does not print the messages in production order: with six partitions, Kafka guarantees ordering only within each partition. The properties files generated in step 2 can also be passed to these tools, as sketched after this procedure.

      [test01@shell-0.dev-1.test01.kr.df.naverncp.com ~][df]$ ./kafka/kafka_2.12-2.4.0/bin/kafka-console-producer.sh --broker-list broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --topic test
      >test1
      >test2
      >test3
      >test4                   
      >>^C
      [test01@shell-0.dev-1.test01.kr.df.naverncp.com ~][df]$ ./kafka/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com\:9092,broker-1.kafka.test01.kr.df.naverncp.com\:9092,broker-2.kafka.test01.kr.df.naverncp.com\:9092 --topic test --from-beginning 
      test2
      test4
      test3
      test1
      
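    The properties files generated in step 2 can be supplied to the console tools through their --producer.config and --consumer.config options, so that settings such as compression.type and group.id are applied. A minimal sketch, assuming the same installation path and broker addresses as above:

      # Produce to "test", applying the settings in the generated producer.properties
      ./kafka/kafka_2.12-2.4.0/bin/kafka-console-producer.sh \
          --broker-list broker-0.kafka.test01.kr.df.naverncp.com:9092 \
          --producer.config ./kafka/kafka_2.12-2.4.0/config/producer.properties \
          --topic test

      # Consume from "test" as the group.id defined in consumer.properties;
      # --from-beginning only applies when the group has no committed offsets
      ./kafka/kafka_2.12-2.4.0/bin/kafka-console-consumer.sh \
          --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com:9092 \
          --consumer.config ./kafka/kafka_2.12-2.4.0/config/consumer.properties \
          --topic test --from-beginning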

    Use Kafka Manager

    You can use Kafka Manager to monitor brokers, manage topics, and reassign partitions.

    The following describes how to use Kafka Manager.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
    2. Select the account that owns the app, and then click the app.
    3. From the app's details, connect to the kafka-manager URL under Quick links.
    4. When the login page appears, enter your Data Forest account name and password to log in.
    5. Click the [Add cluster] button.
    6. Add a Kafka cluster.
      • Cluster Name: the app name
      • Cluster Zookeeper Hosts: the zookeeper.connect value from the app details
    7. Check the cluster information.

    Change the number of Kafka brokers

    The following describes how to change the number of brokers.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
    2. Select an account, select the app, and then click the [Flex] button.
    3. When the Flex change window appears, modify the number of brokers, and then click the [Modify] button.
    Note
    • If you add topics after increasing the number of brokers, their partitions are distributed across the new brokers as well. However, the partitions of existing topics are not reassigned automatically; you must reassign them manually in kafka-manager.
    • When reducing the number of brokers, the brokers with the largest COMPONENT_ID are stopped first. If you have 10 brokers, they are excluded in the following order: broker-9, broker-8, broker-7, and so on.
    Caution

    You can prevent failures and data loss by reassigning the partitions on the brokers to be excluded to other brokers in advance, as sketched below.
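
    Besides kafka-manager, the kafka-reassign-partitions.sh tool bundled with the installed client can move partitions off the brokers to be excluded. A hedged sketch; the topics.json contents, broker IDs, and file names are illustrative and must match your own topics and the brokers that will remain:

      # topics.json (illustrative): {"version": 1, "topics": [{"topic": "test"}]}

      # Generate a plan that places all partitions on the remaining brokers 0,1,2
      ./kafka/kafka_2.12-2.4.0/bin/kafka-reassign-partitions.sh \
          --zookeeper {zookeeper.connect} \
          --topics-to-move-json-file topics.json \
          --broker-list "0,1,2" --generate

      # Save the proposed assignment to reassign.json, then execute and verify it
      ./kafka/kafka_2.12-2.4.0/bin/kafka-reassign-partitions.sh \
          --zookeeper {zookeeper.connect} \
          --reassignment-json-file reassign.json --execute

      ./kafka/kafka_2.12-2.4.0/bin/kafka-reassign-partitions.sh \
          --zookeeper {zookeeper.connect} \
          --reassignment-json-file reassign.json --verify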

    Cautions for using Kafka

    Local disk capacity and operation method

    Data is saved on the local disks of each node in the Data Forest cluster, and the disk capacity per node is limited to about 300 GB.
    It is recommended to keep the amount of data held by each broker as small as possible. In other words, it is more stable to handle one topic per Kafka app than to handle multiple topics in one Kafka app. Also, if it is difficult to predict how much the data for a certain topic will grow, we recommend dividing it into fine-grained partitions from the beginning and adjusting the number of brokers as the situation changes. One way to bound the data each broker holds is to limit topic retention, as sketched below.
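
    A minimal sketch of lowering a topic's retention with kafka-configs.sh from the installed client; the topic name and retention value are illustrative (retention.bytes applies per partition):

      # Cap each partition of "test" at roughly 10 GB; older segments are deleted
      ./kafka/kafka_2.12-2.4.0/bin/kafka-configs.sh \
          --zookeeper {zookeeper.connect} \
          --entity-type topics --entity-name test \
          --alter --add-config retention.bytes=10000000000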

    Caution

    If a large capacity is needed, use the longlived_localdisk queue (for service), or prepare a dedicated queue composed of large-volume nodes to run the Kafka app.

    Node failure

    The Kafka app runs only one broker per physical node. Therefore, with the default replication factor of 3, no data is lost even if one or two nodes fail. The replication factor can also be set explicitly when creating a topic, as sketched below.
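
    A minimal sketch; the topic name test2 is illustrative, and --replication-factor 3 keeps a copy of each partition on three different brokers, and therefore on three different nodes:

      ./kafka/kafka_2.12-2.4.0/bin/kafka-topics.sh \
          --bootstrap-server broker-0.kafka.test01.kr.df.naverncp.com:9092 \
          --create --topic test2 --partitions 6 --replication-factor 3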

    Data retention

    The Kafka app's data is saved in Data Forest's local file system and is not deleted when the user stops the app or the app terminates due to other problems. The data is deleted only when the user destroys the stopped app.

    When a stopped app is started again, it runs on the nodes it previously ran on, and the existing data is recovered. However, the previously used node's resources may be occupied by another task, or the node may have been excluded from service due to a failure. In such cases, the app gives up trying to run on the previous node after one hour and runs on a different node instead, so the data held by a broker that does not run during that time may be lost.

    Clean up Zookeeper nodes

    When you destroy the app, it is recommended to delete the Zookeeper path used by the Kafka app (the chroot path in zookeeper.connect). Problems may occur if you later create an app with the same name without having deleted this path. A sketch of deleting the path is shown below.
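
    The path can be removed with the zookeeper-shell.sh bundled with the installed client. A hedged sketch; the {zookeeper host:port} and {kafka app path} placeholders come from your app's zookeeper.connect value, and the deleteall command requires the ZooKeeper 3.5+ client that Kafka 2.4.0 ships with:

      # Connect to the ensemble (the host:port part of zookeeper.connect) ...
      ./kafka/kafka_2.12-2.4.0/bin/zookeeper-shell.sh {zookeeper host:port}
      # ... then recursively delete the chroot path used by the Kafka app
      deleteall /{kafka app path}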

