Available in VPC
This guide describes how to migrate between Kafka clusters by creating a separate VPC server and using Confluent Replicator to replicate data between the clusters in real time.
This process can be used, for example, to replace clusters created on CentOS Linux with clusters running Rocky Linux.
Preliminary tasks
The following are required before proceeding with this guide:
- Broker node access information for the existing Cloud Data Streaming Service cluster
- Broker node access information for a new Cloud Data Streaming Service cluster running the same Kafka version as the existing one
In this document, the existing cluster is referred to as Cluster A, and the new cluster as Cluster B.
Cluster transition order
To transition to the newly created cluster:
- Create a VPC server, install and apply Confluent Replicator, and set the ACG by following the guide below.
- Check and validate that the topics and data of the existing cluster are replicated correctly to the newly created cluster.
- Change the consumers' BootstrapUrl to the newly created cluster.
- Change the producers' BootstrapUrl to the newly created cluster.
- Check and validate that all remaining messages in the existing cluster's Kafka topics have been consumed.
- Stop and then return the VPC server with Confluent Replicator installed and the existing cluster.
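The step of validating that the existing cluster's topic queues have been processed completely can be sketched with the Kafka CLI by inspecting consumer group lag. The group name my-consumer-group below is a placeholder for your actual consumer group:

```shell
# Describe the consumer group on Cluster A. When the LAG column shows 0
# for every partition, the remaining messages have been fully consumed.
# "my-consumer-group" is a placeholder; substitute your actual group name.
kafka-consumer-groups --bootstrap-server A_CLUSTER_BROKER_1_IP:9092 \
  --describe --group my-consumer-group
```

Only switch the producers over once the lag on Cluster A stays at 0; this avoids losing messages that consumers have not yet processed.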
Creating VPC Server
STEP 1. Create server
Create a server where Confluent Replicator is to be installed.
- From the NAVER Cloud Platform console's VPC environment, navigate to Services > Compute > Server.
- Click [Create server] to create a server. (Server creation guide)
- Create the server in the same subnet as the cluster's broker nodes, or in a subnet of the same type (Private).
- Connect to the created server through SSH to prepare for the Confluent Replicator installation. (Server connection guide)
- For a VPC server created in a private subnet, a NAT Gateway and a route table setting are additionally required.
Setting network ACG
STEP 1. Configure ACG
The newly created VPC server must be able to access port 9092 of the Cloud Data Streaming Service cluster's broker nodes.
To add an ACG for this:
- From the NAVER Cloud Platform console's VPC environment, navigate to Services > Big Data & Analytics > Cloud Data Streaming Service.
- Select the cluster and click the Open in new window button located next to the BrokerNode ACG.
- Select the ACG with the same name (or the same ACG ID) as the BrokerNode ACG and click [Set ACG] at the top.
- Enter the following information in the Inbound tab and click [Add].
- Protocol: TCP
- Access source: The IP of the VPC Server or the name of the ACG to which the VPC Server belongs
- Allowed port: 9092
- Click the [Apply] at the bottom.
- Following the same method, register the ACG rule for both the cluster currently in use and the new cluster.
Install Confluent Replicator
This section introduces an example of installing Confluent Replicator on a server. You need to check the access information for the Kafka broker nodes when installing it. Availability of Confluent Replicator may depend on your Confluent license; NAVER Cloud Platform only provides a guide on how to use it.
STEP 1. Install Java
- Install the JDK by entering the following command. Java 1.8 or higher is required, and Java 11 or higher is recommended.
sudo yum install java-devel -y
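After installation, you can confirm the Java version meets the requirement:

```shell
# Print the installed Java version; the output should report
# version 1.8 or higher (11 or higher recommended).
java -version
```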
STEP 2. Install Confluent Replicator
Confluent Replicator is provided as a part of Confluent Platform. Install Confluent Platform on the created VPC Server and download the Replicator connector.
Download Confluent Platform
You can download Confluent Platform from the Confluent download page.
curl -O https://packages.confluent.io/archive/7.5/confluent-7.5.0.tar.gz
Unzip the downloaded file and go to the directory.
tar -xzf confluent-7.5.0.tar.gz
cd confluent-7.5.0
Set the Confluent Platform installation directory path as an environment variable. (Edit ~/.bashrc or ~/.bash_profile)
export CONFLUENT_HOME=/path/to/confluent-7.5.0
export PATH=$CONFLUENT_HOME/bin:$PATH
Apply the changes.
source ~/.bashrc
or
source ~/.bash_profile
Install Replicator through Confluent Hub
Install the latest version of Replicator's connector.
confluent-hub install confluentinc/kafka-connect-replicator:latest
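You can confirm the connector was installed by listing the Confluent Hub components directory (the path assumes the CONFLUENT_HOME environment variable set above):

```shell
# The Replicator connector should appear in the confluent-hub
# components directory after installation.
ls $CONFLUENT_HOME/share/confluent-hub-components
# Expect a directory named confluentinc-kafka-connect-replicator
```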
STEP 3. Check and configure Kafka cluster
Check if Replicator installed on the VPC server can access the brokers of both clusters. Replicator should be able to access the port where the Kafka broker is in operation, and the access must be allowed by ACG setting.
- Kafka topic checking command examples
kafka-topics --bootstrap-server A_CLUSTER_BROKER_1_IP:9092,A_CLUSTER_BROKER_2_IP:9092 --list
kafka-topics --bootstrap-server B_CLUSTER_BROKER_1_IP:9092,B_CLUSTER_BROKER_2_IP:9092 --list
When migrating only specific topics, Cluster B must already have the same topics as Cluster A. The following example creates a topic named "example-topic" on Cluster B, assuming a topic of that name exists in Cluster A.
- To have topics created automatically and migrate all of them, refer to STEP 4. Write Replicator configuration file.
kafka-topics --bootstrap-server B_CLUSTER_BROKER_1_IP:9092 --create --topic example-topic --partitions 3 --replication-factor 2
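You can verify that the topic was created on Cluster B with the expected settings:

```shell
# Describe the topic on Cluster B; the output should show
# 3 partitions and a replication factor of 2, matching the creation command.
kafka-topics --bootstrap-server B_CLUSTER_BROKER_1_IP:9092 \
  --describe --topic example-topic
```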
STEP 4. Write Replicator configuration file
Write the Replicator configuration file (replicator.properties) on the intermediary VPC server. This file contains the settings needed to replicate data between the two clusters. Edit the /path/to/confluent-7.5.0/etc/kafka-connect-replicator/quickstart-replicator.properties file with the following information.
# Configure connector name
name=replicator-connector
connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
value.converter=io.confluent.connect.replicator.util.ByteArrayConverter
header.converter=io.confluent.connect.replicator.util.ByteArrayConverter
tasks.max=4
# Connection information of the source cluster
src.kafka.bootstrap.servers=A_CLUSTER_BROKER_1_IP:9092,A_CLUSTER_BROKER_2_IP:9092
# Connection information of the target cluster
dest.kafka.bootstrap.servers=B_CLUSTER_BROKER_1_IP:9092,B_CLUSTER_BROKER_2_IP:9092
# Whether to create topics automatically when there is no topic in the target cluster
topic.auto.create=true
# Name of the topic to replicate (use the topic1|topic2|topic3 format when there are multiple topics)
topic.whitelist=example-topic
# Name of the topic to exclude from replication
#topic.blacklist=
# When you need to redefine the name of the topic to be replicated (it is created as existingTopicName.replica)
topic.rename.format=${topic}.replica
# Setting retry interval when topic replication fails (10 seconds)
topic.create.backoff.ms=10000
# Settings for offset management
offset.storage.topic=replicator-offsets
config.storage.topic=replicator-configs
status.storage.topic=replicator-status
STEP 5. Write connector and Replicator configuration files
Edit the connect-standalone.properties file so that Kafka Connect can run in standalone mode with this configuration.
bootstrap.servers=B_CLUSTER_BROKER_1_IP:9092,B_CLUSTER_BROKER_2_IP:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/share/java,/root/confluent-7.5.0/share/confluent-hub-components
STEP 6. Run Replicator
Run Replicator in standalone mode using Kafka Connect with the modified configuration files applied.
Replication begins as soon as Replicator starts.
connect-standalone $CONFLUENT_HOME/etc/kafka/connect-standalone.properties $CONFLUENT_HOME/etc/kafka-connect-replicator/quickstart-replicator.properties
Log files are stored in the $CONFLUENT_HOME/logs directory.
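To spot-check that data is actually flowing, you can produce a test message to Cluster A and consume it from Cluster B. With the topic.rename.format setting above, the replicated topic on Cluster B is named example-topic.replica:

```shell
# Produce a test message to the source topic on Cluster A.
echo "hello-replicator" | kafka-console-producer \
  --bootstrap-server A_CLUSTER_BROKER_1_IP:9092 --topic example-topic

# Consume from the replicated topic on Cluster B;
# the test message should appear shortly after replication.
kafka-console-consumer --bootstrap-server B_CLUSTER_BROKER_1_IP:9092 \
  --topic example-topic.replica --from-beginning --max-messages 1
```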
STEP 7. Troubleshooting and additional settings
- Access error: Check if the 9092 port is allowed in the VPC server and the cluster's broker node ACG.
- Data delay issue: When the replication speed is low, increase the consumer.threads value or check the performance of the broker.
- Replicating multiple topics: You can use a regular expression in the topic.whitelist configuration to replicate multiple topics, or use the topic.blacklist option to exclude certain topics.
topic.whitelist=topic1|topic2|topic3
topic.blacklist=topic4|topic5|topic6