Apache Kafka Configuration | Kafka Settings

This article goes through the Apache Kafka configuration settings you need to review as part of setting up Apache Kafka.

Four Kafka Component Settings

  • Broker Settings
  • Producer Settings
  • Consumer Settings
  • Zookeeper Configuration with Kafka

Apache Kafka Configuration

1. Broker Settings

The overall performance of Kafka depends on the following groups of broker settings.

1. Connection Settings

zookeeper.session.timeout.ms

The ZooKeeper session timeout, in milliseconds. The default depends on the Kafka version and distribution: Apache Kafka has used 18000 ms since version 2.5 (6000 ms before that), and some distributions ship with 30000 ms.

The broker must send heartbeat signals to ZooKeeper within this time; if it fails to do so, the broker is considered dead.
Do not set this value too low, or a live broker may be falsely declared dead. Do not set it too high either, or ZooKeeper will take too long to detect a broker that really is dead.
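The trade-off can be illustrated with a minimal sketch. It is a simplified model, not ZooKeeper's actual session tracking; the function name and the 20-second pause are assumptions made for the example.

```python
def is_session_expired(last_heartbeat_ms: int, now_ms: int, session_timeout_ms: int) -> bool:
    """Return True when no heartbeat has arrived within the session timeout."""
    return now_ms - last_heartbeat_ms > session_timeout_ms


# Hypothetical scenario: the broker is silent for 20 seconds (for example
# during a long garbage collection pause) and then resumes heartbeating.
pause_ms = 20_000

# With a low timeout the healthy-but-paused broker is declared dead.
print(is_session_expired(0, pause_ms, session_timeout_ms=6_000))

# With a higher timeout the same pause is tolerated.
print(is_session_expired(0, pause_ms, session_timeout_ms=30_000))
```

The same arithmetic shows the cost of a high value: a broker that really has crashed is only detected once the full timeout has elapsed.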

2. Topic Settings

For each topic, Kafka maintains a structured commit log with one or more partitions. In general, the more partitions a topic has, the more consumers can read from it in parallel, resulting in higher throughput.

Important Topic Properties

auto.create.topics.enable

With this property set to true, nonexistent topics are created automatically with the default replication factor and the default number of partitions.

default.replication.factor

The replication factor used for automatically created topics. The default value is 1. For high-availability production systems, you should set this value to at least 3.

num.partitions

The number of partitions for automatically created topics. Its default value is 1. You can change it based on your requirements.

delete.topic.enable

This allows users to delete a topic from Kafka using the admin tool. If this property is turned off, deleting a topic through the admin tool has no effect.
In Kafka versions before 1.0 this feature is turned off by default (set to false); since Kafka 1.0 it is turned on by default.
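As a rough illustration, the topic properties above can be collected and rendered in the key=value form used by the broker's server.properties file. The values below are example choices for a production-style cluster, not recommendations from this article, and the helper name to_properties is made up for the sketch.

```python
# Example values only (assumptions for illustration).
topic_settings = {
    "auto.create.topics.enable": "false",  # create topics explicitly instead
    "default.replication.factor": "3",     # at least 3 for high availability
    "num.partitions": "6",                 # default partition count for new topics
    "delete.topic.enable": "true",         # allow deletion through the admin tool
}


def to_properties(settings: dict) -> str:
    """Render settings as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


print(to_properties(topic_settings))
```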

3. Log Settings

log.roll.hours

The maximum time, in hours, before a new log segment is rolled out. The default value is 168 hours (seven days).

This setting controls the period of time after which Kafka will force the log to roll, even if the segment file is not full. This ensures that the retention process is able to delete or compact old data.

log.retention.hours

The number of hours to keep a log file before deleting it. The default value is 168 hours (seven days).

log.dirs

A comma-separated list of directories in which log data is kept. If you have multiple disks, list all directories under each disk.

log.retention.bytes

The amount of data to retain in the log for each topic partition. By default, log size is unlimited.

If log.retention.hours and log.retention.bytes are both set, Kafka deletes a segment when either limit is exceeded.
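The "either limit" rule can be sketched as follows. This is a simplified model rather than Kafka's implementation: the Segment class and function name are assumptions, segments are checked oldest first, and the sketch leaves the active (newest) segment alone.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class Segment:
    size_bytes: int
    age_hours: float  # hours since the newest record in the segment was written


def segments_to_delete(segments, retention_hours, retention_bytes):
    """Return the segments a retention pass would delete, oldest first.

    `segments` is ordered oldest to newest. retention_bytes = -1 means
    no size limit.
    """
    total_bytes = sum(s.size_bytes for s in segments)
    deleted = []
    for segment in segments[:-1]:  # skip the active segment
        too_old = segment.age_hours > retention_hours
        # Size limit: delete only while the remaining log still meets the limit.
        too_big = retention_bytes >= 0 and total_bytes - segment.size_bytes >= retention_bytes
        if not (too_old or too_big):
            break
        deleted.append(segment)
        total_bytes -= segment.size_bytes
    return deleted


log = [Segment(GIB, 200), Segment(GIB, 100), Segment(GIB, 10), Segment(GIB // 2, 1)]
print(len(segments_to_delete(log, retention_hours=168, retention_bytes=-1)))   # time limit only
print(len(segments_to_delete(log, retention_hours=168, retention_bytes=GIB)))  # both limits set
```

In the second call the size limit removes a segment that is still well inside the seven-day window, which is the practical effect of setting both properties.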

log.segment.bytes

The log for a topic partition is stored as a directory of segment files. This setting controls the maximum size of a segment file before a new segment is rolled over in the log. The default is 1 GB.
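Taken together, log.segment.bytes and log.roll.hours decide when the active segment is closed and a new one started. The sketch below is a simplified model of that decision; the function and parameter names are assumptions, and the defaults mirror the values quoted in this section.

```python
def should_roll(segment_bytes: int, segment_age_hours: float, incoming_bytes: int,
                log_segment_bytes: int = 1024 ** 3, log_roll_hours: int = 168) -> bool:
    """Decide whether to start a new segment before appending incoming_bytes."""
    # Size-based roll: the append would push the segment past its maximum size.
    would_overflow = segment_bytes + incoming_bytes > log_segment_bytes
    # Time-based roll: a non-empty segment has been open for too long.
    too_old = segment_bytes > 0 and segment_age_hours >= log_roll_hours
    return would_overflow or too_old


print(should_roll(1024 ** 3 - 10, 1, incoming_bytes=100))  # nearly full segment
print(should_roll(500, 200, incoming_bytes=100))           # small but old segment
print(should_roll(500, 1, incoming_bytes=100))             # small and recent segment
```

The second case is the one log.roll.hours exists for: a low-traffic partition would otherwise keep a single segment open indefinitely, and retention could never remove it.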

Log Flush Management

log.flush.interval.messages

Specifies the number of messages to accumulate on a log partition before Kafka forces a flush of data to disk.

log.flush.scheduler.interval.ms 

Specifies the amount of time (in milliseconds) after which Kafka checks to see if a log needs to be flushed to disk.

log.segment.bytes

Specifies the size of the log file. Kafka flushes the log file to disk whenever a log file reaches its maximum size.

log.roll.hours 

Specifies the maximum length of time before a new log segment is rolled out (in hours); this value is secondary to log.roll.ms. Kafka flushes the log file to disk whenever a log file reaches this time limit.

4. Compacting Settings

log.cleaner.dedupe.buffer.size

Specifies total memory used for log de-duplication across all cleaner threads.

By default, 128 MB of buffer is allocated.

log.cleaner.io.buffer.size

Specifies the total memory used for log cleaner I/O buffers across all cleaner threads. By default, 512 KB of buffer is allocated.
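To see what the de-duplication buffer is used for, here is a minimal sketch of the idea behind compaction: keep only the most recent record for each key. It is an illustration under simplifying assumptions, not the log cleaner's implementation; the key-to-offset dictionary stands in for the map that the cleaner keeps in its dedupe buffer.

```python
def compact(records):
    """Keep only the latest record per key.

    `records` is a list of (offset, key, value) tuples in offset order.
    """
    latest_offset = {}
    for offset, key, _value in records:
        latest_offset[key] = offset  # later offsets overwrite earlier ones
    return [r for r in records if latest_offset[r[1]] == r[0]]


log = [(0, "user-1", "a"), (1, "user-2", "b"), (2, "user-1", "c")]
print(compact(log))
```

A larger buffer lets the cleaner track more distinct keys in one pass, which is why the buffer size matters for topics with many keys.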

5. General Broker Settings

auto.leader.rebalance.enable

Enables automatic leader balancing. It is enabled by default.

unclean.leader.election.enable

This property allows you to specify a preference for availability or durability. This is an important setting: if availability is more important than avoiding data loss, set this property to true. If preventing data loss is more important than availability, set this property to false.

The default depends on the Kafka version: before 0.11.0 it was true, which favors availability; since 0.11.0 it is false, which favors durability.

controlled.shutdown.enable

Enables controlled shutdown of the server. The default is enabled.

min.insync.replicas

When a producer sets acks to “all”, min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception.

You should set min.insync.replicas to 2 for replication factor equal to 3.
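The reasoning behind that recommendation can be written down in a few lines. This is a minimal sketch of the rule for producers using acks=all; the function names are made up for the example.

```python
def can_accept_write(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """With acks=all, a write succeeds only if enough replicas are in sync."""
    return in_sync_replicas >= min_insync_replicas


def tolerated_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas that can be lost while acks=all writes still succeed."""
    return replication_factor - min_insync_replicas


# Replication factor 3 with min.insync.replicas = 2.
print(tolerated_failures(3, 2))                                      # replicas that may fail
print(can_accept_write(in_sync_replicas=2, min_insync_replicas=2))   # one replica down
print(can_accept_write(in_sync_replicas=1, min_insync_replicas=2))   # two replicas down
```

Setting min.insync.replicas equal to the replication factor would leave no room for failures: losing a single replica would block all acks=all writes.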

message.max.bytes

Specifies the maximum size of a message that the broker will accept.

broker.rack

Identifies the rack in which the broker runs. The rack awareness feature uses this value to distribute replicas of a partition across different racks.
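A quick way to reason about rack awareness is to count how many distinct racks hold a replica of a partition. The sketch below does only that check; it does not reproduce Kafka's replica assignment algorithm, and the broker IDs and rack names are hypothetical.

```python
# Hypothetical mapping of broker ID to its broker.rack value.
broker_racks = {0: "rack-a", 1: "rack-a", 2: "rack-b", 3: "rack-c"}


def racks_covered(replica_broker_ids, racks):
    """Return the set of racks that hold a replica of the partition."""
    return {racks[broker_id] for broker_id in replica_broker_ids}


print(racks_covered([0, 2, 3], broker_racks))  # spread over three racks
print(racks_covered([0, 1, 2], broker_racks))  # two replicas share rack-a
```

In the second placement, losing rack-a would take out two of the three replicas at once, which is the situation rack awareness is meant to avoid.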

2. Producer Settings

The lifecycle of a request from producer to broker involves several configuration settings:

The producer polls for a batch of messages from the batch queue, one batch per partition. A batch is ready when one of the following is true:

a. batch.size is reached. Note: Larger batches typically have better compression ratios and higher throughput, but they have higher latency.

b. linger.ms (time-based batching threshold) is reached. Note: There is no simple guideline for setting linger.ms values; you should test settings on specific use cases. For small events (100 bytes or less), this setting does not appear to have much impact.

c. Another batch to the same broker is ready.

d. The producer calls flush() or close().
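The four conditions above can be expressed as a single predicate. This is a sketch of the decision only, not the producer's internal code; the function and parameter names are assumptions, and batch_size and linger_ms must be passed explicitly because their defaults vary between client versions.

```python
def batch_ready(batch_bytes: int, waited_ms: int, batch_size: int, linger_ms: int,
                other_batch_ready_for_broker: bool = False,
                flush_or_close_called: bool = False) -> bool:
    """Return True when a batch may be sent, per conditions a to d."""
    return (
        batch_bytes >= batch_size          # a. batch.size is reached
        or waited_ms >= linger_ms          # b. linger.ms is reached
        or other_batch_ready_for_broker    # c. another batch to the same broker is ready
        or flush_or_close_called           # d. flush() or close() was called
    )


print(batch_ready(16_384, 0, batch_size=16_384, linger_ms=10))  # full batch
print(batch_ready(100, 5, batch_size=16_384, linger_ms=10))     # small batch, still waiting
print(batch_ready(100, 10, batch_size=16_384, linger_ms=10))    # linger time elapsed
```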

Some additional settings

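max.in.flight.requests.per.connection

The maximum number of unacknowledged requests the producer sends on a single connection before blocking (pipelining).

compression.type

It accepts the standard compression codecs (‘gzip’, ‘snappy’, ‘lz4’ and, since Kafka 2.1, ‘zstd’), as well as ‘none’ (the default, meaning no compression). At the broker and topic level, the equivalent value is ‘uncompressed’.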

acks

The acks setting specifies the number of acknowledgments the producer requires the leader to have received before considering a request complete. This setting defines the durability level for the producer.
acks = 0: high throughput, low latency, no durability guarantee
acks = 1: medium throughput, medium latency, acknowledged by the leader only
acks = -1 (or all): low throughput, high latency, strongest durability

flush() makes all buffered records immediately available to send (even if linger.ms is greater than 0) and blocks until the requests for those records complete.
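The throughput and latency differences between the acks levels follow from how many replicas a request has to wait for. The sketch below models only that count; the function name is an assumption, and the real cost also depends on network and disk behavior.

```python
def replicas_waited_for(acks: str, in_sync_replicas: int) -> int:
    """Number of replicas that must have the record before the request completes."""
    if acks == "0":
        return 0                  # fire and forget: no acknowledgment at all
    if acks == "1":
        return 1                  # the leader has written the record
    if acks in ("all", "-1"):
        return in_sync_replicas   # the leader waits for every in-sync replica
    raise ValueError(f"unsupported acks value: {acks!r}")


for level in ("0", "1", "all"):
    print(level, replicas_waited_for(level, in_sync_replicas=3))
```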

3. Consumer Settings

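One basic guideline for consumer performance is to keep the number of consumer threads equal to the partition count. Within a consumer group, each partition is read by at most one consumer, so consumers beyond the partition count sit idle.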
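The guideline is easy to check with a small round-robin sketch. This is not one of Kafka's built-in partition assignors, only an illustration of why extra consumers in a group receive nothing; the consumer names are hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer, cycling through the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Four consumers for four partitions: one partition each.
balanced = assign_round_robin(partitions, ["c1", "c2", "c3", "c4"])
print(balanced)

# Six consumers for four partitions: the last two get nothing.
oversized = assign_round_robin(partitions, ["c1", "c2", "c3", "c4", "c5", "c6"])
idle = [consumer for consumer, assigned in oversized.items() if not assigned]
print(idle)
```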

4. Zookeeper Configuration with Kafka

Some recommendations:

  • Do not run ZooKeeper on a server where Kafka is running.
  • Make sure you allocate sufficient JVM memory. A good starting point is 4GB.
  • To monitor the ZooKeeper instance, use JMX metrics.
  • When using ZooKeeper with Kafka you should dedicate ZooKeeper to Kafka, and not use ZooKeeper for any other components.

Summary

In this article we looked at the configuration settings that help Kafka components run with high performance, including some recommended values and what each setting means.
I hope you liked the article!

