Apache Kafka Architecture and Components

Cluster Architecture of Apache Kafka

[Figure: Kafka architecture with brokers, ZooKeeper, consumers, and producers]

Apache Kafka Main Components

Cluster

A cluster is a group of computers, each running an instance of the Kafka broker.

Broker

Broker is simply a meaningful name for the Kafka server. A Kafka producer does not interact directly with a consumer; both use the Kafka broker as the intermediary. A cluster can contain more than one broker.

Brokers are described as stateless because they do not track what each consumer has read; to maintain the cluster state they rely on ZooKeeper.

ZooKeeper

ZooKeeper manages and coordinates the Kafka brokers. Its main job is to notify producers and consumers when a new broker joins the Kafka cluster or an existing broker fails. Based on these notifications, producers and consumers decide how to proceed and start coordinating their work with another broker.

Producers

A producer is a component that pushes data to the brokers. In the setup described here it does not wait for acknowledgements from the brokers; it sends data as fast as the brokers can handle it (whether a producer waits is governed by its acknowledgement setting). There can be more than one producer, depending on the use case.

Consumers

Because Kafka brokers are stateless, the consumer has to keep track of how many messages it has consumed, using the partition offset. When a consumer acknowledges a particular message offset, it implies that it has consumed all prior messages. The consumer issues an asynchronous pull request to the broker to have a buffer of bytes ready to consume. A consumer can rewind or skip to any point in a partition simply by supplying an offset value. In this ZooKeeper-based design the consumer offset is tracked through ZooKeeper; newer Kafka versions store committed offsets in an internal Kafka topic instead.
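The offset bookkeeping described above can be sketched with a toy in-memory model. This is not the Kafka client API: `Partition`, `ToyConsumer`, `poll`, and `seek` are hypothetical names chosen for illustration, and the partition is just a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy stand-in for one partition: an append-only list of messages."""
    messages: list = field(default_factory=list)

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the new message

    def read(self, offset, max_messages):
        # The partition keeps no per-consumer state; the caller supplies the offset.
        return self.messages[offset:offset + max_messages]


class ToyConsumer:
    """Hypothetical consumer that tracks its own position in a partition."""

    def __init__(self, partition):
        self.partition = partition
        self.position = 0  # offset of the next message to read

    def poll(self, max_messages=2):
        batch = self.partition.read(self.position, max_messages)
        self.position += len(batch)
        return batch

    def seek(self, offset):
        # Rewind or skip ahead simply by changing the stored offset.
        self.position = offset


partition = Partition()
for text in ["a", "b", "c", "d"]:
    partition.append(text)

consumer = ToyConsumer(partition)
first = consumer.poll()      # reads offsets 0 and 1
consumer.seek(0)             # rewind to the beginning
replayed = consumer.poll()   # reads offsets 0 and 1 again
consumer.seek(3)             # skip ahead
tail = consumer.poll()       # reads offset 3 only
```

Because the position lives in the consumer, replaying or skipping messages needs no change on the partition side.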

[Figure: Kafka topic partitions]

Kafka topic

A Kafka topic is a logical channel to which producers publish messages and from which consumers receive them.

A topic name must be unique so that both producers and consumers can identify it. There can be any number of topics. Data cannot be modified once it has been published.

A topic may contain any number of partitions, as shown in the picture above.

Partitions in Kafka

Brokers store the data of a topic. This data can be huge, so Kafka breaks it into two or more parts, called partitions, and distributes them across multiple computers.

In a Kafka cluster, topics are split into partitions, and the partitions are replicated across brokers.

A producer can also attach a key to a message, which ensures that all messages with that key end up in the same partition. Because of this, Kafka guarantees message ordering for a given key, since order is preserved within a partition.

Without a key, the producer spreads messages across partitions (for example round-robin), so no per-key ordering is guaranteed.
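As a minimal sketch of key-based partitioning, the snippet below maps each key to a partition with a stable hash taken modulo the partition count. The function `choose_partition`, the partition count, and the use of `zlib.crc32` are assumptions made for this illustration; Kafka's own partitioner uses a different hash.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    crc32 is a stand-in chosen for this sketch; it is not the hash
    Kafka's own partitioner uses.
    """
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3  # assumed partition count for the example
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "add-to-cart"),
    (b"user-1", "checkout"),
]
for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))

# All "user-1" events sit in one partition, in the order they were produced.
user1_partition = choose_partition(b"user-1", NUM_PARTITIONS)
user1_events = [v for k, v in partitions[user1_partition] if k == b"user-1"]
```

One consequence of hashing modulo the partition count is that changing the number of partitions can move a key to a different partition.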

Offset

An offset is the sequence ID given to a message within a partition; it is local to that partition. A topic can have any number of partitions, for example:

  • partition 1
  • partition 2

Each partition sits on a single machine.

Note: How to directly locate a message?

You need to know 3 things:

  • Topic name
  • Partition number
  • Offset
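The three coordinates above can be pictured as a nested lookup. The following is only a toy in-memory layout, assuming a hypothetical topic called `orders` and a helper named `locate`; it is not how brokers store data on disk.

```python
# Toy in-memory layout: topic name -> partition number -> list of messages.
# The list index plays the role of the offset.
cluster_data = {
    "orders": {
        0: ["order-a", "order-b"],
        1: ["order-c", "order-d", "order-e"],
    },
}


def locate(topic: str, partition: int, offset: int) -> str:
    """Return the message at (topic, partition, offset)."""
    if offset < 0:
        raise IndexError("offsets start at 0")
    return cluster_data[topic][partition][offset]


message = locate("orders", 1, 2)

# Offsets are local to a partition: offset 1 names a different message in each.
same_offset = (locate("orders", 0, 1), locate("orders", 1, 1))
```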

Topic replication factor

It is a good design decision to give a topic a replication factor greater than 1. When a broker goes down, a replica on another broker still has the topic data. For example, with a replication factor of 2, each partition has one additional copy besides the primary, stored on a different broker.

Replication takes place at the partition level only.

For a given partition, exactly one broker acts as the leader. The replication factor cannot be greater than the number of available brokers.
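The rules above (replication per partition, one leader per partition, replication factor bounded by the broker count) can be illustrated with a simplified round-robin placement. The function `assign_replicas` and the broker names are hypothetical, and treating the first replica as the leader is an assumption of this sketch; Kafka's actual placement logic is more involved.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Place each partition's replicas on distinct brokers, round-robin.

    Returns {partition: [broker, ...]}; the first broker in each list is
    treated as the leader in this sketch.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


brokers = ["broker-1", "broker-2", "broker-3"]  # hypothetical broker names
assignment = assign_replicas(brokers, num_partitions=3, replication_factor=2)
leaders = {p: replicas[0] for p, replicas in assignment.items()}
```

With three brokers and a replication factor of 2, every partition keeps a copy on two different brokers, so losing any single broker still leaves one copy of each partition.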

[Figure: Kafka topic replication factor]

Consumer Group

Scenario: when hundreds of producers produce data for a single consumer, that consumer struggles to keep up with the volume and velocity.

Partitioning and consumer groups are tools for scalability. The maximum number of useful consumers in a group equals the total number of partitions on the topic; any consumers beyond that stay idle.

Within a consumer group, Kafka does not allow two or more consumers to read from the same partition at the same time.

Each consumer group also has one unique group ID.
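To make the partition-to-consumer relationship concrete, here is a small sketch that hands each partition to exactly one consumer in a group. The function `assign_partitions` and the consumer names are hypothetical, and plain round-robin is an assumption of this example rather than a description of Kafka's assignment strategies.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Fewer consumers than partitions: some consumers own several partitions.
two_consumers = assign_partitions(partitions, ["c1", "c2"])

# More consumers than partitions: the extra consumer has nothing to read.
four_consumers = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])
idle = [c for c, owned in four_consumers.items() if not owned]
```

With three partitions, a fourth consumer in the same group receives no partition, which is why adding consumers beyond the partition count does not increase throughput.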

Summary

We learned about Kafka features, its uses, use cases, and core APIs. Hope you liked it!

