Apache Kafka – Introduction

What is Kafka?

Apache Kafka is a distributed streaming platform. It is a fast, scalable, and fault-tolerant publish-subscribe messaging system that enables near real-time communication between different applications and systems. Kafka was originally developed by LinkedIn and open-sourced in 2011.

Before going into the details of Apache Kafka, let's first understand what a messaging system is.

If we have to transfer data from one application or system to another, we use a messaging system. With a messaging system, you need not worry about how to transfer the data and can focus on the data alone. Two messaging patterns are available: point-to-point and publish-subscribe (pub-sub).

Most messaging systems follow the pub-sub pattern.

  1. Point-to-Point Messaging Pattern

    In this system, messages are persisted in a queue. A particular message can be consumed by only one consumer, and once it is read it disappears from the queue.
  2. Publish Subscribe Messaging Pattern

    In this system, messages are persisted in a topic. Unlike point-to-point, a consumer can subscribe to one or more topics and receives every message published to them. In Kafka terms, the producer is the publisher and the consumer is the subscriber.
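The difference between the two patterns can be sketched in a few lines of plain Python. This is an in-memory illustration only (the class names `PointToPointQueue` and `PubSubTopic` are made up for this sketch, not part of any Kafka client library):

```python
from collections import deque

class PointToPointQueue:
    """Point-to-point: each message is delivered to exactly one consumer."""
    def __init__(self):
        self._queue = deque()

    def send(self, message):
        self._queue.append(message)

    def receive(self):
        # Reading removes the message from the queue.
        return self._queue.popleft() if self._queue else None

class PubSubTopic:
    """Publish-subscribe: every subscriber receives every message."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)

# Point-to-point: a message can be read only once.
q = PointToPointQueue()
q.send("order-1")
print(q.receive())  # order-1
print(q.receive())  # None -- the message is gone after being read

# Pub-sub: both subscribers see the same message.
topic = PubSubTopic()
received_a, received_b = [], []
topic.subscribe(received_a.append)
topic.subscribe(received_b.append)
topic.publish("page-view")
print(received_a, received_b)  # ['page-view'] ['page-view']
```

In real Kafka, of course, the queue and topic live on a cluster of brokers rather than in process memory, but the delivery semantics are the same.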

What is the need for Kafka?

It is generally used in these scenarios:

  • Building real-time streaming data pipelines that can be relied on to get data between systems or applications
  • Building real-time streaming applications that transform or react to the streams of data

Features

Kafka offers the following advantages:

  • Reliable − Kafka is distributed, partitioned, replicated, and fault-tolerant.
  • Scalable − The Kafka messaging system scales easily without downtime.
  • Durable − Kafka uses a distributed commit log, which means messages are persisted on disk as quickly as possible; hence it is durable.
  • High performance − Kafka delivers high throughput for both publishing and subscribing messages.
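The "distributed commit log" behind Kafka's durability is essentially an append-only sequence of records addressed by offset: reading never deletes anything, and each consumer simply remembers the offset it has reached. A minimal sketch (the `CommitLog` class here is an illustration, not Kafka code):

```python
class CommitLog:
    """Append-only log: records are kept in order and addressed by offset.
    Reading does not delete anything; each consumer tracks its own offset."""
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset):
        # Return every record from the given offset onward.
        return self._records[offset:]

log = CommitLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

# Two independent consumers at different offsets read the same log.
print(log.read(0))  # ['created', 'paid', 'shipped']
print(log.read(2))  # ['shipped']
```

Because records stay in the log regardless of who has read them, a consumer that crashes can resume from its last committed offset, which is the basis of Kafka's durability and replay guarantees.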

Use Cases

Kafka is commonly used for the following:

  • Messaging System
  • Metrics
  • Log Collection or Log Aggregation
  • Stream Processing
  • Event Sourcing
  • Website Activity Tracking
  • Commit Log

Kafka's Four Core APIs

  1. Consumer API
    Allows clients to connect to Kafka servers running in the cluster and consume streams of records from one or more Kafka topics.
  2. Producer API
    Allows clients to connect to Kafka servers running in the cluster and publish streams of records to one or more Kafka topics.
  3. Streams API
    Allows clients to act as stream processors by consuming streams from one or more input topics and producing streams to other output topics, allowing the application to transform input streams into output streams.
  4. Connector API
    The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
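The consume-transform-produce loop at the heart of the Streams API can be illustrated without a running cluster. Below, topics are modeled as plain lists and `stream_transform` is a made-up helper, not part of any Kafka library; a real Streams application would read from and write to actual Kafka topics:

```python
def stream_transform(input_topic, transform):
    """Consume every record from an input 'topic', apply a transformation,
    and produce the results to an output 'topic' (both modeled as lists)."""
    output_topic = []
    for record in input_topic:
        output_topic.append(transform(record))
    return output_topic

# Example: normalize page-view events to uppercase, as a tiny
# stand-in for the kind of per-record transformation a Streams
# application performs between an input and an output topic.
page_views = ["home", "pricing", "docs"]
shouted = stream_transform(page_views, str.upper)
print(shouted)  # ['HOME', 'PRICING', 'DOCS']
```

The real Streams API adds the parts this sketch omits: continuous (unbounded) input, stateful operations such as joins and windowed aggregations, and fault tolerance via Kafka itself.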

Summary

We learned about Kafka's features, uses, use cases, and core APIs. Hope you liked it!

