Apache Kafka: Real-Time Event Streaming

Once upon a time, companies were content to process data in hefty overnight batches. But then the business world woke up. Companies realized they needed to know the moment a user clicked a button, not hours later. Enter event streaming.

From Batch to Stream Processing


Kafka Architecture

Core Concepts

Component   Description
Topic       A stream of records, akin to a table
Partition   Physical division of a topic
Producer    Publishes records to topics
Consumer    Reads records from topics
Broker      A Kafka server
Cluster     Set of brokers
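
To make the relationship between these components concrete, here is a minimal in-memory sketch. It is a toy model, not the Kafka client API: the `Topic` and `Partition` classes and the `clicks` topic are hypothetical names chosen for illustration. The point is that a partition behaves like an append-only list in which each record's position is its offset.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only list of records; a record's index is its offset."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the record just written

    def read(self, offset):
        """Return every record from `offset` onward, without removing any."""
        return self.records[offset:]


@dataclass
class Topic:
    """A named group of partitions (hypothetical model, not a Kafka class)."""
    name: str
    partitions: list


clicks = Topic("clicks", [Partition(), Partition()])
first = clicks.partitions[0].append("user-1 clicked")
second = clicks.partitions[0].append("user-1 scrolled")
other = clicks.partitions[1].append("user-2 clicked")
```

Offsets are counted per partition, so the first record in each partition gets offset 0 regardless of what the other partitions hold.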

Key Properties

Property     Value
Throughput   Up to millions of messages per second (depends on cluster size)
Latency      Typically less than 10 milliseconds
Storage      Up to petabytes across a cluster
Retention    Configurable, from days to forever
Ordering     Per partition
Delivery     At least once by default

Kafka Topics, Partitions & Consumer Groups
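
As a rough sketch of how these three pieces fit together: a record's key decides its partition, and within a consumer group each partition is read by exactly one consumer. The code below is a simplified model under stated assumptions. `partition_for` and `assign` are hypothetical helpers, CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to keys, and plain round-robin stands in for Kafka's pluggable assignment strategies.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions: list, consumers: list) -> dict:
    """Round-robin: each partition goes to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# Six partitions shared by a group of four consumers.
group = assign(partitions=[0, 1, 2, 3, 4, 5], consumers=["c1", "c2", "c3", "c4"])

# More consumers than partitions: the extra consumer sits idle.
oversized = assign(partitions=[0, 1], consumers=["c1", "c2", "c3"])
```

Because the same key always hashes to the same partition, all records for one key are read by a single consumer in the group, and adding consumers beyond the partition count adds no parallelism.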


Producer Patterns

Fire and Forget

Use Case: Metrics, logs

Synchronous Send

Use Case: Financial transactions

Asynchronous with Callback

Use Case: Most applications
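
The three patterns above differ only in what the caller does with the result of a send. The sketch below contrasts them using a hypothetical `send` function that writes to an in-memory list on a background thread and returns a future. It is not the Kafka client, although the Java producer's `send` method likewise returns a `Future` and accepts an optional callback.

```python
from concurrent.futures import Future, ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
log = []  # stands in for a partition on the broker


def send(record: str) -> Future:
    """Hypothetical stand-in for a producer send; the future holds the offset."""
    def write():
        log.append(record)
        return len(log) - 1
    return executor.submit(write)


# Fire and forget: ignore the result, so a failure would go unnoticed.
send("cpu=0.93")

# Synchronous send: block until the write is acknowledged.
offset = send("payment:42").result(timeout=5)

# Asynchronous with callback: keep working and handle the outcome later.
acked = []


def on_done(future: Future) -> None:
    if future.exception() is None:
        acked.append(future.result())
    # Otherwise, log or retry the failed record here.


send("order:7").add_done_callback(on_done)
executor.shutdown(wait=True)  # wait for outstanding sends before exiting
```

Blocking on every send gives the strongest feedback at the cost of throughput, which is why the callback style tends to suit most applications.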


Kafka Guarantees

Delivery Semantics

Semantic        Description                                      Use Case
At Most Once    Messages may be lost but are never duplicated    Metrics, logs
At Least Once   Messages are never lost but may be duplicated    Most applications
Exactly Once    Each message takes effect exactly once           Financial systems
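
On the consumer side, the difference between the first two semantics comes down to whether the offset is committed before or after a record is processed. The simulation below is a sketch under simplifying assumptions: `run` and `with_restart` are hypothetical helpers, and a crash is modeled as simply leaving the loop. The last line imitates an exactly-once effect by deduplicating on the consumer, whereas Kafka itself relies on idempotent producers and transactions.

```python
def run(records, start, commit_first, crash_at=None):
    """Process records from `start`; return (processed, committed_offset)."""
    processed, committed = [], start
    for offset in range(start, len(records)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                break  # crash after commit, before processing: record is lost
            processed.append(records[offset])
        else:
            processed.append(records[offset])
            if offset == crash_at:
                break  # crash after processing, before commit: record repeats
            committed = offset + 1
    return processed, committed


records = ["a", "b", "c"]


def with_restart(commit_first):
    """Crash while handling offset 1, then resume from the committed offset."""
    first, committed = run(records, 0, commit_first, crash_at=1)
    second, _ = run(records, committed, commit_first)
    return first + second


at_most_once = with_restart(commit_first=True)
at_least_once = with_restart(commit_first=False)

# Idempotent handling on top of at-least-once: drop repeats, keep order.
deduplicated = list(dict.fromkeys(at_least_once))
```

Committing first loses `"b"`, while processing first handles `"b"` twice, so at-least-once delivery combined with idempotent processing is a common way to approximate exactly-once results.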

Ordering Guarantees
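
Since ordering holds per partition, records that share a key keep their relative order, while no order is defined across partitions. The sketch below illustrates this with an in-memory model. `NUM_PARTITIONS` and `sequence_for` are hypothetical, and CRC32 stands in for the murmur2 hash used by Kafka's default partitioner.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count

# (key, sequence number) pairs in the order they were produced.
events = [("alice", 1), ("bob", 1), ("alice", 2), ("bob", 2), ("alice", 3)]

partitions = defaultdict(list)
for key, seq in events:
    # CRC32 is a stand-in; Kafka's default partitioner hashes keys with murmur2.
    partitions[zlib.crc32(key.encode()) % NUM_PARTITIONS].append((key, seq))


def sequence_for(key):
    """Sequence numbers for one key, in the order its partition stores them."""
    p = zlib.crc32(key.encode()) % NUM_PARTITIONS
    return [seq for k, seq in partitions[p] if k == key]


alice_order = sequence_for("alice")
bob_order = sequence_for("bob")
```

Each key's events come back in production order, which is why related events are usually given the same key.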


Stream Processing with Kafka

Kafka Streams API

Common Patterns

Filtering
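
Filtering keeps only the records that satisfy a condition and passes them downstream unchanged. The following is a plain-Python sketch of the idea rather than Kafka Streams code. `filter_stream` and the sample `orders` are hypothetical, and the Kafka Streams DSL exposes a comparable `filter` operation on a stream.

```python
def filter_stream(records, predicate):
    """Yield only the records for which predicate(record) is true."""
    for record in records:
        if predicate(record):
            yield record


orders = [
    {"id": 1, "amount": 20},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 900},
]

# Keep only orders of at least 100.
large = list(filter_stream(orders, lambda order: order["amount"] >= 100))
```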

Transformation
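
Transformation maps each record to a new record, typically changing the value while keeping the key. This is a plain-Python sketch, not Kafka Streams code: `map_values` and the sample `readings` are hypothetical, though the Kafka Streams DSL offers a similar `mapValues` operation.

```python
def map_values(records, fn):
    """Apply fn to each value, leaving the key untouched."""
    for key, value in records:
        yield key, fn(value)


# Temperature readings in Fahrenheit, keyed by sensor.
readings = [("sensor-1", 68.0), ("sensor-2", 212.0)]

celsius = list(map_values(readings, lambda f: round((f - 32) * 5 / 9, 1)))
```

Leaving the key untouched matters in practice, because a changed key may belong to a different partition.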

Aggregation
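
Aggregation combines many records into a running result per key, often within a time window. The sketch below counts events per key in fixed, non-overlapping 60-second windows using plain Python. `count_per_window`, `WINDOW_SECONDS`, and the sample events are hypothetical, and a real Kafka Streams application would keep these counts in a fault-tolerant state store rather than a local `Counter`.

```python
from collections import Counter

WINDOW_SECONDS = 60  # hypothetical window size


def count_per_window(events):
    """events: (timestamp_seconds, key) pairs. Counts per (window_start, key)."""
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - timestamp % WINDOW_SECONDS
        counts[(window_start, key)] += 1
    return counts


events = [(5, "home"), (42, "home"), (61, "home"), (75, "cart"), (119, "home")]
counts = count_per_window(events)
```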

Joins
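
A join enriches records from one stream with data looked up by key from another stream or table. The sketch below shows an inner stream-table join in plain Python. `join_stream_table`, `users`, and `clicks` are hypothetical, and a dict stands in for a table that holds the latest value per key.

```python
def join_stream_table(stream, table):
    """Inner join: emit each stream record paired with the table row for its key."""
    for key, value in stream:
        if key in table:
            yield key, (value, table[key])


users = {"u1": "Alice", "u2": "Bob"}                         # latest value per key
clicks = [("u1", "/home"), ("u3", "/cart"), ("u2", "/pay")]  # stream of events

enriched = list(join_stream_table(clicks, users))
```

The click from `u3` is dropped because the table has no matching row; a left join would emit it with an empty right-hand side instead.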


Kafka vs Traditional Messaging

Feature       Kafka                 Traditional MQ
Storage       Persistent log        Transient queue
Replay        Yes, from any offset  No, consumed = gone
Throughput    Millions per second   Typically thousands per second
Consumers     Pull-based            Typically push-based
Ordering      Per partition         Global or none
Scalability   Horizontal            Mostly vertical
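
The Storage and Replay rows are the heart of the difference, and a small sketch may help. Below, a list with per-consumer offsets stands in for a Kafka-style log and a `deque` stands in for a traditional queue. The consumer names and the `poll` helper are hypothetical, and neither model uses a real broker.

```python
from collections import deque

messages = ["m0", "m1", "m2"]

# Log model: records stay put and each consumer tracks its own offset.
log = list(messages)
offsets = {"billing": 0, "analytics": 0}


def poll(consumer):
    """Pull every record from the consumer's offset onward, then advance it."""
    batch = log[offsets[consumer]:]
    offsets[consumer] = len(log)
    return batch


billing_first = poll("billing")
analytics_first = poll("analytics")  # unaffected by billing's reads
offsets["billing"] = 1               # rewind to replay from offset 1
billing_replay = poll("billing")

# Queue model: a delivered message is removed from the queue.
queue = deque(messages)
delivered = [queue.popleft() for _ in range(len(queue))]
```

After delivery the queue is empty and nothing can be re-read, whereas the log still holds all three records for any consumer that resets its offset.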

Real-World Use Cases

LinkedIn (Original Creator)

Netflix

Uber

Airbnb
