
When I started learning backend development, I was excited about building microservices. I joined several sessions on advanced backend topics, attended webinars, and followed tech discussions. That's where I kept hearing about Kafka, RabbitMQ, and other message brokers.

Honestly? I was intimidated. Every time someone mentioned Kafka, they talked about "distributed systems," "event streaming," and "high throughput" with such seriousness that I thought, "This must be incredibly complex. I'll learn it later when I'm more experienced."

So I avoided it. For months.

But eventually, I couldn't ignore it anymore. Kafka kept coming up in job descriptions, architecture discussions, and system design interviews. I decided to face my fear and dive in.

Here's what I discovered: Kafka isn't nearly as hard as I thought. Yes, it's powerful and handles complex problems, but the core concepts? Actually pretty straightforward. I just needed to understand what it is, why it exists, and what all those terms mean. Once I learned the fundamentals systematically, everything clicked.

That's why I'm writing this guide. If you're in the same boat I was—hearing about Kafka everywhere but feeling too intimidated to start—this is for you. Let me show you that Kafka is not some mystical, unreachable technology. It's a tool, and you can learn it.

What is Apache Kafka?

Let me start with the simplest explanation: Kafka is a distributed system that helps different parts of your application communicate with each other through messages.

But that's too simple. Here's what makes Kafka special and why it's different from traditional systems:

Traditional Approach (Synchronous Communication): Imagine you have an Order Service, Inventory Service, Payment Service, and Email Service. In a traditional setup, when a user places an order:

  1. Order Service calls Inventory Service: "Do we have stock?" (waits for response)

  2. Order Service calls Payment Service: "Process payment" (waits for response)

  3. Order Service calls Email Service: "Send confirmation" (waits for response)

Problems with this:

  • If any service is down, the entire flow breaks

  • Order Service needs to know about all other services

  • Everything happens sequentially, making it slow

  • Services are tightly coupled

graph LR
    A[Order Service] -->|1. Check Stock| B[Inventory Service]
    B -->|Response| A
    A -->|2. Process Payment| C[Payment Service]
    C -->|Response| A
    A -->|3. Send Email| D[Email Service]
    D -->|Response| A
    
    style A fill:#ff6b6b
    style B fill:#4ecdc4
    style C fill:#4ecdc4
    style D fill:#4ecdc4

Kafka Approach (Asynchronous Communication): With Kafka, when a user places an order:

  1. Order Service publishes an event: "Order Placed" to Kafka

  2. Order Service is done—no waiting, no blocking

  3. Inventory Service reads the event and updates stock

  4. Payment Service reads the event and processes payment

  5. Email Service reads the event and sends confirmation

All these services work independently, at their own pace. They don't even need to be online at the same time.

graph TD
    A[Order Service] -->|Publish: Order Placed| K[Kafka Topic: orders]
    K -->|Subscribe| B[Inventory Service]
    K -->|Subscribe| C[Payment Service]
    K -->|Subscribe| D[Email Service]
    
    style A fill:#ff6b6b
    style K fill:#ffd93d
    style B fill:#4ecdc4
    style C fill:#4ecdc4
    style D fill:#4ecdc4

This is the fundamental shift: from direct service-to-service calls to event-driven architecture where services communicate through events stored in Kafka.

Why Was Kafka Created?

Understanding Kafka's origin helps understand its design.

LinkedIn created Kafka in 2010. They had a massive problem: millions of users generating billions of events every day—profile views, connection requests, messages, job searches. They needed to:

  • Capture all these events

  • Make them available to dozens of different systems

  • Process them in real-time

  • Never lose data

  • Scale to handle growth

Traditional messaging systems couldn't handle this scale. Direct connections between services created a tangled web. They needed something new.

Kafka was their solution. They open-sourced it in 2011, and today it's used by thousands of companies:

  • Netflix uses it to process viewing activity and recommendations

  • Uber uses it to handle trip data and real-time pricing

  • LinkedIn still uses it to process over 7 trillion messages per day

  • Banks use it for fraud detection and transaction processing

When Should You Use Kafka?

Not every project needs Kafka. Here's when it makes sense:

Use Kafka when:

  • You need real-time data processing (fraud detection, live dashboards, instant notifications)

  • Multiple services need the same data independently

  • You're building microservices and want loose coupling

  • You need to handle high volumes of data

  • You can't afford to lose data

  • You need to scale horizontally as traffic grows

Don't use Kafka when:

  • You have simple request-response needs (use REST APIs)

  • You're building a small application with 2-3 services

  • You need immediate synchronous responses

  • Your data volume is small (hundreds of messages per day)

  • You want something simple to set up and maintain

For a basic CRUD app with a frontend and backend, you don't need Kafka. For a system processing millions of events from various sources, Kafka is perfect.

Understanding Kafka Terminology

This is where most beginners struggle. Kafka has its own vocabulary, and understanding these terms is crucial. Let me explain each one clearly with examples.

Message (or Event)

A message is a piece of data representing something that happened in your system. For that reason it's often called an "event": it records an occurrence at a point in time.

Examples of messages:

// User signup event
{
  "event_type": "user_signup",
  "user_id": "12345",
  "email": "john@example.com",
  "timestamp": "2025-01-15T10:30:00Z"
}

// Payment event
{
  "event_type": "payment_completed",
  "order_id": "ORD-789",
  "amount": 99.99,
  "currency": "USD",
  "timestamp": "2025-01-15T10:31:45Z"
}

// Temperature reading from IoT sensor
{
  "sensor_id": "TEMP-001",
  "temperature": 23.5,
  "unit": "celsius",
  "location": "warehouse_A",
  "timestamp": "2025-01-15T10:32:10Z"
}

Each message has three main components:

  • Key (optional): An identifier like "user_12345" or "sensor_001"

  • Value: Your actual data (usually JSON, but can be any format)

  • Timestamp: When the event occurred

Think of a message as a row in a log file, recording something that happened.

Topic

A topic is a named stream or category where related messages are stored.

Think of topics like folders on your computer or channels in Slack. Each topic holds a specific type of message.

Common topic naming examples:

  • user-signups - All user registration events

  • order-events - All order-related events (placed, cancelled, completed)

  • payment-transactions - All payment events

  • inventory-updates - Stock level changes

  • sensor-readings - IoT device data

  • application-logs - Error and info logs

Real-world example: Imagine an e-commerce system with these topics:

  • orders - When orders are placed

  • shipments - When items are shipped

  • returns - When customers return items

  • reviews - When customers leave reviews

Each topic is independent. A service can publish to multiple topics and subscribe to multiple topics.

Topic naming best practices:

  • Use descriptive names that explain the content

  • Use lowercase with hyphens or underscores

  • Be consistent across your organization

  • Avoid generic names like "data" or "events"
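
To make this concrete, here's a minimal sketch of creating a topic programmatically with the kafka-python library (the broker address, topic name, and partition count are illustrative assumptions; the kafka-topics CLI tool works just as well):

from kafka.admin import KafkaAdminClient, NewTopic

# Connect to a local broker (assumed to be running on localhost:9092)
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# One topic, 3 partitions, a single copy of the data (fine for learning)
admin.create_topics([
    NewTopic(name="order-events", num_partitions=3, replication_factor=1)
])

admin.close()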

Producer

A producer is any application that sends (publishes) messages to a Kafka topic.

Examples of producers:

  • Your web API that publishes a "user_signup" event when someone registers

  • A mobile app that sends "button_click" events for analytics

  • A payment gateway that publishes "payment_completed" or "payment_failed" events

  • An IoT device that sends sensor readings every second

  • A logging library that sends application errors to Kafka

Key characteristics:

  • Producers don't care who reads the messages

  • They just write to topics and move on

  • Multiple producers can write to the same topic

  • Producers decide which topic to write to

Simple analogy: Think of a producer as someone posting a message on a public bulletin board. They post it and walk away. They don't know who will read it or when.
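
As a rough sketch, a producer built with the kafka-python library can look like this (the topic name, broker address, and event fields are assumptions for illustration):

import json
from kafka import KafkaProducer

# Serialize dictionaries to JSON bytes before sending
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish the event and move on; the producer never waits for any consumer
producer.send("user-signups", {
    "event_type": "user_signup",
    "user_id": "12345",
    "email": "john@example.com",
})

producer.flush()  # block until buffered messages have actually reached the broker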

Consumer

A consumer is any application that reads (subscribes to) messages from a Kafka topic.

Examples of consumers:

  • An email service that reads "user_signup" events and sends welcome emails

  • An analytics service that reads all events and updates dashboards

  • A database sync service that reads events and updates a database

  • A notification service that reads events and pushes notifications to mobile devices

  • An audit service that reads all events and stores them for compliance

Key characteristics:

  • Consumers choose which topics to subscribe to

  • They read messages at their own pace

  • Multiple consumers can read the same messages independently

  • Each consumer maintains its own position in the topic

Important concept: The same message can be read by multiple consumers doing completely different things:

  • Consumer A (Email Service) reads "order_placed" → sends confirmation email

  • Consumer B (Analytics Service) reads "order_placed" → updates sales dashboard

  • Consumer C (Inventory Service) reads "order_placed" → reduces stock

  • Consumer D (Shipping Service) reads "order_placed" → creates shipping label

All four read the same event, but process it differently. They don't affect each other.
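
And here's a minimal consumer sketch with the kafka-python library (topic name, group id, and broker address are illustrative assumptions):

import json
from kafka import KafkaConsumer

# Join the "email-service" consumer group and subscribe to "order-events"
consumer = KafkaConsumer(
    "order-events",
    bootstrap_servers="localhost:9092",
    group_id="email-service",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Iterating blocks until messages arrive, then yields them one by one
for message in consumer:
    print(f"partition={message.partition} offset={message.offset} value={message.value}")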

Broker

A broker is a Kafka server—the actual program running that stores messages and handles requests.

Simple explanation: When people say "Kafka server" or "Kafka instance," they mean a broker. It's the software that:

  • Receives messages from producers

  • Stores them on disk

  • Serves them to consumers

  • Manages topics and partitions

In practice:

  • For learning: You run 1 broker on your laptop

  • For production: Companies run 3-5 or more brokers together (called a cluster)

  • Multiple brokers provide redundancy—if one fails, others continue working

Analogy: Think of a broker as a post office. It receives mail (messages), stores it in mailboxes (topics), and delivers it when requested.

Partition

This is where Kafka's power comes from. Each topic is divided into partitions, and this enables massive scalability.

What is a partition? A partition is a subset of a topic's messages. It's an ordered, immutable sequence of messages.

Visual representation:

Topic: "order-events" (divided into 3 partitions)

Partition 0: [msg1] → [msg4] → [msg7] → [msg10] → ...
Partition 1: [msg2] → [msg5] → [msg8] → [msg11] → ...
Partition 2: [msg3] → [msg6] → [msg9] → [msg12] → ...

graph TD
    T[Topic: order-events] --> P0[Partition 0]
    T --> P1[Partition 1]
    T --> P2[Partition 2]
    
    P0 --> M1[msg1]
    M1 --> M4[msg4]
    M4 --> M7[msg7]
    
    P1 --> M2[msg2]
    M2 --> M5[msg5]
    M5 --> M8[msg8]
    
    P2 --> M3[msg3]
    M3 --> M6[msg6]
    M6 --> M9[msg9]
    
    style T fill:#ffd93d
    style P0 fill:#95e1d3
    style P1 fill:#95e1d3
    style P2 fill:#95e1d3

Why partitions matter:

1. Ordering Guarantee

Messages within a single partition are strictly ordered. If msg1 comes before msg2 in a partition, this order never changes. This is crucial for many use cases.

Example: All orders from user_123 go to the same partition, so they're processed in the correct order.

2. Parallel Processing

Different partitions can be processed simultaneously by different consumers. This is how Kafka scales.

Example: If you have 6 partitions and 6 consumers, each consumer processes one partition. That's 6x throughput compared to one consumer.

3. Key-Based Routing

When you send a message with a key, Kafka uses that key to determine which partition it goes to. Messages with the same key always go to the same partition.

Practical example:

Topic: "user-activity" (3 partitions)

Message with key "user_123" → Partition 0
Message with key "user_456" → Partition 2
Message with key "user_123" → Partition 0 (same key, same partition)
Message with key "user_789" → Partition 1
Message with key "user_123" → Partition 0 (again, same partition)

All activity for user_123 stays in Partition 0, maintaining order.
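
In code, key-based routing simply means passing a key when you send. A small kafka-python sketch (topic and key names are assumptions):

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Same key, same partition: user_123's events keep their order
producer.send("user-activity", key="user_123", value={"action": "login"})
producer.send("user-activity", key="user_123", value={"action": "add_to_cart"})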

How many partitions should you have?

  • Start with 3-6 partitions for learning

  • In production, it depends on throughput needs

  • More partitions = more parallelism, but more overhead

  • You can increase partitions later (but can't easily decrease)

Offset

An offset is a unique number that identifies each message's position within a partition.

How it works: Kafka assigns each message in a partition a sequential number starting from 0:

Partition 0:
  Offset 0: {"user": "alice", "action": "login"}
  Offset 1: {"user": "alice", "action": "view_product", "product_id": "123"}
  Offset 2: {"user": "alice", "action": "add_to_cart", "product_id": "123"}
  Offset 3: {"user": "alice", "action": "checkout"}
  Offset 4: {"user": "alice", "action": "payment_completed"}

Why offsets are important: Consumers use offsets to track what they've read. It's like a bookmark in a book.

Example scenario:

  1. Consumer starts reading from offset 0

  2. Processes messages at offsets 0, 1, 2

  3. Commits offset 2 (saying "I've processed everything up to 2")

  4. Consumer crashes

  5. Consumer restarts, checks last committed offset: 2

  6. Resumes reading from offset 3

  7. No messages lost, no messages reprocessed

Offset management:

  • Kafka automatically stores committed offsets

  • Consumers can choose where to start: earliest (offset 0), latest (newest messages), or specific offset

  • You can rewind and reprocess messages by resetting offsets
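
Here's a sketch of a consumer that controls its own offsets with kafka-python: start from the earliest available message when the group has no committed offset, process, then commit explicitly (the topic, group, and handle function are placeholders):

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "user-activity",
    bootstrap_servers="localhost:9092",
    group_id="activity-processor",
    auto_offset_reset="earliest",  # where to start if no offset has been committed yet
    enable_auto_commit=False,      # we commit manually, only after processing succeeds
)

for message in consumer:
    handle(message.value)  # your processing logic (placeholder)
    consumer.commit()      # the bookmark: everything up to this offset is done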

Consumer Group

A consumer group is a set of consumers working together to read from a topic.

The problem it solves: You have a topic with 1000 messages per second. One consumer can't keep up. What do you do? Add more consumers and put them in a consumer group.

How it works:

  • All consumers with the same group_id are in the same group

  • Kafka automatically assigns partitions to consumers in the group

  • Each partition is assigned to exactly ONE consumer in the group

  • If consumers are added or removed, Kafka rebalances automatically

Example 1: Perfect Distribution

Topic: "orders" has 6 partitions
Consumer Group: "order-processors" has 3 consumers

Kafka assigns:
- Consumer A: Partitions 0, 1
- Consumer B: Partitions 2, 3
- Consumer C: Partitions 4, 5

Each order is processed by exactly one consumer.

graph TD
    T[Topic: orders<br/>6 Partitions] --> P0[Partition 0]
    T --> P1[Partition 1]
    T --> P2[Partition 2]
    T --> P3[Partition 3]
    T --> P4[Partition 4]
    T --> P5[Partition 5]
    
    P0 --> CA[Consumer A]
    P1 --> CA
    P2 --> CB[Consumer B]
    P3 --> CB
    P4 --> CC[Consumer C]
    P5 --> CC
    
    CA -.belongs to.- CG[Consumer Group:<br/>order-processors]
    CB -.belongs to.- CG
    CC -.belongs to.- CG
    
    style T fill:#ffd93d
    style CA fill:#6c5ce7
    style CB fill:#6c5ce7
    style CC fill:#6c5ce7
    style CG fill:#a29bfe

Example 2: Adding Capacity

You add a 4th consumer to the group.

Kafka rebalances:
- Consumer A: Partitions 0, 1
- Consumer B: Partitions 2, 3
- Consumer C: Partitions 4
- Consumer D: Partitions 5

Load is redistributed automatically.

Example 3: Handling Failure

Consumer C crashes.

Kafka rebalances:
- Consumer A: Partitions 0, 1
- Consumer B: Partitions 2, 3, 4
- Consumer D: Partitions 5

Consumer B takes over partition 4. No messages are lost.

Multiple Consumer Groups: Different consumer groups are completely independent. Each group gets all messages.

Topic: "order-events"

Consumer Group "email-service":
- Reads all orders
- Sends confirmation emails

Consumer Group "analytics-service":
- Reads all orders (same events)
- Updates dashboards

Consumer Group "inventory-service":
- Reads all orders (same events)
- Updates stock levels

All three groups process the same events independently.

graph TD
    T[Topic: order-events] --> CG1[Consumer Group:<br/>email-service]
    T --> CG2[Consumer Group:<br/>analytics-service]
    T --> CG3[Consumer Group:<br/>inventory-service]
    
    CG1 --> C1[Email Consumer]
    CG2 --> C2[Analytics Consumer]
    CG3 --> C3[Inventory Consumer]
    
    C1 -.action.- A1[Send Emails]
    C2 -.action.- A2[Update Dashboard]
    C3 -.action.- A3[Update Stock]
    
    style T fill:#ffd93d
    style CG1 fill:#a29bfe
    style CG2 fill:#a29bfe
    style CG3 fill:#a29bfe
    style C1 fill:#6c5ce7
    style C2 fill:#6c5ce7
    style C3 fill:#6c5ce7

Rule of thumb:

  • Same application, same goal → same consumer group

  • Different applications, different goals → different consumer groups
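
In practice that rule comes down to a single setting: the group_id each consumer uses. A sketch (service names are assumptions, and each consumer would normally run in its own process):

from kafka import KafkaConsumer

# Two instances of the same service share the partitions of one group
email_worker_1 = KafkaConsumer("order-events", group_id="email-service",
                               bootstrap_servers="localhost:9092")
email_worker_2 = KafkaConsumer("order-events", group_id="email-service",
                               bootstrap_servers="localhost:9092")

# A different service uses its own group and receives every event again
analytics = KafkaConsumer("order-events", group_id="analytics-service",
                          bootstrap_servers="localhost:9092")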

Replication

Replication means keeping multiple copies of your data across different brokers.

Why replication matters: Hardware fails. Disks crash. Servers go down. Without replication, you'd lose data.

How it works: When you create a topic, you specify a replication factor:

  • Replication factor 1: One copy (no redundancy)

  • Replication factor 2: Two copies on different brokers

  • Replication factor 3: Three copies on different brokers (common in production)

Example:

Topic: "payments" with replication factor 3

Message: {"payment_id": "PAY-123", "amount": 50}

Stored on:
- Broker 1 (Leader)
- Broker 2 (Follower)
- Broker 3 (Follower)

If Broker 1 crashes:
- Broker 2 becomes the new leader
- No data lost
- Producers and consumers continue working

Leader and Followers:

  • Each partition has one leader broker (handles all reads and writes)

  • Other replicas are followers (keep synchronized copies)

  • If the leader fails, a follower becomes the new leader automatically

For learning, replication factor 1 is fine. In production, always use at least 3.
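
The replication factor is set when you create the topic. Building on the earlier admin sketch, a production-style topic might look like this (the broker addresses are assumptions, and you need at least three brokers for a replication factor of 3):

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="broker1:9092,broker2:9092,broker3:9092")

# Every partition gets three copies, spread across the brokers
admin.create_topics([
    NewTopic(name="payments", num_partitions=6, replication_factor=3)
])

admin.close()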

Cluster

A Kafka cluster is a group of brokers working together.

Why use multiple brokers?

  • Fault tolerance: If one broker fails, others continue

  • Load distribution: Spread partitions across multiple machines

  • Scalability: Add more brokers as your data grows

Example cluster:

Kafka Cluster "production-cluster"
├── Broker 1 (stores partitions 0, 3, 6)
├── Broker 2 (stores partitions 1, 4, 7)
├── Broker 3 (stores partitions 2, 5, 8)
└── Coordination layer: ZooKeeper in older setups, KRaft controllers in newer ones

Each broker holds different partitions and replicas, distributing the load.

ZooKeeper and KRaft

You'll see these terms when setting up Kafka.

ZooKeeper:

  • A coordination service Kafka traditionally used

  • Manages cluster metadata, broker coordination, and leader elections

  • Runs as a separate system alongside Kafka

  • Being phased out in newer Kafka versions

KRaft (Kafka Raft):

  • Kafka's built-in coordination mechanism

  • Replaces ZooKeeper completely

  • Simpler to operate (no external dependency)

  • Became production-ready in Kafka 3.3 and is the only supported mode in Kafka 4.0+ (ZooKeeper support has been removed)

For new learners: If you're starting fresh in 2025, you'll only work with KRaft mode. ZooKeeper is a legacy technology that you'll only encounter in older, existing deployments. The good news? KRaft is simpler and easier to manage.

How Everything Works Together

Let me show you a complete flow so you see how all these pieces connect.

Scenario: E-commerce order processing

Here's a visual overview of the complete system:

flowchart TD
    U[User Places Order] --> API[Order Service<br/>Producer]
    API -->|Publish Event| K[Kafka Topic: orders<br/>3 Partitions<br/>Replicated across brokers]
    
    K -->|Consumer Group:<br/>inventory-updaters| INV[Inventory Service]
    K -->|Consumer Group:<br/>email-senders| EMAIL[Email Service]
    K -->|Consumer Group:<br/>analytics-processors| ANA[Analytics Service]
    K -->|Consumer Group:<br/>warehouse-system| WH[Warehouse Service]
    
    INV --> INV_ACT[Reduce Stock]
    EMAIL --> EMAIL_ACT[Send Confirmation]
    ANA --> ANA_ACT[Update Dashboard]
    WH --> WH_ACT[Create Picking List]
    
    style U fill:#95a5a6
    style API fill:#ff6b6b
    style K fill:#ffd93d
    style INV fill:#4ecdc4
    style EMAIL fill:#4ecdc4
    style ANA fill:#4ecdc4
    style WH fill:#4ecdc4

Step 1: Setup

- Topic: "orders" with 3 partitions
- Replication factor: 3 (data on 3 brokers)
- Multiple services ready to consume

Step 2: Order Placed

User places an order → Web API (Producer) publishes:

Message Key: "user_12345"
Message Value: {
  "order_id": "ORD-789",
  "user_id": "user_12345",
  "items": [{"id": "item_A", "qty": 2}],
  "total": 99.99,
  "timestamp": "2025-01-15T10:30:00Z"
}

Kafka receives the message:
- Uses key "user_12345" to determine partition (let's say Partition 1)
- Assigns offset 1523
- Stores on Broker 1 (leader) and replicates to Broker 2, 3
- Makes available to consumers

Step 3: Multiple Services React

Inventory Service (Consumer Group: "inventory-updaters")

- Subscribes to "orders" topic
- Reads message at Partition 1, Offset 1523
- Reduces stock for item_A by 2
- Commits offset 1523

Email Service (Consumer Group: "email-senders")

- Subscribes to "orders" topic
- Reads the same message (different group, gets all messages)
- Sends order confirmation to user_12345
- Commits offset 1523 for its group

Analytics Service (Consumer Group: "analytics-processors")

- Subscribes to "orders" topic
- Reads the same message (another independent group)
- Updates real-time sales dashboard
- Commits offset 1523 for its group

Warehouse Service (Consumer Group: "warehouse-system")

- Subscribes to "orders" topic
- Reads the same message
- Creates picking list for warehouse staff
- Commits offset 1523 for its group

Key observations:

  • One event (order placed) triggers four different actions

  • Each service works independently

  • Services process at their own pace

  • If one service is slow or crashes, others are unaffected

  • All services get the complete event data

  • The order service doesn't know or care about downstream services

Step 4: Failure Handling

Scenario: Email service crashes after reading but before sending email

1. Message was read from Partition 1, Offset 1523
2. Email service crashes before committing the offset
3. Email service restarts
4. Checks last committed offset: 1522
5. Reads and processes offset 1523 again
6. Email is sent
7. Commits offset 1523

Result: Email delivered, no data lost

Message Delivery Guarantees

One crucial aspect of working with Kafka is reliability: what happens if something goes wrong?

Kafka offers three delivery guarantees:

graph TD
    A[Message Delivery Guarantees] --> B[At Most Once]
    A --> C[At Least Once]
    A --> D[Exactly Once]
    
    B --> B1[May lose messages<br/>Never duplicates]
    B --> B2[Fastest performance]
    B --> B3[Use: Non-critical logs]
    
    C --> C1[Never loses messages<br/>May duplicate]
    C --> C2[Most common choice]
    C --> C3[Use: Most applications]
    
    D --> D1[No loss, no duplicates<br/>Perfect delivery]
    D --> D2[Slowest performance]
    D --> D3[Use: Financial transactions]
    
    style A fill:#ffd93d
    style B fill:#74b9ff
    style C fill:#00b894
    style D fill:#6c5ce7

At Most Once

Definition: Messages may be lost but will never be delivered twice.

How it works:

  • Producer sends message and doesn't wait for confirmation

  • If the network fails, message might be lost

  • Consumer reads a message and commits its offset immediately, before processing it (so a crash mid-processing means the message is never retried)

When to use: When performance matters more than data loss (e.g., monitoring metrics)

Example: Logging non-critical events where occasional loss is acceptable

At Least Once (Most Common)

Definition: Messages will never be lost but may be delivered more than once.

How it works:

  • Producer waits for Kafka to confirm message receipt

  • Consumer processes message first, then commits offset

  • If consumer crashes before committing, it reprocesses the message

When to use: Most production applications (default setting)

Requirement: Your consumers must be idempotent (can safely process the same message multiple times)

Example:

Message: "Reduce stock for item_A by 1"

Bad approach: stock = stock - 1 (if processed twice, stock is wrong)
Good approach: Use transaction IDs and check if already processed
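
On the producer side, "at least once" means waiting for the broker to confirm the write and retrying on failure, which is exactly what can introduce duplicates. A hedged kafka-python sketch (topic and payload are assumptions):

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",   # wait until all in-sync replicas have the message
    retries=5,    # retry transient failures; a retry after a lost ack causes a duplicate
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# send() returns a future; .get() blocks until the broker confirms or raises an error
producer.send("payment-transactions", {"payment_id": "PAY-123", "amount": 50}).get(timeout=10)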

Exactly Once

Definition: Each message is delivered precisely once—the holy grail.

How it works:

  • Kafka uses transactions and idempotency mechanisms

  • More complex configuration

  • Some performance overhead

When to use: Financial transactions, critical operations where duplicates are unacceptable

Example: Processing payments (you can't charge a customer twice)

Most applications use "at least once" with idempotent consumers. It's the sweet spot between reliability and simplicity.

Kafka Design Patterns and Best Practices

Here are practical patterns I learned that will help you use Kafka effectively.

1. Use Meaningful Keys for Ordering

Messages with the same key go to the same partition, maintaining order.

Examples:

  • User events → key: user_id (all events from one user stay ordered)

  • Device telemetry → key: device_id (all readings from one device stay ordered)

  • Account transactions → key: account_id (all transactions for an account stay ordered)

When you don't need ordering: Leave the key null. Kafka distributes messages evenly across partitions.

2. Design Self-Contained Messages

Each message should have all the information needed to process it independently.

Bad:

{"order_id": "123"}

Consumer needs to make database calls to get order details.

Good:

{
  "order_id": "123",
  "user_id": "user_456",
  "items": [{"product_id": "P1", "quantity": 2}],
  "total": 99.99,
  "shipping_address": {...}
}

Consumer has everything it needs.

3. Make Consumers Idempotent

Since "at least once" delivery can send duplicates, design consumers to handle this.

Techniques:

  • Use unique message IDs and track what you've processed

  • Design operations that give the same result if repeated

  • Use database transactions

Example:

def process_order(message):
    order_id = message['order_id']
    
    # Check if already processed
    if database.is_processed(order_id):
        print(f"Order {order_id} already processed, skipping")
        return
    
    # Process the order
    database.save_order(message)
    database.mark_as_processed(order_id)

4. Monitor Consumer Lag

Consumer lag = latest message offset - last processed offset

Growing lag means your consumer is falling behind. This is your most important metric.

Causes of lag:

  • Consumer is too slow

  • Too much data, not enough consumers

  • Consumer keeps crashing and restarting

Solutions:

  • Add more consumers to the consumer group

  • Optimize processing logic

  • Increase partition count
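
Lag is usually watched with monitoring tools (or the kafka-consumer-groups CLI), but you can also compute it in code. A rough kafka-python sketch for the partitions a consumer currently owns (topic and group names are assumptions):

from kafka import KafkaConsumer

consumer = KafkaConsumer("order-events", group_id="order-processors",
                         bootstrap_servers="localhost:9092")
consumer.poll(timeout_ms=1000)  # join the group so partitions get assigned

# Lag per partition = newest offset in the partition minus our current position
end_offsets = consumer.end_offsets(consumer.assignment())
for tp, end in end_offsets.items():
    lag = end - consumer.position(tp)
    print(f"{tp.topic}[{tp.partition}] lag={lag}")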

5. Choose Appropriate Retention

Kafka doesn't keep messages forever. Configure retention based on your needs:

# Keep messages for 7 days
retention.ms=604800000

# Keep messages until 10GB, then delete oldest
retention.bytes=10737418240

Considerations:

  • How long might consumers be offline?

  • Do you need to replay historical data?

  • How much disk space do you have?

6. Start with Fewer Partitions

More partitions seem better, but they add overhead:

  • More files on disk

  • More network connections

  • More coordination

Recommendation:

  • Start with 3-6 partitions

  • Monitor throughput

  • Increase if needed

  • You can't easily decrease partitions later

7. Use Consistent Naming Conventions

Good topic names:

  • user-events

  • order-lifecycle

  • payment-transactions

  • inventory-updates

Bad topic names:

  • data

  • events

  • stuff

  • topic1

Be descriptive and consistent across your organization.

Common Pitfalls to Avoid

Here are mistakes I made (and saw others make) when learning Kafka:

1. Creating Too Many Topics

Mistake: Creating a separate topic for every tiny event type

user-signup
user-login
user-logout
user-profile-update
user-password-change

Better: Group related events

user-events (with event_type field)

Why: Fewer topics are easier to manage, and consumers can subscribe to all user events in one place.

2. Ignoring Message Size

Kafka handles large messages but performs best with smaller ones.

Guideline: Keep messages under 1MB

For large files:

  • Store the file in object storage (S3, GCS)

  • Send a reference in Kafka: {"file_id": "abc123", "s3_url": "..."}

3. Not Handling Rebalancing

When consumers join or leave a group, Kafka rebalances partitions. Your code must handle this gracefully.

What happens:

  • Consumer stops reading

  • Partitions are reassigned

  • Consumer resumes with new partitions

Solution: Use proper consumer clients that handle rebalancing automatically (most do).

4. Forgetting About Idempotency

Assuming messages are delivered exactly once leads to bugs.

Always ask: "What if my consumer processes this message twice?"

Design accordingly.

5. Not Planning for Schema Evolution

Your message format will change over time. Plan for it:

Bad:

{"name": "John", "email": "john@example.com"}

Later you want to add a phone number or change a field, and consumers have no way to tell which format a given message uses.

Good:

{
  "version": 1,
  "name": "John",
  "email": "john@example.com"
}

Or use Schema Registry (Confluent's tool for managing schemas).
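
On the consumer side, a version field lets old and new formats live side by side. A tiny sketch (field names and versions are assumptions):

def parse_user_event(message: dict) -> dict:
    # Events produced before versioning existed default to version 1
    version = message.get("version", 1)
    if version == 1:
        return {"name": message["name"], "email": message["email"], "phone": None}
    # Version 2 added an optional phone number
    return {"name": message["name"], "email": message["email"],
            "phone": message.get("phone")}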

6. Over-Engineering Initially

Mistake: Starting with:

  • 50 partitions

  • Exactly-once semantics

  • Schema Registry

  • Complex key structures

Better: Start simple:

  • 3-6 partitions

  • At-least-once delivery

  • JSON messages

  • Basic string keys

Add complexity only when needed.

Kafka vs Other Technologies

You'll hear about other tools. Here's how they compare:

Kafka vs RabbitMQ

RabbitMQ:

  • Traditional message queue

  • Messages are deleted after consumption

  • Great for task queues

  • Easier to set up

  • Lower throughput

Kafka:

  • Distributed log

  • Messages are retained (can replay)

  • Great for event streaming

  • More complex setup

  • Higher throughput

When to use RabbitMQ: Task queues, request-response patterns, simpler deployments

When to use Kafka: Event streaming, high throughput, multiple consumers reading same data

Kafka vs AWS SQS/SNS

SQS (Queue):

  • Fully managed

  • Simpler, no infrastructure management

  • Lower throughput

  • Messages deleted after reading

SNS (Pub/Sub):

  • Fully managed

  • Push-based (sends to subscribers)

  • No message retention

  • Simpler use cases

Kafka:

  • Self-managed (or managed offerings like Confluent Cloud, AWS MSK)

  • Higher throughput

  • Messages retained, can replay

  • Pull-based (consumers read)

When to use SQS/SNS: Simple use cases, fully serverless architecture, AWS ecosystem

When to use Kafka: High throughput, complex event processing, multi-cloud, need message retention

Kafka vs Database Logs

Database Change Data Capture (CDC):

  • Tracks database changes

  • Limited to one source

  • Tied to database

Kafka:

  • General-purpose event streaming

  • Multiple sources and sinks

  • Decoupled from storage

They're often used together: CDC tools (like Debezium) stream database changes to Kafka.

What to Learn Next

You now understand Kafka's fundamentals. Here's your learning path forward:

Level 2: Intermediate Concepts

  • Kafka Streams: Build stream processing applications (like real-time analytics)

  • Kafka Connect: Move data between Kafka and other systems without code

  • Schema Registry: Manage message schemas and evolution

  • Security: Authentication, authorization, encryption

  • Monitoring: Prometheus, Grafana, lag monitoring

Level 3: Advanced Topics

  • Performance tuning: Compression, batching, buffer sizes

  • Operations: Cluster management, upgrades, disaster recovery

  • Exactly-once semantics: Implementing transactional messages

  • Tiered storage: Archive old data to cheap storage

  • Multi-datacenter replication: MirrorMaker, cluster linking

Hands-On Practice

  • Set up Kafka locally with Docker

  • Write simple producers and consumers in your preferred language

  • Build a small project (activity stream, order processing simulation)

  • Read Kafka's official documentation

  • Join Kafka community forums

Resources I Found Helpful

  • Official Apache Kafka documentation

  • Confluent's Kafka tutorials

  • "Kafka: The Definitive Guide" book

  • Kafka Improvement Proposals (KIPs) for deep dives

  • YouTube channels covering Kafka architecture

Final Thoughts

When I first heard about Kafka, I thought it was this impossibly complex distributed system that only senior engineers could understand. I was wrong.

Kafka is powerful, yes. It solves hard problems. But the core concepts—topics, producers, consumers, partitions—are actually straightforward. Once you understand what each term means and how they fit together, everything makes sense.

The key is systematic learning:

  1. Understand what Kafka is and why it exists

  2. Learn the terminology thoroughly

  3. See how pieces connect

  4. Start with simple examples

  5. Gradually explore advanced features

You don't need to understand every detail of distributed consensus algorithms or log compaction strategies to use Kafka effectively. Start with the fundamentals, build something simple, and grow from there.

Kafka is just a tool. Like any tool, it takes practice to master, but it's absolutely learnable. If I could go back and tell my earlier self anything, it would be: "Don't be intimidated. Just start learning. It's not as hard as you think."

I hope this guide gives you the confidence to dive in. Kafka is an incredibly valuable skill, and you're more than capable of learning it.

Happy learning, and welcome to the world of event streaming!
