Kafka Introduction
I use Apache Kafka heavily at work these days, so I thought it would be useful to write a post about it.
Kafka is an event streaming platform. The Apache Kafka documentation defines event streaming as follows:
Technically speaking, event streaming is the practice of capturing data in real-time from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events; storing these event streams durably for later retrieval; manipulating, processing, and reacting to the event streams in real-time as well as retrospectively; and routing the event streams to different destination technologies as needed. Event streaming thus ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time.
Installation and Hello World with Kafka
For how to install and run Kafka on other systems, please refer to the official documentation: https://kafka.apache.org/quickstart
I am using a Mac. Homebrew is a very nice tool for installing and managing packages on a Mac, quite similar to apt-get on Ubuntu. The App Store on macOS is enough for normal users to install and update applications, but it is definitely not sufficient for developers. Honestly, I think Apple should acquire Homebrew and integrate it into macOS.
Install Java
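A minimal sketch of this step, assuming Homebrew is already installed. The openjdk formula name is my assumption and is not taken from this post; any Java runtime supported by your Kafka version should work.

```shell
# Assumption: Homebrew is installed and its "openjdk" formula is the Java you want.
JAVA_FORMULA="openjdk"

if command -v brew >/dev/null 2>&1; then
  brew install "$JAVA_FORMULA"
else
  echo "Homebrew not found; see https://brew.sh"
fi
```

Homebrew prints any extra PATH or symlink steps after the install finishes, so it is worth reading its output.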
Install Kafka
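A minimal sketch of this step, again assuming Homebrew is installed. The formula is named kafka, and it provides the command-line tools used in the rest of this post.

```shell
# Assumption: Homebrew is installed.
KAFKA_FORMULA="kafka"

if command -v brew >/dev/null 2>&1; then
  brew install "$KAFKA_FORMULA"
else
  echo "Homebrew not found; see https://brew.sh"
fi
```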
Start ZooKeeper
Open a new terminal and run the command to start ZooKeeper, which Kafka uses to manage the cluster.
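A sketch of the start command, assuming the Homebrew layout where the Kafka configuration files live under /usr/local/etc/kafka (the directory this post refers to for server.properties). On Apple Silicon Macs, Homebrew typically uses /opt/homebrew/etc/kafka instead.

```shell
# Assumption: config files are in /usr/local/etc/kafka (adjust for your install).
KAFKA_CONF_DIR="/usr/local/etc/kafka"
ZK_CONFIG="$KAFKA_CONF_DIR/zookeeper.properties"

if command -v zookeeper-server-start >/dev/null 2>&1; then
  # Runs in the foreground; keep this terminal open.
  zookeeper-server-start "$ZK_CONFIG"
else
  echo "zookeeper-server-start is not on PATH; install Kafka first"
fi
```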
Start Kafka
Note: Before starting, edit /usr/local/etc/kafka/server.properties and set listeners=PLAINTEXT://localhost:9092
Open a new terminal and run the command to start Kafka:
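A sketch of the start command, assuming the same /usr/local/etc/kafka directory as in the note above. The grep line is only a convenience I added to confirm the listeners setting before the broker starts.

```shell
# Assumption: config files are in /usr/local/etc/kafka (adjust for your install).
KAFKA_CONF_DIR="/usr/local/etc/kafka"
BROKER_CONFIG="$KAFKA_CONF_DIR/server.properties"

if command -v kafka-server-start >/dev/null 2>&1; then
  # Print the active (uncommented) listeners line, if there is one.
  grep '^listeners=' "$BROKER_CONFIG" || echo "no uncommented listeners line found"
  # Runs in the foreground; keep this terminal open.
  kafka-server-start "$BROKER_CONFIG"
else
  echo "kafka-server-start is not on PATH; install Kafka first"
fi
```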
Create Kafka Topic
Create a topic named test with a replication factor of 1 (no extra replicas) and 1 partition:
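A sketch of the topic creation, assuming the broker is listening on localhost:9092. Recent Kafka releases take --bootstrap-server here; older ZooKeeper-era releases used --zookeeper localhost:2181 instead, so pick the flag that matches your version.

```shell
# Assumption: the broker started earlier is reachable at localhost:9092.
TOPIC="test"
BOOTSTRAP="localhost:9092"

if command -v kafka-topics >/dev/null 2>&1; then
  kafka-topics --create --bootstrap-server "$BOOTSTRAP" \
    --replication-factor 1 --partitions 1 --topic "$TOPIC"
  # Confirm the partition count and replication factor.
  kafka-topics --describe --bootstrap-server "$BOOTSTRAP" --topic "$TOPIC"
else
  echo "kafka-topics is not on PATH; install Kafka first"
fi
```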
Initialize 2 Kafka Producers
Open two more terminals and start a console producer in each one with kafka-console-producer; the exact command appears in the terminal output below.
Write 4 messages in the following sequence:
- producer1 -> message 1
- producer2 -> message 1
- producer1 -> message 2
- producer2 -> message 2
(base) xxxx@MacBook-Air-2 ~ % kafka-console-producer --broker-list localhost:9092 --topic test
producer1 message 1
producer1 message 2
(base) xxxx@MacBook-Air-2 ~ % kafka-console-producer --broker-list localhost:9092 --topic test
producer2 message1
producer2 message2
Start a Kafka Consumer
Open another terminal and start a console consumer with kafka-console-consumer, reading the topic from the beginning.
We will see the 4 messages written before:
(base) xxxx@MacBook-Air-2 ~ % kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning
producer1 message 1
producer2 message1
producer1 message 2
producer2 message2
Kafka uses the publish-subscribe (Pub-Sub) pattern. We can create multiple producers and consumers for a topic to experiment with it.
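One way to experiment with multiple consumers is the --group option of the console consumer. The sketch below assumes the broker and test topic from this post; the group name demo-group is a hypothetical name chosen for illustration.

```shell
# Assumption: broker on localhost:9092 and the "test" topic already exist.
TOPIC="test"
GROUP="demo-group"   # hypothetical group name

if command -v kafka-console-consumer >/dev/null 2>&1; then
  # Run this same command in two terminals. Consumers sharing a group split the
  # topic's partitions between them, so with a single partition only one of the
  # two receives messages.
  kafka-console-consumer --bootstrap-server localhost:9092 \
    --topic "$TOPIC" --group "$GROUP"
else
  echo "kafka-console-consumer is not on PATH; install Kafka first"
fi
```

Console consumers started without --group each get their own generated group, so every one of them receives every message, which is the plain Pub-Sub behavior.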
A Comparison between Kafka and RabbitMQ
By design, Kafka is an event streaming platform, while RabbitMQ is a message queue.
Here are some of the major differences between Apache Kafka and RabbitMQ:
Feature | Kafka | RabbitMQ |
---|---|---|
Message Ordering | Ordered within a partition. Messages with the same key go to the same partition, so per-key order is preserved. | No comparable guarantee once several consumers compete on a queue or messages are requeued. |
Message Retention | Policy-based (e.g., 30 days). Kafka is a log, so it retains messages by default; you control this with a retention policy. | Acknowledgment-based. RabbitMQ is a queue, so messages are removed once they are consumed and acknowledged. |
Message Priorities | N/A | You can assign message priorities and consume high-priority messages first. |
Routing | Publish/Subscribe based | Multiple exchange types: Direct, Fanout, Topic, Headers |
Generally, Kafka retains messages while RabbitMQ doesn’t; RabbitMQ offers more complex and flexible ways of routing messages and prioritizing messages, while Kafka supports a simple Pub-Sub pattern.
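The ordering and retention rows can be tried out on the test topic. This is a sketch under assumptions: the broker from this post is running on localhost:9092, the keys user1 and user2 are made up for illustration, and 30 days is the example figure from the table. Older Kafka releases configured topics through --zookeeper rather than --bootstrap-server.

```shell
# Assumptions: broker on localhost:9092, topic "test" exists, keys are illustrative.
TOPIC="test"
BROKER="localhost:9092"
RETENTION_MS=$((30 * 24 * 60 * 60 * 1000))   # 30 days in milliseconds

if command -v kafka-console-producer >/dev/null 2>&1; then
  # Keyed messages: records with the same key go to the same partition,
  # so their relative order is preserved for consumers.
  printf 'user1:login\nuser2:login\nuser1:logout\n' |
    kafka-console-producer --broker-list "$BROKER" --topic "$TOPIC" \
      --property parse.key=true --property key.separator=:

  # Policy-based retention: keep messages on this topic for 30 days.
  kafka-configs --bootstrap-server "$BROKER" --alter \
    --entity-type topics --entity-name "$TOPIC" \
    --add-config "retention.ms=$RETENTION_MS"
else
  echo "Kafka command-line tools are not on PATH; install Kafka first"
fi
```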
Regarding performance, both systems can be scaled to process millions of messages per second. Some benchmarking results suggest that:
- Given the same resource configuration, Kafka has higher throughput than RabbitMQ;
- RabbitMQ delivers lower message latency than Kafka.
Conclusion
Both Apache Kafka and RabbitMQ are great systems. When complex routing and prioritizing of messages are
required, RabbitMQ is the preferred option; when message ordering and retention are needed, Kafka is the choice.
In terms of business cases, RabbitMQ is used more in transactional systems and is well suited to routing
messages between clusters of microservices, whereas Kafka is used more in analytical systems such as data
streaming and pipelines.