Apache Kafka Review
Its publisher-subscriber pattern has allowed our applications to access and consume data in real time.

Improvements to My Organization

Through its publisher-subscriber pattern, Kafka has allowed our applications to access and consume data in real time.
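
To illustrate the pattern, here is a minimal sketch that uses an in-memory list as a stand-in for a Kafka topic. It is not Kafka's client API; the class `InMemoryTopic` and the subscriber names are hypothetical. It shows the property that mattered to us: each subscriber tracks its own read position, so several applications can consume the same stream independently.

```python
from collections import defaultdict


class InMemoryTopic:
    """Toy append-only log; a stand-in for a Kafka topic, not Kafka's API."""

    def __init__(self):
        self._log = []                     # messages in publish order
        self._offsets = defaultdict(int)   # next offset to read, per subscriber

    def publish(self, message):
        """Append a message to the end of the log."""
        self._log.append(message)

    def poll(self, subscriber):
        """Return messages this subscriber has not seen yet and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._log[start:]
        self._offsets[subscriber] = len(self._log)
        return batch


topic = InMemoryTopic()
topic.publish("page_view:1")
topic.publish("page_view:2")

# Hypothetical subscribers: each one reads at its own pace.
dashboard_batch = topic.poll("dashboard")
topic.publish("page_view:3")
analytics_batch = topic.poll("analytics")
dashboard_next = topic.poll("dashboard")
```

Here the dashboard subscriber picks up only the message it has not yet seen on its second poll, while the analytics subscriber, which joined later, still receives the full log.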

Valuable Features

I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at least one processing point for an analytics pipeline. Kafka fits this requirement very well, as it is a fast, distributed message broker. It definitely does exactly what it is designed to do.

Room for Improvement

As an open-source project, Kafka is still fairly young and has not yet built out the stability and features that other open-source projects have acquired over many years. If done correctly, Kafka could also take over the stream-processing space that technologies such as Apache Storm cover.

Currently, as is typical in the big/fast data integration world, you need to piece together many different open-source technologies. For example, to create a reliable, fault-tolerant stream processing system that ingests data, you need:

  • a producer service
  • an event/message buffer such as Kafka or a message queue
  • a stream processing consumer such as Spark, Flink, Storm, etc.
  • something to facilitate ingestion into the target data sources, such as Flume or some customized concoction

This is simply to ingest the data and does not necessarily account for the analytical pieces, which may consist of Spark ML, SystemML, Elasticsearch, Mahout, etc.
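
The four ingestion pieces listed above can be sketched as follows. This is only an in-memory illustration under stated assumptions: a `deque` stands in for Kafka or a message queue, a plain dict stands in for the target data source, and the function names and sample events are hypothetical. None of it has the reliability or fault tolerance of the real components.

```python
from collections import Counter, deque


def produce(raw_events):
    """Producer service: turn raw input tuples into messages."""
    for user, action in raw_events:
        yield {"user": user, "action": action}


def process(buffer):
    """Stream processing consumer: drain the buffer and count actions per user."""
    counts = Counter()
    while buffer:
        message = buffer.popleft()
        counts[message["user"]] += 1
    return counts


def ingest(counts, target):
    """Ingestion step: merge the results into the target data store."""
    for user, total in counts.items():
        target[user] = target.get(user, 0) + total


buffer = deque()      # stand-in for Kafka or a message queue
target_store = {}     # stand-in for the target data source

# Hypothetical sample events flowing through all four stages.
for message in produce([("ann", "click"), ("bob", "view"), ("ann", "view")]):
    buffer.append(message)

ingest(process(buffer), target_store)
```

Even in this toy form, each stage is a separate piece with its own interface, which is the integration burden I am describing.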

What I'm getting at is basically the need for a Spring Framework for big data.

Stability Issues

The only stability issues we had were mostly a result of the evolving APIs and existing bugs.

Scalability Issues

Kafka is designed to scale easily, so I did not have any trouble here.

Customer Service and Technical Support

We used the open-source version and did not buy support from Confluent.

Previous Solutions

We did not have a previous solution. Our project was greenfield, a new type of development effort.

Initial Setup

Initial setup was straightforward. We simply hosted multiple Kafka brokers and ZooKeeper servers on AWS EC2 instances.

Implementation Team

We implemented it in-house and then went with the Hortonworks Data Platform distribution.

Other Solutions Considered

We evaluated AWS Kinesis as well.

Other Advice

Kafka is open source and requires an administrator to maintain the servers.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
