Apache Kafka Review

Its publisher-subscriber pattern has allowed our applications to access and consume data in real time.


How has it helped my organization?

Through its publisher-subscriber pattern, Kafka has allowed our applications to access and consume data at a real time pace.

What is most valuable?

I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at least one processing point for an analytics pipeline. Kafka fits this requirement very well, as it is a fast, distributed message broker. It definitely does exactly what it is designed to do.

What needs improvement?

As an open-source project, Kafka is still fairly young and has not yet built out the stability and features that other open-source projects have acquired over the many years. If done correctly, Kafka can also take over the stream-processing space that technologies such as Apache Storm cover.

Currently, as it is in the big/fast data integration world, you need to piece together many different open-source technologies. For example, to create a reliable, fault-tolerant streaming processing system that ingests data, you need:

  • a producer service
  • an event/message buffer such as Kafka or a message queue
  • a stream processing consumer such as Spark, Flink, Storm, etc.
  • something to help facilitate the ingestion into target datasources such as Flume or some customized concoction.

This is simply to ingest the data and does not necessarily account for the analytical pieces, which may consist of Spark ML, SystemML, ElasticSearch, Mahout, etc.

What I'm getting at is basically the need for a Spring framework of big data.

What do I think about the stability of the solution?

The only stability issues we had were mostly a result of the evolving APIs and existing bugs.

What do I think about the scalability of the solution?

Kafka is designed to be very easily scalable so I did not have any trouble here.

How is customer service and technical support?

We used the open-source version and did not buy support from Confluent.

Which solutions did we use previously?

We did not have any other previous solutions. Our project was green field and a new type of project development.

How was the initial setup?

Initial setup was straightforward. We simply hosted multiple Kafka brokers and ZooKeeper servers on AWS EC2 instances.

What about the implementation team?

We implemented it in-house and then went with the Hortonworks Data Platform distribution.

Which other solutions did I evaluate?

We evaluated AWS Kinesis as well.

What other advice do I have?

Kafka is open source and requires an administrator to maintain the servers.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Add a Comment
Guest
Sign Up with Email