I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at least one processing point for an analytics pipeline. Kafka fits this requirement very well, as it is a fast, distributed message broker. It definitely does exactly what it is designed to do.
Improvements to My Organization:
Through its publish-subscribe pattern, Kafka has allowed our applications to access and consume data in real time.
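To illustrate what the publish-subscribe pattern buys you, here is a minimal in-memory sketch. It is not the Kafka client API: `ToyTopic`, `publish`, `poll`, and the group names are hypothetical stand-ins of my own. It only shows the idea of an append-only log where each subscriber group tracks its own read position.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a single-partition topic (NOT the Kafka client API)."""

    def __init__(self):
        self._log = []                    # append-only message log
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group, max_messages=10):
        """Return the next unread messages for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


topic = ToyTopic()
for i in range(3):
    topic.publish({"event_id": i})

# Two independent subscriber groups (hypothetical names) each see the full stream.
analytics_batch = topic.poll("analytics")
audit_batch = topic.poll("audit")
```

Because each group keeps its own offset, adding a new consumer does not take messages away from an existing one, which is roughly why several of our applications could read the same data independently.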
Room for Improvement:
As an open-source project, Kafka is still fairly young and has not yet built up the stability and features that other open-source projects have acquired over many years. If done correctly, Kafka could also take over the stream-processing space that technologies such as Apache Storm cover.
Currently, as is typical in the big/fast data integration world, you need to piece together many different open-source technologies. For example, to create a reliable, fault-tolerant stream-processing system that ingests data, you need:
- a producer service
- an event/message buffer such as Kafka or a message queue
- a stream processing consumer such as Spark, Flink, Storm, etc.
- something to facilitate ingestion into target data sources, such as Flume or some customized concoction
This is simply to ingest the data and does not necessarily account for the analytical pieces, which may consist of Spark ML, SystemML, Elasticsearch, Mahout, etc.
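The four pieces listed above can be sketched in a few lines, assuming toy stand-ins for each: a Python `queue.Queue` in place of Kafka, a thread in place of Spark/Flink/Storm, and a plain list in place of the target data store. All function names here are hypothetical, and the sketch has none of the reliability or fault tolerance that the real components provide. It only shows how the stages hand data to one another.

```python
import queue
import threading

SENTINEL = None  # marks end of stream in this toy example


def producer(buffer, n_events):
    """Producer service: emits n_events raw events into the buffer."""
    for i in range(n_events):
        buffer.put({"id": i, "value": i * 10})
    buffer.put(SENTINEL)


def stream_processor(buffer, sink):
    """Stream-processing consumer: reads, transforms, and writes to the sink."""
    while True:
        event = buffer.get()
        if event is SENTINEL:
            break
        sink.append({"id": event["id"], "value_doubled": event["value"] * 2})


def run_pipeline(n_events=5):
    buffer = queue.Queue(maxsize=100)  # stands in for Kafka or a message queue
    sink = []                          # stands in for the target data store
    consumer_thread = threading.Thread(target=stream_processor, args=(buffer, sink))
    consumer_thread.start()
    producer(buffer, n_events)
    consumer_thread.join()
    return sink


results = run_pipeline()
```

In a real deployment, each of these stages is a separate technology with its own configuration, deployment, and failure modes, which is exactly the integration burden I am describing.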
What I'm getting at is basically the need for something like a Spring Framework for big data.
Use of Solution:
I have been using Kafka for 2 years.
The only stability issues we had were mostly a result of the evolving APIs and existing bugs.
Kafka is designed to scale easily, so I did not have any trouble here.
We used the open-source version and did not buy support from Confluent.
We did not have a previous solution. Our project was greenfield, a new type of development project.
Initial setup was straightforward. We simply hosted multiple Kafka brokers and ZooKeeper servers on AWS EC2 instances.
Other Solutions Considered:
We evaluated AWS Kinesis as well.
Kafka is open source and requires an administrator to maintain the servers.
Disclosure: I am a real user, and this review is based on my own experience and opinions.