Apache Kafka Overview
What is Apache Kafka?
Apache Kafka is a distributed streaming platform, with the following capabilities:
- It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
- It lets you store streams of records in a fault-tolerant way.
- It lets you process streams of records as they occur.
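The three capabilities above can be illustrated with a small in-memory sketch. This is a toy model, not the Kafka client API: `ToyTopic`, `publish`, and `subscribe` are hypothetical names, and the push-style callbacks are a simplification (real Kafka consumers pull records from brokers).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyTopic:
    """Hypothetical in-memory stand-in for a topic: an append-only list of records."""
    records: List[str] = field(default_factory=list)
    subscribers: List[Callable[[int, str], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[int, str], None]) -> None:
        # Register a callback that is invoked for every newly published record.
        self.subscribers.append(callback)

    def publish(self, record: str) -> int:
        # Store the record first, then hand it to each subscriber as it occurs.
        self.records.append(record)
        offset = len(self.records) - 1
        for callback in self.subscribers:
            callback(offset, record)
        return offset


topic = ToyTopic()
seen = []
# The subscriber "processes" each record by upper-casing it.
topic.subscribe(lambda offset, record: seen.append((offset, record.upper())))
topic.publish("page_view")
topic.publish("checkout")
print(seen)           # records processed as they were published
print(topic.records)  # records are still stored after being processed
```

The point of the sketch is only that publishing, storing, and processing are separate concerns: the stored list is untouched by the subscriber's processing.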
Apache Kafka is used for two broad classes of applications:
- Building real-time streaming data pipelines that reliably move data between systems or applications.
- Building real-time streaming applications that transform or react to the streams of data.
Apache Kafka Customers: Uber, Netflix, Activision, Spotify, Slack, Pinterest
Technical Consultant at KPMG
Oct 17, 2018
It eases our current data flow and framework
What is our primary use case? It's convenient and flexible for almost all kinds of data producers. We integrated it with Kafka Streams, which can perform some simple data processing, such as summarizing, counting, and grouping.
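As a rough illustration of the kind of count-and-group processing mentioned above, here is a minimal sketch in plain Python. It does not use the Kafka Streams API (which is a Java library); `running_counts` and the sample `events` are hypothetical and only mimic the idea of updating a per-key count as each record arrives.

```python
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Group records by key and emit the updated count after each record."""
    counts: Dict[str, int] = {}
    for key, _value in records:
        counts[key] = counts.get(key, 0) + 1
        # Emit an update per record instead of one final total,
        # since a stream has no natural end.
        yield key, counts[key]


# Hypothetical (key, value) records from a producer.
events = [("sensor-a", "21.5"), ("sensor-b", "19.0"), ("sensor-a", "22.1")]
updates = list(running_counts(events))
print(updates)
```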
How has it helped my organization? It eases our current data flow and framework, which digests all types of sources, whether structured or not.
What is most valuable? High availability and high throughput. With such a large digest, I was genuinely impressed that the process was almost real-time.
What needs improvement? Kafka 2.0 was released over a month ago, and I wanted to try out the new features. However, the configuration is a little complicated: Kafka Broker, Kafka Manager, ZooKeeper servers, etc.
For how long have I used the…
Senior Technical Architect at a tech vendor with 51-200 employees
Nov 6, 2017
Its publisher-subscriber pattern has allowed our applications to access and consume data in real time.
Pros and Cons
- "I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at least one processing point for an analytics pipeline. Kafka fits this requirement very well."
- "As an open-source project, Kafka is still fairly young and has not yet built out the stability and features that other open-source projects have acquired over the many years. If done correctly, Kafka can also take over the stream-processing space that technologies such as Apache Storm cover."
What other advice do I have? Kafka is open source and requires an administrator to maintain the servers.
Senior Java Consultant at a tech services company with 501-1,000 employees
Oct 9, 2017
The product is a distributed system for persistent messaging
What other advice do I have? It's a high-performance distributed system. If you want to track user activities or do any stream processing, then this is perfect. We used Docker Kafka for our implementation. It's very easy to set up and test. You could try the same.
Team Lead at a financial services firm with 1,001-5,000 employees
May 24, 2017
Messages stay in Kafka after clients consume them. A message can be consumed by the same or a different client until topic retention kicks in and the oldest messages get deleted.
What other advice do I have? Go ahead. It's a great product.
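The retention behavior this reviewer describes can be sketched with a toy log. This is a simplified model under stated assumptions: `RetainedLog` and `max_records` are hypothetical, and retention here is count-based for brevity, whereas Kafka topics are configured with time-based or size-based retention.

```python
from collections import deque
from typing import Deque, List, Tuple


class RetainedLog:
    """Toy log that keeps at most `max_records` messages (count-based retention)."""

    def __init__(self, max_records: int) -> None:
        self.max_records = max_records
        self.base_offset = 0  # offset of the oldest message still retained
        self.entries: Deque[str] = deque()

    def append(self, message: str) -> None:
        self.entries.append(message)
        # Retention: delete the oldest messages once the limit is exceeded.
        while len(self.entries) > self.max_records:
            self.entries.popleft()
            self.base_offset += 1

    def read_from(self, offset: int) -> List[Tuple[int, str]]:
        # Reading never deletes anything, so any client can read again later.
        start = max(offset, self.base_offset)
        return [
            (self.base_offset + i, message)
            for i, message in enumerate(self.entries)
            if self.base_offset + i >= start
        ]


log = RetainedLog(max_records=3)
for message in ["m0", "m1", "m2"]:
    log.append(message)

first_read = log.read_from(0)    # one client consumes everything
second_read = log.read_from(0)   # the same or another client reads it again
log.append("m3")                 # pushes the log past its retention limit
after_retention = log.read_from(0)
print(first_read, second_read, after_retention)
```

In this sketch, consuming a message has no effect on the log; only the retention check in `append` removes data.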
Big Data Lead at a marketing services firm with 51-200 employees
May 23, 2017
We use it as an MQ. From it, we have several consumers like Secor that upload raw data to S3.
What other advice do I have? Read the documentation and understand the offset issues (where to save them, read from start to end).
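The offset questions raised here (where offsets are saved, and whether a new reader starts at the beginning or the end) can be sketched as follows. This is a toy model, not the Kafka consumer API: `offset_store`, `start_position`, and `consume` are hypothetical names. The `"earliest"` and `"latest"` values mirror the options of Kafka's `auto.offset.reset` consumer setting.

```python
from typing import Dict, List, Optional

messages = ["a", "b", "c", "d"]        # toy log; list index = offset
offset_store: Dict[str, int] = {}      # hypothetical external store, keyed by consumer group


def start_position(log_length: int, committed: Optional[int], reset: str) -> int:
    """Decide where a consumer starts reading."""
    if committed is not None:
        return committed               # resume where this group left off
    if reset == "earliest":
        return 0                       # no saved offset: read from the start
    if reset == "latest":
        return log_length              # no saved offset: read only new messages
    raise ValueError(f"unknown reset policy: {reset}")


def consume(group: str, max_messages: int, reset: str = "earliest") -> List[str]:
    position = start_position(len(messages), offset_store.get(group), reset)
    batch = messages[position:position + max_messages]
    # Save the next offset to read only after the batch has been handled.
    offset_store[group] = position + len(batch)
    return batch


first = consume("analytics", 2)                 # new group, starts at offset 0
second = consume("analytics", 2)                # resumes from the saved offset
late = consume("audit", 2, reset="latest")      # new group that skips existing data
print(first, second, late, offset_store)
```

Saving the offset after processing, as done here, means a crash mid-batch leads to re-reading rather than skipping; saving it before processing would make the opposite trade-off.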
May 23, 2017
This is the base streaming component of our IoT platform. It needs a separate cluster and a separate administrator.
What other advice do I have? If the Hadoop distribution is MapR, then consider MapR Streaming. MapR Streaming has overcome these fundamental issues. It stores data within the MapR-FS itself. So there is extra overhead, but with a licensing cost.
Founder, CEO at a tech vendor with 1-10 employees
May 14, 2017
The ability to partition data is valuable. There are far superior and cheaper alternatives in cloud-based solutions
Pros and Cons
- "The ability to partition data on Kafka is valuable."
- "The product is good, but it needs implementation and on-going support. The whole cloud engagement model has made the adoption of Kafka better due to PaaS (Amazon Kinesis, a fully managed service by AWS)."
What other advice do I have? If you have a dedicated Kafka resource to implement and manage the services, then go for Apache Kafka. Otherwise, do consider cloud-based services from AWS or Azure.
Senior Software Engineering Consultant at a tech services company with 51-200 employees
May 13, 2017
It offers throughput with built-in fault-tolerance and replication.
Pros and Cons
- "Kafka, as compared with other messaging system options, is great for large scale message processing applications. It offers high throughput with built-in fault-tolerance and replication."
- "Kafka requires non-trivial expertise with DevOps to deploy in production at scale. The organization needs to understand ZooKeeper and Kafka and should consider using additional tools, such as MirrorMaker, so that the organization can survive an availability zone or a region going down."
What other advice do I have? Consider using a managed Kafka service, such as from Heroku. If messaging is not a central component of the business and vendor lock-in is less of a concern, consider using something like Amazon's Kinesis. This can more rapidly provide the benefits of a messaging service without the pain of understanding it deeply, setting it up, and managing it. It's important to use a lean approach to understand how it will break in production. Implement a non-critical transaction with it. Perhaps use a feature toggle within a facade and implement the behavior with the old approach and with Kafka to reduce…
May 12, 2017
Does real-time streaming and persistence into distributed nodes. It provides a mechanism to create, publish, and subscribe.
What other advice do I have? Kafka provides distributed persistence and streaming layers. As a consumer, the user has flexibility in how to consume messages, but has to handle resilience in their own code. It requires ZooKeeper.
Solutions Architect at a consultancy with 1,001-5,000 employees
May 10, 2017
Has the ability to write data at one velocity and have subscribing consumers read at different velocities.
Pros and Cons
- "Apache Kafka is actually a distributed commit log. That is different than most messaging and queuing systems before it."
- "The GUI tools for monitoring and support are still very basic and not very rich. There is no help in determining a shard key for performance."
What other advice do I have? Be sure to define the use cases as best as possible at first. Kafka is very good, but it is complex to support. It can handle any message size, whereas native cloud options have size limitations. Be sure to understand what messages will be sent and how many discrete topics will be needed. Be aware that you must code both producers and consumers. The bulk of the work is with the consumer. The Apache stack for Kafka is very open source. There are essentially no tools other than command line options to monitor brokers and topic health. So there are third-party tools that will help with that, some…
Head of Engineering
May 10, 2017
Interactions among micro-services are used as input to our analytics infrastructure.
Pros and Cons
- "Ease of use."
- "Stability of the API and the technical support could be improved."
What other advice do I have? The product is easy to use. However, to leverage its power, there is a need for good knowledge of event-based processing. I suggest using the massive amount of material shared by the Confluent team, or what is available online.
Apr 13, 2017
One of the best features which I have worked with is replay.
What is most valuable? One of the best features which I have worked with is replay.
How has it helped my organization? Real-time log aggregation, which was earlier done with rsync, has been moved to Kafka infrastructure along with other real-time streams.
What needs improvement? A GUI for Kafka infrastructure monitoring and deployment.
For how long have I used the solution? I have used it for two years.
What was my experience with deployment of the solution? Documentation is quite comprehensive.
What do I think about the stability of the solution? I found it very stable.
What do I think about the scalability of the solution? No issues with scalability.
How are customer service and technical support? Customer Service: We used the open-source version. Technical…
Apr 13, 2017
The speed at which it publishes messages is valuable.
What is most valuable? Excellent speed when publishing messages.
What needs improvement? There is too much dependency on ZooKeeper, and leader election is still a bottleneck for Kafka implementations.
What do I think about the scalability of the solution? The RESTful API implementation uses the Kafka broker to publish messages, but I have not found it to be scalable. Part of the reason might be that there is no load balancer for the RESTful API web server.
How was the initial setup? Setup is very straightforward for development, and cluster setup is also easy. I am not aware of the production setup yet.
What about the implementation team? I implemented it on my own.
Enterprise Architect at a logistics company with 1,001-5,000 employees
Jan 25, 2017
We use it for reactive architecture, track and trace, mail and parcel.
What is most valuable? Supports more than 10,000 events/second. Scalability. Replication. It is a good product for event-driven architecture.
How has it helped my organization? We use Kafka for reactive architecture, track and trace, mail and parcel.
What needs improvement? A good free monitoring tool for Apache Kafka (from the Apache Foundation) would be great.
For how long have I used the solution? We used Kafka 0.8 for 2 years and Kafka 0.10 for 3 months.
What do I think about the stability of the solution? We have not encountered any stability issues.
What do I think about the scalability of the solution? We have not encountered any scalability issues.
How are customer service and technical support? We haven’t used technical support.
Which solution did I use…
Jan 25, 2017
Topic-based eventing, scalability, and retention periods are valuable.
What other advice do I have? This is the best tool I have ever used for asynchronous, event-based solutions.
Lead Engineer at a retailer with 10,001+ employees
Jan 24, 2017
We use the product for high-scale distributed messaging. Multiple consumers can sync with it and fetch messages.
What other advice do I have? I would advise others to start with non-SSL implementations and try to do PoCs. Afterwards, they should move towards more secure features.
Java Developer at a media company with 10,001+ employees
Jan 5, 2017
It provides safety for data in case of node failure or data center outage. Partitioning is useful for parallelizing processing.
What other advice do I have? Give it a try. It’s a valuable, high-performance, distributed processing tool.
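The point about partitioning for parallel processing can be sketched with a simple key-hash partitioner. This is an illustration under assumptions: `partition_for` and `assign` are hypothetical helpers, and `zlib.crc32` is used only as a convenient stable hash, not as the hash function Kafka's own partitioner uses.

```python
import zlib
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash of the key, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(records: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    partitions: Dict[int, List[Record]] = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical records keyed by user.
records = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"), ("user-1", "logout")]
partitions = assign(records, num_partitions=3)
for partition_id, contents in partitions.items():
    print(partition_id, contents)
```

Because all records for a key land in one partition in arrival order, each partition can be handed to a separate worker without reordering any single key's events.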
Product Categories: Message Queue (MQ) Software