Apache Kafka Overview

Apache Kafka is the #2 ranked solution in our list of top Message Queue Software. It is most often compared to IBM MQ.

What is Apache Kafka?

Apache Kafka is a distributed streaming platform, with the following capabilities:

  • It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
  • It lets you store streams of records in a fault-tolerant way.
  • It lets you process streams of records as they occur.

Apache Kafka is used for two broad classes of application:

  • Building real-time streaming data pipelines that reliably get data between systems or applications.
  • Building real-time streaming applications that transform or react to the streams of data.
Apache Kafka Buyer's Guide

Download the Apache Kafka Buyer's Guide including reviews and more. Updated: October 2021

Apache Kafka Customers
Uber, Netflix, Activision, Spotify, Slack, Pinterest
Archived Apache Kafka Reviews (more than two years old)

JL
Technical Consultant at KPMG
Real User
It eases our current data flow and framework

What is our primary use case?

It's convenient and flexible for almost all kinds of data producers. We integrated it with Kafka Streams, which can perform simple data processing such as summarizing, counting, and grouping.
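
To make "summary, count, group" concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API (which is a Java library); the function names and the sample records are hypothetical, and the records are assumed to be simple (key, value) pairs.

```python
from collections import Counter, defaultdict


def count_by_key(records):
    """Group (key, value) records by key and count how many arrived per key."""
    counts = Counter()
    for key, _value in records:
        counts[key] += 1
    return dict(counts)


def sum_by_key(records):
    """Group (key, value) records by key and sum their values (a simple summary)."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


# Hypothetical sample stream: (sensor name, reading)
events = [("sensor-a", 3), ("sensor-b", 5), ("sensor-a", 4)]
```

A real stream processor would apply these steps continuously as records arrive, rather than over a finished list.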

How has it helped my organization?

It eases our current data flow and framework, which digests all types of sources regardless of whether they are structured or not.

What is most valuable?

  • High availability
  • High throughput

With such a large digest, I was genuinely impressed at the process being almost real-time.

What needs improvement?

Kafka 2.0 was released over a month ago, and I wanted to try out the new features. However, the configuration is a little bit complicated: Kafka Broker, Kafka Manager, ZooKeeper Servers, etc.

For how long have I used the solution?

Less than one year.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Kevin Quon
Senior Technical Architect at a tech vendor with 51-200 employees
Real User
Its publisher-subscriber pattern has allowed our applications to access and consume data in real time.

Pros and Cons

  • "I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at least one processing point for an analytics pipeline. Kafka fits this requirement very well."
  • "As an open-source project, Kafka is still fairly young and has not yet built out the stability and features that other open-source projects have acquired over the many years. If done correctly, Kafka can also take over the stream-processing space that technologies such as Apache Storm cover."

How has it helped my organization?

Through its publisher-subscriber pattern, Kafka has allowed our applications to access and consume data in real time.

What is most valuable?

I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at least one processing point for an analytics pipeline. Kafka fits this requirement very well, as it is a fast, distributed message broker. It definitely does exactly what it is designed to do.

What needs improvement?

As an open-source project, Kafka is still fairly young and has not yet built out the stability and features that other open-source projects have acquired over the many years. If done correctly, Kafka can also take over the stream-processing space that technologies such as Apache Storm cover.

Currently, as it is in the big/fast data integration world, you need to piece together many different open-source technologies. For example, to create a reliable, fault-tolerant streaming processing system that ingests data, you need:

  • a producer service
  • an event/message buffer such as Kafka or a message queue
  • a stream processing consumer such as Spark, Flink, Storm, etc.
  • something to help facilitate the ingestion into target datasources such as Flume or some customized concoction.

This is simply to ingest the data and does not necessarily account for the analytical pieces, which may consist of Spark ML, SystemML, ElasticSearch, Mahout, etc.
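
The four ingestion stages listed above can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the `Buffer` class is an in-memory stand-in for Kafka or a message queue, the "click" filter is a made-up processing rule, and the target datastore is just a list. None of these names come from a real library.

```python
import json


def producer(raw_events):
    """Stage 1: a producer service that serializes events into messages."""
    for event in raw_events:
        yield json.dumps(event)


class Buffer:
    """Stage 2: in-memory stand-in for an event/message buffer such as Kafka."""

    def __init__(self):
        self._messages = []

    def publish(self, message):
        self._messages.append(message)

    def read_all(self):
        return list(self._messages)


def stream_processor(messages):
    """Stage 3: a stream-processing consumer; here it keeps only click events."""
    for message in messages:
        event = json.loads(message)
        if event["type"] == "click":
            yield event


def sink(events, target):
    """Stage 4: ingestion into the target datastore (a plain list here)."""
    for event in events:
        target.append(event)
    return target


def run_pipeline(raw_events):
    """Wire the four stages together and return what reached the target."""
    buffer = Buffer()
    for message in producer(raw_events):
        buffer.publish(message)
    return sink(stream_processor(buffer.read_all()), [])
```

In a real deployment each stage is a separate technology with its own failure modes, which is the integration burden the review describes.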

What I'm getting at is basically the need for a Spring framework of big data.

What do I think about the stability of the solution?

The only stability issues we had were mostly a result of the evolving APIs and existing bugs.

What do I think about the scalability of the solution?

Kafka is designed to be very easily scalable so I did not have any trouble here.

How are customer service and technical support?

We used the open-source version and did not buy support from Confluent.

Which solution did I use previously and why did I switch?

We did not have any other previous solutions. Our project was green field and a new type of project development.

How was the initial setup?

Initial setup was straightforward. We simply hosted multiple Kafka brokers and ZooKeeper servers on AWS EC2 instances.

What about the implementation team?

We implemented it in-house and then went with the Hortonworks Data Platform distribution.

Which other solutions did I evaluate?

We evaluated AWS Kinesis as well.

What other advice do I have?

Kafka is open source and requires an administrator to maintain the servers.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user660591
Senior Java Consultant at a tech services company with 501-1,000 employees
Consultant
The product is a distributed system for persistent messaging

What is most valuable?

The most valuable features are performance, persistent messaging, and reliability. It allows us to persist the message for a configurable number of days, even after it has been delivered to the consumer. The message delivery is also fast.

How has it helped my organization?

We wanted to track customer activities in our application and store those details in another system (RDBMS/Apache Hadoop). We do extensive analysis on that data. This helps the company analyze customer activities, such as search terms, and improve on them.

What needs improvement?

It’s perfect for our requirements.

For how long have I used the solution?

I have been using Apache Kafka for two years.

What do I think about the stability of the solution?

We have had no issues with stability.

What do I think about the scalability of the solution?

We have had no issues with scalability.

How are customer service and technical support?

We use the open source one, so we did not opt for any technical support.

Which solution did I use previously and why did I switch?

We started to use Apache Kafka with our application from scratch.

How was the initial setup?

The initial setup was straightforward. We faced some issues during development in areas such as the message producer and consumer. We resolved those by tweaking the producer and consumer configurations. The documentation is very good.

What's my experience with pricing, setup cost, and licensing?

I don’t have any idea, as we use the open source version.

What other advice do I have?

It's a high-performance distributed system. If you want to track the user activities or any stream processing, then this is perfect. We have used Docker Kafka for our implementation. It's very easy for setup and testing. You could also try the same.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user650004
Team Lead at a financial services firm with 1,001-5,000 employees
Vendor
Messages stay in Kafka after clients consume them. A message can be consumed by the same or a different client until topic retention kicks in and the oldest messages get deleted.

What is most valuable?

  • Message Retention: Unlike regular message queues, messages stay in Kafka after clients consume them. A message can be consumed over and over again by the same or a different client until topic retention (by max data size or oldest message timestamp) kicks in and the oldest messages get deleted. This can be very handy in many scenarios: handling bugs in software, testing code, simple distribution of message processing, and routing messages to many different consumers simultaneously.
  • Horizontal Scalability: To add more capacity, both in terms of storage and performance to a Kafka cluster, you just need to add more servers. Regular message queues usually work in a master-slave configuration and do not scale very well horizontally.
  • Simplicity in operations.
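
The retention behavior described above can be modeled in a few lines of plain Python. This is a sketch, not Kafka's implementation: it retains by message count for simplicity, whereas Kafka topics retain by size or age (the `retention.bytes` and `retention.ms` topic settings). The class and method names are hypothetical.

```python
from collections import deque


class RetainedLog:
    """Append-only log that keeps messages after they are read.

    Only retention removes messages: once more than max_messages are
    stored, the oldest ones are dropped.
    """

    def __init__(self, max_messages):
        self.max_messages = max_messages
        self._messages = deque()
        self._base_offset = 0  # offset of the oldest message still retained

    def append(self, message):
        """Store a message and return the offset it was assigned."""
        self._messages.append(message)
        while len(self._messages) > self.max_messages:
            self._messages.popleft()  # retention kicks in: oldest goes first
            self._base_offset += 1
        return self._base_offset + len(self._messages) - 1

    def read_from(self, offset):
        """Return every retained message at or after the given offset.

        Reading does not delete anything, so any client can re-read.
        """
        start = max(offset, self._base_offset) - self._base_offset
        return list(self._messages)[start:]
```

Because reads never remove data, the same client can replay messages after a bug fix, and several clients can read the same messages independently.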

How has it helped my organization?

It has become dead simple to connect different applications and services, saving a lot of development hours.

What needs improvement?

The standard Kafka Java library, which is shipped with the product, is too complex for inexperienced users. At my company, engineering teams ended up writing wrapper libraries to solve complex issues. Kafka client libraries in general are complex, regardless of language. This is the price Kafka users have to pay for having simple, yet robust, server-side code.

What could be improved is the hard dependency on ZooKeeper. Work in this direction has already started, though. Overall, the project is moving forward at a very good pace.

For how long have I used the solution?

I have used Kafka for three years.

What do I think about the stability of the solution?

Sometimes we have stability issues, but not often.

What do I think about the scalability of the solution?

We have not had any scalability issues.

How are customer service and technical support?

There is no official technical support as the product is 100% open source.

Which solution did I use previously and why did I switch?

We used RabbitMQ before. It does not scale well.

How was the initial setup?

The setup was pretty straightforward.

What's my experience with pricing, setup cost, and licensing?

There is no pricing and licensing.

Which other solutions did I evaluate?

We didn't evaluate any other options.

What other advice do I have?

Go ahead. It's a great product.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user642168
Big Data Lead at a marketing services firm with 51-200 employees
Vendor
We use it as an MQ. From it, we have several consumers like Secor that upload raw data to S3.

What is most valuable?

We are using Kafka consumer and producer.

How has it helped my organization?

We are using Kafka as MQ; our servers generate events which are being sent to Kafka. From Kafka, we have several consumers like Secor (https://github.com/pinterest/secor) that upload raw data to S3; Spark stream that is doing aggregations and saving the result in Cassandra; and Druid for OLAP.

What needs improvement?

  • Maintenance: Sometimes brokers disconnect and there are repartitioning issues.
  • Built-in monitoring application for Kafka infrastructure.
  • UI for Kafka would also be great (similar to http://www.kafkatool.com/).

For how long have I used the solution?

I have used this product for two years.

What do I think about the stability of the solution?

We used to have problems in Kafka every three weeks and our dev ops team fixed a few issues. For the last six months, there have been no production problems, but during the time Kafka was not stable, it was not easy to understand what was wrong and how to fix it.

What do I think about the scalability of the solution?

We have not encountered any scalability issues yet. We are growing and currently, we manage 1M events per second in Kafka.

How are customer service and technical support?

We need more documentation regarding maintenance issues.

Which solution did I use previously and why did I switch?

I used RabbitMQ and ActiveMQ. Kafka is the standard, so there is no question what to use (unless you need better performance, like in ZeroMQ).

Which other solutions did I evaluate?

We did not evaluate other options as Apache Kafka is the standard.

What other advice do I have?

Read the documentation and understand the offset issues (where to save them, read from start to end).
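
The two offset questions mentioned here (where offsets are saved, and where reading starts) can be sketched in plain Python. This is a simplified model, not a Kafka client: the log is a list, the offset store is a dict keyed by a hypothetical consumer group name, and the "earliest"/"latest" policy only mirrors the idea behind the consumer's `auto.offset.reset` setting.

```python
def starting_offset(committed, log_length, reset_policy="earliest"):
    """Pick where to start reading.

    A saved offset always wins; otherwise the reset policy decides
    between the start of the log and its current end.
    """
    if committed is not None:
        return committed
    if reset_policy == "earliest":
        return 0
    if reset_policy == "latest":
        return log_length
    raise ValueError(f"unknown reset policy: {reset_policy}")


def consume(log, offset_store, group, handler, reset_policy="earliest"):
    """Process messages from the saved position and return the next offset.

    The offset is saved only after the handler succeeds, so a crash in
    the handler means that message is processed again on the next run
    (at-least-once behavior).
    """
    position = starting_offset(offset_store.get(group), len(log), reset_policy)
    for offset in range(position, len(log)):
        handler(log[offset])
        offset_store[group] = offset + 1
    return offset_store.get(group, position)
```

Saving the offset before calling the handler instead would avoid reprocessing but could skip a message after a crash, which is why the choice deserves thought up front.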

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Hadoop Technical Lead (Assistant Consultant) at a tech services company with 10,001+ employees
Real User
This is the base streaming component of our IoT platform. It needs a separate cluster and a separate administrator.

What is most valuable?

  • Distributed
  • Persistence
  • Offset management by consumer

How has it helped my organization?

This is the base streaming component of our IoT platform.

In case of disaster recovery, we mirror the data in the cluster by maintaining the offsets and store the data within Hadoop 2.8 HDFS.

What needs improvement?

  • It needs a separate cluster and a separate administrator to manage the Kafka cluster, adding an extra cost.
  • It is challenging when data is moved to a mirror cluster, in the case of disaster recovery. It doesn't keep the offset.

For how long have I used the solution?

I have used this solution for one year.

How are customer service and technical support?

The open source community is very strong. Also, distributors like Cloudera and Hortonworks provide paid support.

Which solution did I use previously and why did I switch?

For big data, we did not have a previous solution. I have used Microsoft MQ for building traditional systems.

How was the initial setup?

The setup was straightforward.

What's my experience with pricing, setup cost, and licensing?

This is open source with the cost of a cluster administrator.

Which other solutions did I evaluate?

We did not look at anything else. At that time, this was already accepted by the industry for streaming data processing.

What other advice do I have?

If the Hadoop distribution is MapR, then consider MapR Streaming. MapR Streaming has overcome these fundamental issues. It stores data within the MapR-FS itself. So there is no extra cluster overhead, but there is a licensing cost.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
DR
Founder, CEO at a tech vendor with 1-10 employees
Real User
The ability to partition data is valuable. There are far superior and cheaper alternatives in cloud-based solutions

Pros and Cons

  • "The ability to partition data on Kafka is valuable."
  • "The product is good, but it needs implementation and on-going support. The whole cloud engagement model has made the adoption of Kafka better due to PaaS (Amazon Kinesis, a fully managed service by AWS)."

How has it helped my organization?

We have used Kafka for streaming customer web clicks from live sessions to understand customer behavioral patterns.

What is most valuable?

The ability to partition data on Kafka is valuable. But Kafka needs support and management. It is better to have it fully managed on the cloud.

The only reason I give Kafka as product a low rating is because there are far superior and cheaper alternatives in cloud-based solutions, where we save money on manpower, electricity, servers, datacenters, networking, etc.

In fact, this is the view I have for pretty much all open source software compared to cloud based services. They just make things cheaper, faster, scalable and manageable. Kafka is good, but Kafka as a cloud service is awesome!!

This is a relative rating (compared to cloud services), not that something is wrong with Kafka. I hope that is clear.

What needs improvement?

The product is good, but it needs implementation and on-going support. The whole cloud engagement model has made the adoption of Kafka better due to PaaS (Amazon Kinesis, a fully managed service by AWS).

What do I think about the stability of the solution?

No issues here with stability.

What do I think about the scalability of the solution?

Ah, scalability!!! We need to set up multiple servers again for handling the load, which makes Kafka not scalable, unless you subscribe to cloud services.

How are customer service and technical support?

It’s an Apache-community based support, so it is not really prioritized if you have a business issue. This is why most enterprise customers pay for cloud services.

Which solution did I use previously and why did I switch?

We didn’t have a previous solution. We started with Kafka and then switched to Amazon Kinesis (PaaS for Kafka). I think Microsoft Azure also released a competing service.

How was the initial setup?

The setup was straightforward.

What's my experience with pricing, setup cost, and licensing?

Licensing issues are not applicable. Apache licensing makes it simple with almost zero cost for the software itself.

Which other solutions did I evaluate?

We unsuccessfully, and kind of foolishly, tried Apache Camel. They were not similar in services, so we moved to Kafka rightfully, and then to AWS cloud ultimately.

What other advice do I have?

If you have a dedicated Kafka resource to implement and manage the services, then go for Apache Kafka. Otherwise, do consider cloud-based services from AWS or Azure.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user660627
Senior Software Engineering Consultant at a tech services company with 51-200 employees
Consultant
It offers throughput with built-in fault-tolerance and replication.

Pros and Cons

  • "Kafka, as compared with other messaging system options, is great for large scale message processing applications. It offers high throughput with built-in fault-tolerance and replication."
  • "Kafka requires non-trivial expertise with DevOps to deploy in production at scale. The organization needs to understand ZooKeeper and Kafka and should consider using additional tools, such as MirrorMaker, so that the organization can survive an availability zone or a region going down."

How has it helped my organization?

I used Kafka with a client to decouple applications with different availability profiles. Before using a messaging-based architecture with Kafka as the messaging system, the client used a coordinator application to fire off various posts to as many as eight other applications. With an application that's impacting at least a customer a second in airports, where the customers demand that the system always works, there were issues with ensuring high availability.

A typical way to calculate system availability is: Availability = Uptime/(Uptime + Downtime). Hence, where there are two applications involved with a 99% availability, the total system availability degrades quickly: 99% * 99% = 98.01%.

With eight applications, total availability caused issues. However, only two systems needed to provide real-time responses, while other systems were for payment processing, CRM, promotions, etc. It was OK if those systems were not up to date in real time.
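
The arithmetic above extends directly to eight applications. The sketch below assumes that every request needs all applications to be up and that their failures are independent; under those assumptions, eight applications at 99% each give roughly 92.3% total availability. The function names are illustrative.

```python
def availability(uptime, downtime):
    """Availability = Uptime / (Uptime + Downtime)."""
    return uptime / (uptime + downtime)


def serial_availability(availabilities):
    """Total availability when every application must be up at once.

    Assumes independent failures, so the individual values multiply.
    """
    total = 1.0
    for value in availabilities:
        total *= value
    return total


two_apps = serial_availability([0.99, 0.99])    # 0.9801
eight_apps = serial_availability([0.99] * 8)    # about 0.9227
```

Taking the six non-real-time systems out of the synchronous path leaves only two applications in the product, which is the motivation for the decoupling described next.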

Kafka allowed the client to have temporal decoupling for writes, i.e., the flaky third-party CRM system did not need to be available at the moment for us to respond to a user with a successful response. The availability concerns shifted to Kafka, which is a better trade off because it's built for this.

Another benefit, though not required, was the addition of logical decoupling between applications. Additional consumers could be built to overlay concerns of analytics, but the systems responsible for creating the entities on a given topic did not need to be aware of the analytics applications. This simplifies the interaction between applications and concerns of an organization.

Another benefit of this architecture is that testing is simplified. A given application needs to be tested to obey a contract of reading a message and producing another message. A Kafka topic acts as the boundary for an integration test.
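
A minimal sketch of such a contract test, in plain Python: the topics are modeled as lists, and `enrich_order` is a hypothetical piece of application logic invented for the example. A real integration test would read from and write to actual Kafka topics.

```python
def enrich_order(message):
    """Hypothetical application logic: read one message, produce another."""
    return {
        "order_id": message["order_id"],
        "total_cents": sum(message["item_cents"]),
    }


def run_application(input_topic, output_topic, handler):
    """Consume every message from the input topic and publish the result."""
    for message in input_topic:
        output_topic.append(handler(message))


def test_contract():
    """The test only touches the topics, never the application's internals."""
    input_topic = [{"order_id": 7, "item_cents": [250, 125]}]
    output_topic = []
    run_application(input_topic, output_topic, enrich_order)
    assert output_topic == [{"order_id": 7, "total_cents": 375}]
```

Because the test is written against the input and output messages only, the application behind the topic can be rewritten without changing the test.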

What is most valuable?

Kafka, as compared with other messaging system options, is great for large scale message processing applications. It offers high throughput with built-in fault-tolerance and replication.

Messaging systems in general allow for logical and temporal decoupling between applications. Given Kafka's high availability, it's a great option to use if applications require availability, but not real-time processing.

If a downstream system is offline, messages can queue up and process when possible, but the user may not necessarily need to be aware of any issues.

A messaging-based architecture becomes important as a set of micro-services need to scale with high availability. Kafka is a great choice for messaging with such architecture.

What needs improvement?

Kafka requires non-trivial expertise with DevOps to deploy in production at scale. The organization needs to understand ZooKeeper and Kafka and should consider using additional tools, such as MirrorMaker, so that the organization can survive an availability zone or a region going down.

Shifting availability concerns to Kafka means that it cannot go down. It's important to understand the partitioning model and replication needs before relying on it for critical business functions. I'd suggest using it with a feature toggle for a non-critical path in production and learning from failure before relying on it.

While Kafka is built to scale, that does not mean that applications can start as many consumers or producers without consideration for how Kafka brokers will perform. Considerations about scaling out brokers need to occur before publishing millions of messages.

What do I think about the stability of the solution?

Generally, there were no stability issues. However, there was one scare in production when a consumer rebalance took 30 minutes and messages were not being processed during that time.

What do I think about the scalability of the solution?

We have not yet had scalability issues!

How are customer service and technical support?

There are specialized consulting companies in this space and there are online resources to read. That may help companies get past hurdles.

Which solution did I use previously and why did I switch?

No, we did not use a previous messaging system.

How was the initial setup?

The setup was complex. One must consider setting up ZooKeeper, Kafka, and multi-zone/region availability, as well as the typical associated functions for running it all in production. These include monitoring, message schema changes (consider Avro), encrypting messages if that is a concern, and potentially authorization for different topics, depending upon the sensitivity of the data.

If an organization uses Kafka as the first messaging system, then the approach for application design must also shift significantly.

What's my experience with pricing, setup cost, and licensing?

It is open source software.

Which other solutions did I evaluate?

The client evaluated alternatives before I arrived, but I was not there during the evaluation so I cannot comment.

What other advice do I have?

Consider using a managed Kafka service, such as from Heroku.

If messaging is not a central component of the business and vendor lock-in is less of a concern, consider using something like Amazon's Kinesis. This can more rapidly provide the benefits of a messaging service without the pain of understanding it deeply, setting it up, and managing it.

It's important to use a lean approach to understand how it will break in production.

Implement a non-critical transaction with it.

Perhaps use a feature toggle within a facade and implement the behavior with the old approach and with Kafka to reduce risk.
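
One way the feature-toggle facade could look, sketched in plain Python. The class name, the `use_kafka` flag, and the two sender callables are all hypothetical; in practice the Kafka sender would wrap a real producer and the legacy sender would wrap the existing direct call.

```python
class EventFacade:
    """Single entry point that hides whether the old path or Kafka is used."""

    def __init__(self, legacy_sender, kafka_sender, use_kafka=False):
        self.legacy_sender = legacy_sender  # the old approach, kept working
        self.kafka_sender = kafka_sender    # the new Kafka-based approach
        self.use_kafka = use_kafka          # the feature toggle

    def send(self, event):
        """Route the event by the toggle and report which path handled it."""
        if self.use_kafka:
            self.kafka_sender(event)
            return "kafka"
        self.legacy_sender(event)
        return "legacy"
```

Callers only ever see `send`, so the toggle can be switched off again if the Kafka path misbehaves in production.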

Add it to one or two applications and monitor how it goes.

Figure out security, monitoring, scaling, schema migration, etc., before using it as a critical component in an application.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Principal Software Architect at a tech services company with 11-50 employees
Consultant
Does real-time streaming and persistence into distributed nodes. It provides a mechanism to create, publish, and subscribe.

What is most valuable?

Real-time streaming and persistence into distributed nodes. It provides a simple mechanism to create, publish, and subscribe.

How has it helped my organization?

We are using Kafka as part of our product. It is one of the messaging layers used to interact between various layers of software modules. This provides a clear separation of modules and leverages it for development and testing of different modules.

What needs improvement?

The management tools are still maturing. When we have thousands of topics, it is hard to visualize them.

For how long have I used the solution?

I’ve been using Kafka for two years.

What do I think about the stability of the solution?

We have not encountered any stability issues.

What do I think about the scalability of the solution?

We have to balance the nodes when topics partition across cluster nodes. As it assumes they are of equal sizes, sometimes some nodes may not be allocated similar resources. Reassignment moves all the partitions of specified topics which may be an issue when not planned for.

How are customer service and technical support?

We have the source code to make changes if necessary.

Which solution did I use previously and why did I switch?

Kafka rendered itself suitable for our product offering. It supports all the necessary requirements for a real-time pipeline.

How was the initial setup?

Setting up was easy with ZooKeeper.

What's my experience with pricing, setup cost, and licensing?

With paid support from Confluent, you get the additional benefit of Kafka Connect.

Which other solutions did I evaluate?

We used Akka Streams for faster communication, but it would require additional configuration and setup for persistence. Kafka provides those by default.

What other advice do I have?

Kafka provides distributed persistence and streaming layers. Consumers have flexibility in how they consume messages, provided they handle resilience in their own code. It requires ZooKeeper.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user653562
Solutions Architect at a consultancy with 1,001-5,000 employees
Consultant
Has the ability to write data at one velocity and have subscribing consumers read at different velocities.

Pros and Cons

  • "Apache Kafka is actually a distributed commit log. That is different than most messaging and queuing systems before it."
  • "The GUI tools for monitoring and support are still very basic and not very rich. There is no help in determining a shard key for performance."

How has it helped my organization?

Kafka has a guaranteed delivery mechanism that is very easy to set up. When starting out with minimal hardware, it can handle very large data volumes. When prototyping and creating a proof of concept, Kafka has helped to speed up the timeline from the prototype all the way to production volumes.

What is most valuable?

Apache Kafka is actually a distributed commit log. That is different than most messaging and queuing systems before it. I find the ability to write data at one velocity and have subscribing consumers read at different velocities to be the best feature.

What needs improvement?

The GUI tools for monitoring and support are still very basic and not very rich. There is no help in determining a shard key for performance.
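
Lacking built-in guidance, one rough way to evaluate a candidate shard (partition) key is to hash a sample of real keys and look at how evenly they spread. The sketch below is plain Python and makes a loud assumption: it uses `zlib.crc32` as a stand-in hash, whereas Kafka's Java client hashes keys with murmur2, so the exact partition numbers will differ from a real cluster. The function names are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    """Map a key to a partition by hashing it (crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys, num_partitions):
    """Count how many of the sample keys land on each partition."""
    counts = Counter(partition_for(key, num_partitions) for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew(load):
    """Ratio of the busiest partition to the average; 1.0 is perfectly even."""
    mean = sum(load) / len(load)
    return max(load) / mean if mean else 0.0
```

A key with few distinct values (for example, a country code) tends to produce a high skew, while a high-cardinality key such as a user ID usually spreads more evenly.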

What do I think about the stability of the solution?

We did not have any issues with stability.

What do I think about the scalability of the solution?

We did not have any issues with scalability.

How are customer service and technical support?

  • Kafka is open source from LinkedIn and support comes from the community of users.
  • You can go with Confluent, the company that was founded by the original engineers from LinkedIn.
  • You can go with a cloud hosting service, like AWS EMR or Azure HDInsight.


    Which solution did I use previously and why did I switch?

    We used traditional message queues and file semaphores. There was a lot of overhead with asynchronous messages being put into an order and making sure nothing got dropped. It required a lot of code and maintenance.

    How was the initial setup?

    Since it is open source, you are on your own for setup. However, the tutorials from the Apache foundation and online sources have been an immense help.

    Getting started is very easy. The complexity of very large volumes of data and appropriate sharding, however, is difficult. There are fewer resources for tuning and best practices.

    What's my experience with pricing, setup cost, and licensing?

    When starting to look at a distributed message system, look for a cloud solution first. It is an easier entry point than an on-premises hardware solution. A lot of the complexity has already been taken care of. Both AWS and Azure have supported Kafka clusters that can be provisioned very easily.

    Which other solutions did I evaluate?

    We looked at RabbitMQ and Spark Streaming.

    What other advice do I have?

    Be sure to define the use cases as best as possible at first.

    Kafka is very good, but it is complex to support. It can handle any message size, whereas native cloud options have size limitations.

    Be sure to understand what messages will be sent and how many discrete topics will be needed.

    Be aware that you must code both producers and consumers.

    The bulk of the work is with the consumer.

    The Apache stack for Kafka is entirely open source. There are essentially no tools other than command-line options to monitor broker and topic health. Third-party tools, some free and some paid, can help with that, but they require that you install agents on the servers hosting Kafka and open up JMX ports (for MBeans) in the scripts that start the Kafka services. You also have to monitor ZooKeeper, which is very memory intensive.

    Cloud offerings that provide the whole modern data architecture stack, like AWS EMR and Azure HDInsight, as well as Hortonworks and Cloudera, provide a console GUI as part of their offerings. Confluent, a company founded by the LinkedIn engineers who designed Kafka, also has a paid enterprise offering with much better tools for maintaining the Kafka cluster. But with Apache Kafka and the community alone, you are on your own.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    it_user660630
    SDET II at a tech services company with 5,001-10,000 employees
    Consultant
    Replication and partitioning are valuable features.

    What is most valuable?

    • Replication, partitioning, and reliability are the most valuable features.
    • Even if one of my clusters fails, the replication factor of a topic makes sure that I have the data available for processing, so I won't lose any of it.
    • Partitioning enables me to process requests in parallel, which helps in reaching the required throughput.
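The replication point above can be illustrated with a toy model. This is a minimal sketch, not Kafka's actual replica placement: the function names, the broker names, and the round-robin placement rule are all assumptions made for illustration. It shows why a replication factor of 2 or more keeps every partition readable when a single broker fails.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def available_partitions(assignment, failed_brokers):
    """Return partitions that still have at least one replica on a live broker."""
    return [
        p for p, replicas in assignment.items()
        if any(b not in failed_brokers for b in replicas)
    ]


# Hypothetical three-broker cluster with six partitions.
brokers = ["broker-1", "broker-2", "broker-3"]
assignment = assign_replicas(num_partitions=6, brokers=brokers, replication_factor=2)

# Losing one broker leaves every partition readable from its other replica.
survivors = available_partitions(assignment, failed_brokers={"broker-2"})
print(len(survivors))
```

With a replication factor of 1 in the same model, the partitions placed on the failed broker would become unavailable, which is the situation replication is meant to avoid.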

    What needs improvement?

    One area for improvement is OS memory management. If there are too many partitions, it runs into memory issues. Although this is a very rare scenario, it can happen.

    For how long have I used the solution?

    I have been using this product for a year now.

    What do I think about the stability of the solution?

    There were no stability issues.

    What do I think about the scalability of the solution?

    Kafka is a highly scalable product. We have not faced any scalability issues so far.

    How are customer service and technical support?

    Since it's an open source product, no technical support is available. However, the open source community is very active.

    How was the initial setup?

    The initial setup was straightforward. Just go through the Kafka documentation and it will be up and running in no time.

    What's my experience with pricing, setup cost, and licensing?

    Since it's an open source product, there is no pricing for it.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    it_user647457
    Head of Engineering
    Vendor
    Interactions among micro-services are used as input to our analytics infrastructure.

    Pros and Cons

    • "Ease of use."
    • "Stability of the API and the technical support could be improved."

    How has it helped my organization?

    Kafka was at the base of our system architecture. The system was designed as an event-based architecture. Almost all the interactions among micro-services go through Kafka, and the same data are used as input to our analytics infrastructure.

    What is most valuable?

    • Scalability
    • Reliability
    • Ease of use

    What needs improvement?

    Stability of the API and the technical support could be improved.

    The Kafka API is changing quite radically with the different releases. There are many new improvements and that's good. But the inherent cost of adapting to a new version of the platform was worrying me at the time.

    The documentation was sometimes misleading, since it was describing some feature in the new version of the API rather than the one we were using.

    What do I think about the stability of the solution?

    We did not encounter any issues with stability.

    What do I think about the scalability of the solution?

    We did not encounter any issues with scalability.

    How are customer service and technical support?

    We were not completely satisfied with the technical support. We subscribed to the Confluent professional platform to receive guidance and support on development and deployment. Whilst the development side is quite well covered by their consultants, deployment and administration are not at the same level.

    Which solution did I use previously and why did I switch?

    The previous solution was not really an equivalent one. I have been using several messaging systems, but Kafka fits us better for a more scalable system.

    How was the initial setup?

    The initial setup was straightforward.

    What's my experience with pricing, setup cost, and licensing?

    I would not subscribe to the Confluent platform, but rather stay on the free open source version. The extra cost wasn't justified.

    Which other solutions did I evaluate?

    We didn't evaluate other options, as we already had a positive experience across the team with Kafka. Everybody agreed to work with it.

    We were considering Kinesis too, since we were running on AWS. We preferred to opt for a tool with which people were more familiar.

    What other advice do I have?

    The product is easy to use. However, to leverage its power, there is a need for good knowledge of event based processing. I suggest using the massive amount of material shared by the Confluent team, or what is available online.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    ITCS user
    Deputy General Manager, DevOps Manager at a comms service provider with 10,001+ employees
    Real User
    One of the best features which I have worked with is replay.

    What is most valuable?

    One of the best features which I have worked with is replay.

    How has it helped my organization?

    Real-time log aggregation, which was earlier done with rsync, has been moved to the Kafka infrastructure along with other real-time streams.

    What needs improvement?

    • GUI for Kafka infrastructure monitoring and deployment

    For how long have I used the solution?

    I have used it for two years.

    What was my experience with deployment of the solution?

    Documentation is quite comprehensive.

    What do I think about the stability of the solution?

    I found it very stable.

    What do I think about the scalability of the solution?

    No issues with scalability.

    How are customer service and technical support?

    Customer Service:

    We used the open-source version.

    Technical Support:

    We used the open-source version.

    Which solution did I use previously and why did I switch?

    We previously used rsync, which was not real-time.

    How was the initial setup?

    Initial setup was mostly intuitive (based on rsync).

    What about the implementation team?

    Implementation was in-house based on the open-source version.

    What was our ROI?

    Target was to achieve real-time service.

    Which other solutions did I evaluate?

    Before choosing this product, we did not evaluate other options.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    ITCS user
    Java Architect at a tech vendor with 51-200 employees
    Vendor
    The speed at which it publishes messages is valuable.

    What is most valuable?

    Excellent speed for publishing messages.

    What needs improvement?

    There is too much dependency on ZooKeeper, and leader election is still the bottleneck for Kafka implementations.

    What do I think about the scalability of the solution?

    The RESTful API implementation uses the Kafka broker to publish messages, but I have not been able to make it scale. Part of the reason might be that there is no load balancer for the RESTful API web server.

    How was the initial setup?

    Setup is very straightforward for development, and cluster setup is also easy. I am not aware of the production setup yet.

    What about the implementation team?

    I implemented it on my own.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    it_user592338
    Enterprise Architect at a logistics company with 1,001-5,000 employees
    Vendor
    We use it for reactive architecture, track and trace, mail and parcel.

    What is most valuable?

    • Supports more than 10,000 events/second.
    • Scalability
    • Replication

    It is a good product for event-driven architecture.

    How has it helped my organization?

    We use Kafka for reactive architecture, track and trace, mail and parcel.

    What needs improvement?

    A good free monitor tool would be great for Apache Kafka (from Apache foundation).

    For how long have I used the solution?

    We used Kafka 0.8 for 2 years and Kafka 0.10 for 3 months.

    What do I think about the stability of the solution?

    We have not encountered any stability issues.

    What do I think about the scalability of the solution?

    We have not encountered any scalability issues.

    How are customer service and technical support?

    We haven’t used technical support.

    Which solution did I use previously and why did I switch?

    Apache MQ is different. It is a message bus (log rotate) that can manage more than 10,000 events/sec.

    How was the initial setup?

    The basic configuration is quite good. We have built a Hadoop cluster and the Kafka service was included.

    What's my experience with pricing, setup cost, and licensing?

    We use a community version.

    What other advice do I have?

    Kafka processes asynchronous exchanges, so there are no transactional interactions.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    ITCS user
    Technical Lead/Project Manager (Consulting Apple Inc) at a tech services company with 1,001-5,000 employees
    Consultant
    Topic-based eventing, scalability, and retention periods are valuable.

    What is most valuable?

    The most valuable features are topic-based eventing, scalability, and retention periods.

    How has it helped my organization?

    My organization is transforming by using the new SOA/eventing-based architecture. The application depends on the employees’ information events. Kafka is very helpful in implementing this. It increases the performance and gives the details to multiple external/internal teams using Kafka topics in an asynchronous manner.

    For example, if someone is moving from one office to another one, we have to update the software. While updating it, the system puts that event in a topic so that all other consumers can update that person’s new location. This can include the payroll team, the insurance team, and the hospital network.

    The retention period helps us retain the data in the topic for the configured number of days. In this example, if any of the consumers fail to consume the message from the topic, then that message will be there until the retention period ends.
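The retention behaviour described above can be sketched with a small in-memory model. This is an illustration under stated assumptions, not Kafka code: the `Record` type, the `readable_records` helper, and the 7-day value are hypothetical, and the model expires records one by one, whereas a real broker applies its retention setting (the topic-level `retention.ms`) to whole log segments.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Record:
    offset: int
    timestamp_ms: int
    value: str


def readable_records(log, now_ms, retention_ms, from_offset=0):
    """Return records at or after from_offset that are still within retention."""
    cutoff = now_ms - retention_ms
    return [r for r in log if r.offset >= from_offset and r.timestamp_ms >= cutoff]


# An office-move event written on day 0, another on day 5.
log = [
    Record(0, 0 * DAY_MS, "employee 17 moved to office B"),
    Record(1, 5 * DAY_MS, "employee 42 moved to office A"),
]
retention_ms = 7 * DAY_MS  # hypothetical 7-day retention

# A consumer that was down until day 6 still sees both events.
on_day_6 = readable_records(log, now_ms=6 * DAY_MS, retention_ms=retention_ms)
# By day 8 the first event has aged out; only the second remains.
on_day_8 = readable_records(log, now_ms=8 * DAY_MS, retention_ms=retention_ms)
print(len(on_day_6), len(on_day_8))
```

The practical consequence is that the retention period should be longer than the longest outage any consumer is expected to have.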

    What needs improvement?

    I would like to see a more user-friendly GUI.

    For how long have I used the solution?

    We have used this solution since December 2015.

    What do I think about the stability of the solution?

    If you are using the same group ID for multiple topics, it may shut down the application. We have faced this issue before.

    What do I think about the scalability of the solution?

    We have not had any scalability issues.

    How are customer service and technical support?

    I would give technical support a rating of 6 out of 10.

    Which solution did I use previously and why did I switch?

    We were using ActiveMQ, which is just a messaging system. We are changing because of Kafka’s added value of scalability, retention, and high payload support.

    How was the initial setup?

    The installation was somewhat straightforward.

    What's my experience with pricing, setup cost, and licensing?

    The solution is worth the money.

    What other advice do I have?

    This is the best tool I have ever used for asynchronous, event-based solutions.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    it_user590451
    Lead Engineer at a retailer with 10,001+ employees
    Vendor
    We use the product for high-scale distributed messaging. Multiple consumers can sync with it and fetch messages.

    What is most valuable?

    We use the product for high-scale distributed messaging. The processing capability of the product is enormous. Being a distributed platform, multiple consumers can sync with it and fetch messages.

    Another great feature is the consumer offset log, which records where the consumer left off and where it needs to start again. Consumers don’t need extra code or effort to maintain the offset.
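The idea of committed offsets can be sketched in a few lines. This is a toy model rather than the Kafka consumer API: `OffsetStore`, `consume`, and the group names are hypothetical, and starting a new group from offset 0 is an assumption of the sketch (a real consumer's starting point depends on its configuration).

```python
class OffsetStore:
    """Toy stand-in for committed consumer offsets, keyed by (group, partition)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, partition, next_offset):
        self._committed[(group, partition)] = next_offset

    def position(self, group, partition):
        # A group with no committed offset starts from the beginning here.
        return self._committed.get((group, partition), 0)


def consume(log, store, group, partition, max_records):
    """Read up to max_records from the committed position, then commit."""
    start = store.position(group, partition)
    batch = log[start:start + max_records]
    store.commit(group, partition, start + len(batch))
    return batch


log = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()

first = consume(log, store, "billing", 0, max_records=2)   # ['m0', 'm1']
# The consumer process "restarts" here; only the store survives.
second = consume(log, store, "billing", 0, max_records=2)  # ['m2', 'm3']
# A different group tracks its own position independently.
other = consume(log, store, "audit", 0, max_records=1)     # ['m0']
```

Because the position lives outside the consumer process, a restarted consumer picks up where the previous one stopped without any bookkeeping of its own.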

    How has it helped my organization?

    We were using another commercial messaging engine, which was not scalable unless you paid more. Each hub that we provisioned was expensive. This solution is open source, which is much easier to use and doesn’t cost us anything.

    What needs improvement?

    This product guarantees at-least-once delivery. We have requested, through JIRA, features such as at-most-once delivery to remove duplicate message consumption.
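One common way to live with at-least-once delivery is to make the consumer idempotent. The sketch below is a consumer-side workaround, not a Kafka feature, and it assumes each message carries a unique ID. The `process_once` function and the sample messages are hypothetical.

```python
def process_once(deliveries, handler, seen_ids=None):
    """Apply handler to each message ID at most once, skipping redeliveries."""
    seen_ids = set() if seen_ids is None else seen_ids
    for message_id, payload in deliveries:
        if message_id in seen_ids:
            continue  # duplicate from a redelivery; already handled
        handler(payload)
        seen_ids.add(message_id)
    return seen_ids


# At-least-once delivery: message 2 arrives twice after a simulated retry.
deliveries = [
    (1, "order created"),
    (2, "order paid"),
    (2, "order paid"),
    (3, "order shipped"),
]
handled = []
process_once(deliveries, handled.append)
print(handled)
```

In a real deployment the set of seen IDs would need to be stored durably and bounded in size, which this sketch leaves out.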

    What do I think about the stability of the solution?

    We haven’t faced any issues so far. Some of the clusters churn millions of records per second with ease.

    What do I think about the scalability of the solution?

    We have clustered environments and we haven’t seen any scalability issues. We can provision a new node in as little as 45 minutes.

    How are customer service and technical support?

    It is open source, so support is in our own hands. The only option is to make a new feature request through JIRA. When multiple people in the community make a request for a similar feature, it gets priority.

    Which solution did I use previously and why did I switch?

    We switched from a previous solution mainly to reduce costs and to have a more scalable solution.

    How was the initial setup?

    The initial setup was a bit complex in terms of how to manage it across data centers. But once it was set up, we never faced issues.

    Which other solutions did I evaluate?

    We evaluated multiple options, such as ActiveMQ and RabbitMQ. We leaned towards this solution.

    What other advice do I have?

    I would advise others to start with non-SSL implementations and try to do PoCs. Afterwards, they should move towards more secure features.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    it_user578787
    Java Developer at a media company with 10,001+ employees
    Real User
    It provides safety for data in case of node failure or data center outage. Partitioning is useful for parallelizing processing.

    What is most valuable?

    The most valuable features to me are replication, partitioning and easy integration with Apache Spark, which we use quite a bit for distributed processing.

    Replication is good for high availability. It provides additional safety for data in case of node failure or data center outage. Partitioning is a really useful feature for parallelizing processing. We use Apache Spark to process data from a Kafka queue, and Spark is able to assign one executor to each Kafka partition. The more partitions we have, the more threads we can use to process data in parallel. This helps us achieve really good throughput.
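The one-worker-per-partition pattern described above can be sketched without Spark. In this illustration plain Python threads stand in for Spark executors, and the partition contents and the `process_partition` function are hypothetical. It is not the Spark or Kafka API.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(partition_id, records):
    """Hypothetical per-partition work: here, just sum the record values."""
    return partition_id, sum(records)


# Four partitions of a topic, each holding its own ordered records.
partitions = {
    0: [1, 2, 3],
    1: [10, 20],
    2: [100],
    3: [7, 7, 7, 7],
}

# One worker per partition, so no partition waits on another.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    futures = [pool.submit(process_partition, pid, recs) for pid, recs in partitions.items()]
    totals = dict(f.result() for f in futures)

print(totals)
```

Records within a partition are still handled in order by a single worker. Adding partitions raises the ceiling on how many workers can run at once.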

    How has it helped my organization?

    It will help us build a scalable platform. This will allow the company to provide better customer service.

    What needs improvement?

    It’s pretty easy to use for now. I haven’t had any difficulty or problems that I can complain about. Maybe they can add a UI to configure queues and display statistics about data stores.

    For how long have I used the solution?

    I have used Kafka for about a year.

    What do I think about the stability of the solution?

    So far, we have not encountered any stability issues.

    What do I think about the scalability of the solution?

    We have not had any scalability issues. The product is horizontally scalable, so adding extra hardware is all that is needed.

    How are customer service and technical support?

    We haven’t needed technical support with the product yet.

    Which solution did I use previously and why did I switch?

    I think performance-wise, the product is very good and fits our use case. We used other distributed message queues, but all products have their own use case.

    How was the initial setup?

    Initial setup wasn’t really complex. We use Kafka through Hortonworks Suite, which comes with many other big data tools. Ambari makes it easy to set up.

    What's my experience with pricing, setup cost, and licensing?

    Licensing and pricing were handled by my management, so I don’t have much knowledge there.

    What other advice do I have?

    Give it a try. It’s a valuable, high-performance, distributed processing tool.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.