Lead Data Scientist at a transportation company with 51-200 employees
Real User
Top 20
Offers a free version but needs to improve the support offered to users
Pros and Cons
  • "The most valuable features of the solution are its low latency and its sequencing capabilities."
  • "One complexity I faced with the tool is that, since it is not a stand-alone application, it does not integrate natively with cloud platforms like AWS or Azure."

What is our primary use case?

I planned to use the tool for real-time data processing and real-time analytics workflows. The real-time IoT data arrives with a few challenges, and it currently comes through as a single stream, much like one Kafka topic. I want to use multiple Kafka topics: one fed directly into the data pipeline, another fed into the real-time alert system, and a third fed into machine learning.
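
The fan-out described above can be sketched as a routing step that decides which topics each IoT reading should be published to. This is a minimal plain-Python illustration, not Kafka client code: the topic names, the `Reading` type, and the alert threshold are all hypothetical, and actually publishing to Kafka would require a client library.

```python
from dataclasses import dataclass

# Hypothetical topic names; the review does not name its topics.
PIPELINE_TOPIC = "iot.pipeline"
ALERTS_TOPIC = "iot.alerts"
ML_TOPIC = "iot.ml-features"


@dataclass(frozen=True)
class Reading:
    device_id: str
    value: float


def route(reading: Reading, alert_threshold: float = 100.0) -> list[tuple[str, Reading]]:
    """Return the (topic, message) pairs a single IoT reading should be sent to."""
    # Every reading feeds the data pipeline and the machine learning topic.
    targets = [(PIPELINE_TOPIC, reading), (ML_TOPIC, reading)]
    # Hypothetical alert rule: only readings above the threshold reach the alert topic.
    if reading.value > alert_threshold:
        targets.append((ALERTS_TOPIC, reading))
    return targets
```

An alternative design in Kafka is a single topic read by three separate consumer groups, each of which receives every message; separate topics are more useful when each downstream system needs a different subset of the data.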

How has it helped my organization?

The most valuable features of the solution are its low latency and its sequencing capabilities. The sequencing capability helps me aggregate data without writing a separate function to sequence it; for example, I write an aggregate function to find the maximum value in the last ten samples.
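
The "maximum value in the last ten samples" aggregate mentioned above is a sliding-window maximum. Stream processors usually express this as a windowed aggregation; the sketch below shows only the underlying logic in plain Python, assuming the samples arrive as an ordered sequence of numbers. It is not the reviewer's code or any Kafka API.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_max(samples: Iterable[float], window: int = 10) -> Iterator[float]:
    """Yield the maximum of the most recent `window` samples after each new sample.

    A monotonic deque of (index, value) pairs keeps only candidates that could
    still be the maximum, so each sample is pushed and popped at most once.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    candidates: deque[tuple[int, float]] = deque()
    for i, value in enumerate(samples):
        # Older samples that are <= the new one can never be the maximum again.
        while candidates and candidates[-1][1] <= value:
            candidates.pop()
        candidates.append((i, value))
        # Evict the front candidate once it falls outside the window.
        if candidates[0][0] <= i - window:
            candidates.popleft()
        yield candidates[0][1]
```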

What needs improvement?

One complexity I faced with the tool is that, since it is not a stand-alone application, it does not integrate natively with cloud platforms like AWS or Azure. Apache Kafka has another layer over it. By contrast, a service like Grafana can be used directly as a stand-alone tool with Grafana Cloud, or you can use a mix of AWS and Grafana, and there is not much difference between the two. I would like Apache Kafka to work the same way as Grafana.

The product's support and the cloud integration capabilities are areas of concern where improvements are required.

For how long have I used the solution?

I have been using Apache Kafka for a year.

Buyer's Guide
Apache Kafka
May 2024
Learn what your peers think about Apache Kafka. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
772,277 professionals have used our research since 2012.

What do I think about the stability of the solution?

Stability-wise, I rate the solution an eight out of ten.

What do I think about the scalability of the solution?

Scalability-wise, I rate the solution an eight out of ten.

Around four people in my company use the product.

How are customer service and support?

I did not interact much with the product's technical support team. Since I was using the product's free version, I did not have dedicated support to respond to my queries. I rate the support a seven out of ten.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I have worked with Databricks. I use Databricks and Apache Kafka simultaneously.

How was the initial setup?

The product's deployment phase is neither complex nor straightforward. As the software has evolved a lot, users can actually keep it even simpler by opting for a plug-and-play model.

The solution is deployed on-premises.

The solution can be deployed in two or three days.

What about the implementation team?

I was involved with the tool's installation process.

What was our ROI?

I cannot comment on the tool's ROI since I did not use it for production purposes.

What's my experience with pricing, setup cost, and licensing?

I was using the product's free version.

What other advice do I have?

I did not come across any scenarios involving fault tolerance, because data consistency issues, like missing or incorrect values, originate in the system that feeds the data. As for missing values, I never tried the option that allows one to impute a missing value using another parameter.

As for whether I have incorporated emerging data streaming trends, such as AI, into my Apache Kafka workflows, I use it as a local system. For example, on an EC2 server I read the samples and then build the regression and reintegration model on top of them, but that is done locally, not on the cloud.

I recommend the product to those who plan to use it. I like Kafka and Flink, and I want to create a system in AWS, mainly for real-time streaming, so that I don't need to worry about multiple copies of the data.

Considering the improvements needed in the product's support and cloud integration capabilities, as well as the simplicity of the installation phase, I rate the tool a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Technical Lead at Interface Fintech Ltd
Real User
Top 10
This very scalable solution works great and is super fast, but I would like less of a learning curve around creating brokers and topics
Pros and Cons
  • "The solution is very scalable. We started with a cluster of three and then scaled it to seven."
  • "I would like them to reduce the learning curve around the creation of brokers and topics. They also need to improve on the concept of the partitions."

What is our primary use case?

We use an open-source version of this solution, and we have two deployments of it. One is on-prem, and the other is in the cloud. We use the on-prem version to aggregate our logs. We use the cloud version to manage queues for financial services. 

What is most valuable?

It just works, and it's super fast. We were struggling with a RabbitMQ cluster, and the Kafka cluster is far easier.

What needs improvement?

I would like them to reduce the learning curve around the creation of brokers and topics. They also need to improve on the concept of the partitions. 

As for features, RabbitMQ has an instant-response feature where you can send a message to a queue and get an immediate response, but Kafka only supports sending messages one way. It would be great if they could improve on that.
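
The request-reply behavior the reviewer misses is commonly built on top of one-way messaging with two topics and a correlation ID. The sketch below shows that pattern against a toy in-memory broker; `InMemoryBroker`, the `requests`/`replies` topic names, and the message fields are hypothetical and are not part of Kafka or any client library.

```python
import uuid
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy broker with one-way topics; a hypothetical stand-in, not a Kafka client."""

    def __init__(self) -> None:
        self.topics: dict[str, deque] = defaultdict(deque)

    def send(self, topic: str, message: dict) -> None:
        self.topics[topic].append(message)

    def poll(self, topic: str):
        queue = self.topics[topic]
        return queue.popleft() if queue else None


def send_request(broker: InMemoryBroker, payload, reply_topic: str = "replies") -> str:
    """Publish a request tagged with a correlation ID and the topic to answer on."""
    correlation_id = str(uuid.uuid4())
    broker.send("requests", {"correlation_id": correlation_id,
                             "reply_to": reply_topic, "payload": payload})
    return correlation_id


def serve_one(broker: InMemoryBroker, handler) -> bool:
    """Consume one request, if any, and publish the handler's result to its reply topic."""
    request = broker.poll("requests")
    if request is None:
        return False
    broker.send(request["reply_to"], {"correlation_id": request["correlation_id"],
                                      "result": handler(request["payload"])})
    return True


def collect_replies(broker: InMemoryBroker, reply_topic: str = "replies") -> dict:
    """Drain the reply topic and index each result by its correlation ID."""
    replies = {}
    while (message := broker.poll(reply_topic)) is not None:
        replies[message["correlation_id"]] = message["result"]
    return replies
```

The correlation ID lets a requester match replies to requests even when several requests are in flight at once.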

For how long have I used the solution?

This is my second year working with this solution. 

What do I think about the stability of the solution?

I think it's very stable. I would rate the stability as a four or five out of five. 

What do I think about the scalability of the solution?

The solution is very scalable. We started with a cluster of three and then scaled it to seven. I would give the solution a five out of five for scalability. Currently, we have 20+ employees on the technical team that are using the solution. 

We provide outsourced services for other institutions. There is a complete queue management setup, and about five institutions, with three technical teams, use the same cluster.

How was the initial setup?

There was a little learning curve, but we managed it. I think it took us around six weeks to complete the deployment. 

What about the implementation team?

We have a team of three people who handled the deployment in-house. They also handle the maintenance for the solution. 

What other advice do I have?

We do not use customer support, but there is a lot of documentation available.

I would definitely recommend this solution to other people. I would rate it as an eight out of ten. 

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Barista Brewing Espresso at Linkedln
Real User
Top 20
Great horizontal scaling, design, and library simplicity
Pros and Cons
  • "Good horizontal scaling and design."
  • "Lacks elasticity and the ability to scale down."

What is our primary use case?

Our primary use case for this solution is data integration and real-time data consumption. I'm a senior staff engineer for data and infrastructure, and we are customers of Apache.

What is most valuable?

I love the simplicity of the library and the design, as well as architectural concepts such as horizontal scaling.

What needs improvement?

Compared to its commercial competitors, Kafka doesn't have the ability to scale down; the product lacks elasticity. The other issue for us is the delayed queue, which was available to us in the commercial software but not in Kafka. We use it in most of our applications for deferred processing, and I know it's available in other solutions. I'd like to see more tooling and language support in the open-source version.

For how long have I used the solution?

I've been using this solution for four years.

What do I think about the stability of the solution?

The stability is good. 

What do I think about the scalability of the solution?

The solution scales horizontally and scales better than its competitors. We have around 400 to 500 microservices consuming this cluster, and the company has around 600 employees. We have four verticals, each with around 100 engineers and 100 to 150 microservices. Of those microservices, 90% have a touchpoint with Kafka.

How are customer service and support?

I think the community is very good and will respond if you raise a ticket. We also use external third-party libraries hosted on GitHub. It would be good to have some direct support from Apache.

Which solution did I use previously and why did I switch?

Four years ago, we were using RabbitMQ, but we switched to Kafka because RabbitMQ was designed for a very narrow use case. It became difficult for us to run and maintain that server and our client libraries. We had a huge outage, so we shifted to Kafka because of the simplicity of its architecture.

How was the initial setup?

The initial setup was simple although we had a couple of hiccups. It took around a week but that was several years ago and we haven't had any problems since. Our team carried out the deployment and we currently have a few engineers who deal with maintenance. 

What's my experience with pricing, setup cost, and licensing?

We are currently using the open-source version. 

What other advice do I have?

There is room for improvement with this solution so I rate it eight out of 10. 

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
CTO at Estrada & Consultores
Real User
Great scalability with a high throughput and a helpful online community
Pros and Cons
  • "The solution is very easy to set up."
  • "While the solution scales well and easily, you need to understand your future needs and prep for the peaks."

What is our primary use case?

We primarily use the solution to send messages with different payloads upstream for our applications, which range from IoT to food delivery and patient monitoring.

For example, one of our solutions provides real-time location finding, whereby a food delivery customer wants to know where his or her order is on a map. The delivery person's mobile phone starts publishing its location to Kafka, Kafka processes it and then publishes it to subscribers, in this case the customer. This allows customers to see the information in real time, almost instantly.

How has it helped my organization?

Apache Kafka has become the main component of almost all our distributed solutions. It has helped us deliver messages quickly to our customers' applications.

What is most valuable?

The solution is good for publishing transactions in commercial solutions where a duplicate will not affect any part of the system.

The solution is very easy to set up.

The stability is very good.

There's an online community available that can help answer questions or troubleshoot problems. 

The scalability of Kafka is very good.

It provides high throughput.

What needs improvement?

Kafka can deliver duplicate messages, which isn't helpful in some of our scenarios. They need to work on their duplicate management capabilities, but for now, developers should ensure that operations are idempotent in such scenarios.
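
An idempotent consumer, as recommended above, typically skips any message whose ID it has already applied. The sketch below assumes each message carries a unique, producer-assigned `message_id`; that field, the account/amount payload, and the `IdempotentConsumer` class are hypothetical. Kafka's producer has an `enable.idempotence` setting that prevents duplicates caused by producer retries, but a consumer that crashes before committing its offset can still reprocess messages, which is where this consumer-side check helps.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Applies each message at most once, keyed by a hypothetical message ID."""

    balances: dict[str, int] = field(default_factory=dict)
    processed_ids: set[str] = field(default_factory=set)

    def handle(self, message_id: str, account: str, amount: int) -> bool:
        """Apply the update unless this message ID was seen before; return True if applied."""
        if message_id in self.processed_ids:
            return False  # duplicate delivery: skip the side effect
        self.balances[account] = self.balances.get(account, 0) + amount
        # In production, record the ID in the same transaction as the state change.
        self.processed_ids.add(message_id)
        return True
```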

While the solution scales well and easily, you need to understand your future needs and prep for the peaks. 

For how long have I used the solution?

I've been using the solution for four years so far.

What do I think about the stability of the solution?

The stability is excellent. There are no bugs or glitches. It doesn't crash or freeze. It's reliable. 

What do I think about the scalability of the solution?

Scaling is not really a problem with Kafka. We have used Kubernetes clusters, and it works very well. Based on our configuration, it scales up and down almost automatically, in a way that is almost unnoticeable to consumers. Kafka is just one pod inside our cluster that scales horizontally.

We have a couple of customers that also use vertical scaling, meaning more CPU and memory are made available to the Kafka pod.

How are customer service and technical support?

For Kafka, we don't actually require support from the company. We usually have people experienced in-house and sometimes we just ask in the community. 

How was the initial setup?

The initial setup is easy. The majority of the tools today are really very easy to configure and setup. Docker Containers and Kubernetes, actually, have made life easier for architects as well as developers.

Nowadays, you just install the container, and you don't really have to manage the internals at the library or OS level. You just run the container; everything is containerized.

What's my experience with pricing, setup cost, and licensing?

Apache Kafka is open source. You can set it up in your own Kubernetes cluster or subscribe to an online Kafka provider as a service.

What other advice do I have?

New users should understand the product's capabilities. People often start working with new products without knowing their capabilities and disadvantages in specific scenarios. In our case, for example, we haven't used Kafka for financial transaction processing, for which we still use IBM MQ, but it really depends on your knowledge of and experience with the product. My advice is to understand the product very well, including its pros and cons, and work from there.

Overall, I'd rate the solution a nine out of ten.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Developer at a financial services firm with 10,001+ employees
Real User
Top 20
User-friendly solution but problems with latency
Pros and Cons
  • "Kafka's most valuable feature is its user-friendliness."
  • "There are some latency problems with Kafka."

What is our primary use case?

I primarily use Kafka in the investment banking sector to update prices and inform clients of updates.

What is most valuable?

Kafka's most valuable feature is its user-friendliness.

What needs improvement?

There are some latency problems with Kafka.

For how long have I used the solution?

I've been using Kafka for more than three years.

What other advice do I have?

I would give Kafka a rating of seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Enterprise Architect at Smals vzw
Real User
Top 20
Effective event sequencing, seamless system interactions, and beneficial data management
Pros and Cons
  • "There are numerous possibilities that can be explored. While it may be challenging to fully comprehend the potential advantages, one key aspect is the ability to establish a proper sequence of events rather than simply dealing with a jumbled group of occurrences. These events possess their own timestamps, even if they were not initially provided with one, and are arranged in a chronological order that allows for a clear understanding of the progression of the events."
  • "There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, which are all under enterprise licenses and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as enterprise solutions."

What is our primary use case?

We use Apache Kafka for more than a messaging bus; it also serves as a database to store information. It functions as a streamer, similar to ETL, to manipulate and transform events before migrating them to other systems for use. It can also act as a cache. Apache Kafka serves as a database, broker, streamer, and source of truth for multiple systems because it retains events for at least 10 days. It provides both synchronous and asynchronous communication, making it a complex system that is easier to understand through diagrams or sketches.

We use reactive frameworks.

How has it helped my organization?

From my experience with Apache Kafka, one of the most notable advantages is its ability to maintain a comprehensive record of historical data that includes every update, alteration, and version of information, unlike a conventional relational database. This feature allows for seamless tracking and analysis of the progression and transformation of the data over time, enabling users to easily review and analyze the history of the information.
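
The contrast drawn above, between a log that keeps every version and a relational table that is updated in place, can be sketched as follows. The list index stands in for a Kafka offset, and the customer/address records are hypothetical. In a real cluster, how much history survives depends on the topic's retention settings, and compacted topics eventually keep only the latest value per key.

```python
from collections import defaultdict


def versions_by_key(log: list[tuple[str, str]]) -> dict[str, list[tuple[int, str]]]:
    """Group every (offset, value) version of each key, oldest first."""
    versions: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for offset, (key, value) in enumerate(log):
        versions[key].append((offset, value))
    return dict(versions)


def latest_only(log: list[tuple[str, str]]) -> dict[str, str]:
    """What an update-in-place table would retain: only the last value per key."""
    table: dict[str, str] = {}
    for key, value in log:
        table[key] = value  # each update overwrites the previous version
    return table
```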

The solution has the capability for various systems to effortlessly interact with one another without prior knowledge of their existence, current operational status, or specific configurations. By utilizing service buses and dynamic integration, data can be distributed across networks and retrieved in a way that is most suitable for each system's requirements. In addition, Apache Kafka allows for the modification of data to provide diverse clients, consumers, or observers with unique and varying data. The replication of data can produce multiple versions, and this data can be adjusted to fit various needs. With the use of probes, one can alter the behavior of the transformation process, thereby changing the way in which data is transformed and the output produced. Overall, working with Apache Kafka has brought about an array of benefits, enabling seamless system interactions and allowing for the customization and modification of data to meet individual requirements.

What is most valuable?

There are numerous possibilities that can be explored. While it may be challenging to fully comprehend the potential advantages, one key aspect is the ability to establish a proper sequence of events rather than simply dealing with a jumbled group of occurrences. These events possess their own timestamps, even if they were not initially provided with one, and are arranged in a chronological order that allows for a clear understanding of the progression of the events.
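
Kafka guarantees ordering only within a partition, and records with the same key go to the same partition under the default partitioner (as long as the partition count does not change). The sketch below illustrates that property in plain Python. It uses CRC32 as a stand-in hash, whereas Kafka's default partitioner uses murmur2, so the actual partition numbers differ; the order IDs and event names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (CRC32 stand-in, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events: list[tuple[str, str]], num_partitions: int) -> dict[int, list[tuple[int, str, str]]]:
    """Append each (key, payload) event to its partition with a per-partition offset."""
    partitions: dict[int, list[tuple[int, str, str]]] = defaultdict(list)
    for key, payload in events:
        p = partition_for(key, num_partitions)
        offset = len(partitions[p])  # next offset in that partition
        partitions[p].append((offset, key, payload))
    return dict(partitions)


def events_for_key(partitions, key: str, num_partitions: int) -> list[str]:
    """Read one key's payloads back in offset order from its partition."""
    records = partitions.get(partition_for(key, num_partitions), [])
    return [payload for _, k, payload in records if k == key]
```

Because every event for `order-1` lands in one partition, reading that partition by offset returns the events in the order they were produced, even though events for other keys are interleaved.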

What needs improvement?

There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, which are all under enterprise licenses and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as enterprise solutions.

One additional area that I think could benefit from improvement is the deployment process on OpenShift. This particular deployment is quite challenging and requires the activation of certain security measures as well as integration with other systems. It's not a straightforward process and typically requires engineers who are highly skilled and have extensive experience with Apache Kafka to carry out these tasks. Therefore, I believe that there is a need for progress in this area, and some tools that can provide information, assistance, and help make the whole process easier would be greatly appreciated.

For how long have I used the solution?

I have been using Apache Kafka for approximately four years.

What do I think about the stability of the solution?

The solution is stable if you have set it up correctly.

What do I think about the scalability of the solution?

Apache Kafka is a scalable solution.

How are customer service and support?

I have not escalated any questions to technical support because Apache Kafka is an open-source system. However, Confluent and other companies sell support and enterprise solutions that make it more convenient and streamline the work. They offer tools such as a monitoring tool with a visual interface, which provides a lot of information and buttons to press to correct or change things without touching the code. Each of those buttons could potentially help in a given situation, but since it is unclear exactly what they do, it is best to call the data center and ask. If you buy their service, you have access to all the enterprise comforts.

How was the initial setup?

Setting up Apache Kafka is not an easy task, especially when trying to containerize it and make it controllable. This is because Apache Kafka has its own distributed mechanisms for staying alive, checking readiness, replicating, and scaling. Making it work with the Kubernetes or OpenShift orchestrator requires careful attention, as there is a risk of two masters attempting to perform the same task and ultimately undoing each other's work.

Compared with Kubernetes, OpenShift is a more advanced infrastructure that automatically manages and orchestrates all the steps required to set up an application. It operates at a higher level of abstraction and eliminates the manual operations that Kubernetes requires. While Kubernetes can run an application with some pipeline and configuration, OpenShift takes care of everything from finding the required images to creating ports and connecting databases. Although manual changes can be made, they are not necessary, as OpenShift offers a much more coarse-grained management approach.

What about the implementation team?

One skillful DevOps engineer can implement the solution.

What's my experience with pricing, setup cost, and licensing?

Apache Kafka is an open-source solution.

What other advice do I have?

Maintaining Apache Kafka is crucial because of the system's complexity: numerous microservices and systems communicate through Apache Kafka, which requires proper integration and configuration to prevent overloading and ensure a healthy cluster. The task is not easy and requires knowledge of the various adjustable parameters, as misadjusting even one of them can greatly slow down the cluster. For example, if a consumer group's membership changes frequently, its partitions must be repeatedly reassigned among the consumers, causing significant delays. Therefore, configuring Apache Kafka correctly is essential to avoid high latency.
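
Why frequent consumer-group changes are costly can be seen by counting how many partitions change owner when a single consumer joins. The sketch below uses a simplified range-style assignment for one topic; it is an illustration, not Kafka's actual assignor code, and the consumer names are hypothetical. Kafka also ships sticky and cooperative assignors (for example `CooperativeStickyAssignor`) designed to reduce this movement.

```python
def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Simplified range-style assignment: contiguous blocks, earlier members get extras."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    members = sorted(consumers)
    parts = sorted(partitions)
    per_consumer, extra = divmod(len(parts), len(members))
    result: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        result[member] = parts[start:start + count]
        start += count
    return result


def moved_partitions(before: dict[str, list[int]], after: dict[str, list[int]]) -> list[int]:
    """Partitions whose owning consumer changed between two assignments."""
    owner_before = {p: c for c, ps in before.items() for p in ps}
    owner_after = {p: c for c, ps in after.items() for p in ps}
    return sorted(p for p, c in owner_after.items() if owner_before.get(p) != c)
```

With six partitions, adding a third consumer to a two-consumer group moves partitions 2, 4, and 5 to new owners; each move means the new owner must resume from the last committed offset.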

I would strongly suggest others give Apache Kafka a chance and explore the advantages it can offer, especially since it should not be perceived as merely a message bus or broker but rather as an enterprise bus designed for data manipulation. It can transform data, store and reject it, and even maintain different versions of the same data simultaneously. Moreover, it operates on a pull mechanism rather than a push mechanism, which takes away the risk of losing data and places the responsibility for data loss on the consumer. It also ensures that the data is always available within the specified retention window and makes it easy to replay the past, which is extremely helpful in situations such as a hacked bank database. With Apache Kafka, you can efficiently go back in time, obtain the required state and events, and make changes accordingly, without going through each transaction separately. Thus, this solution can make data management much more efficient and convenient.
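
The pull model and the ability to go back in time described above can be sketched with a consumer that tracks its own read position and can seek backward to replay history. `PullConsumer` and `state_as_of` are hypothetical in-memory helpers that loosely mirror the poll/seek shape of Kafka's consumer API; list indexes stand in for offsets, and the account records are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PullConsumer:
    """Hypothetical in-memory consumer; list indexes stand in for Kafka offsets."""

    log: list[tuple[str, int]]
    position: int = 0

    def poll(self, max_records: int = 10) -> list[tuple[str, int]]:
        """Pull up to max_records starting at the current position, then advance."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move the read position, e.g. back to an earlier point in the log."""
        if not 0 <= offset <= len(self.log):
            raise ValueError(f"offset {offset} outside 0..{len(self.log)}")
        self.position = offset


def state_as_of(consumer: PullConsumer, end_offset: int) -> dict[str, int]:
    """Rebuild the latest value per key from offset 0 up to (not including) end_offset."""
    if not 0 <= end_offset <= len(consumer.log):
        raise ValueError(f"end_offset {end_offset} outside 0..{len(consumer.log)}")
    consumer.seek(0)
    state: dict[str, int] = {}
    while consumer.position < end_offset:
        for key, value in consumer.poll(min(10, end_offset - consumer.position)):
            state[key] = value
    return state
```

If the fourth record in a log were a fraudulent update, `state_as_of(consumer, 3)` would return the values as they stood just before it, without touching each later transaction individually.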

I rate Apache Kafka an eight out of ten.

In order to improve its user-friendliness, engineer-friendliness, and DevOps-friendliness, the system must undertake various tasks, such as enhancing the overall operation and configuration, ensuring seamless integration with other systems, and adapting to security layers in a more comprehensive and generic manner. This will require significant efforts to make the system more functional, secure, and efficient.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Solution Architect at a manufacturing company with 10,001+ employees
Real User
Good performance when a high throughput is required, but they need to implement a portal
Pros and Cons
  • "The processing power of Apache Kafka is good when you have requirements for high throughput and a large number of consumers."
  • "They need to have a proper portal to do everything because, at this moment, Kafka is lagging in this regard."

What is our primary use case?

I am a solution architect and I used Apache Kafka in this role.

What is most valuable?

The processing power of Apache Kafka is good when you have requirements for high throughput and a large number of consumers. 

What needs improvement?

They need to have a proper portal to do everything because, at this moment, Kafka is lagging in this regard. It could be used to do the preprocessing or the configurations, instead of directly doing it on the queues or the topics. If you look at Solace, for example, they have come up with a portal where you don't need to touch these activities. You don't need to access the platform beyond the portal.

For how long have I used the solution?

I have used Apache Kafka for between one and one and a half years.

What do I think about the stability of the solution?

Apache Kafka is stable.

What do I think about the scalability of the solution?

This is certainly a scalable product. There are currently 30 or more people using it but we expect to scale beyond this. It is going to be an enterprise tool within the company.

How are customer service and technical support?

I am not directly interacting with the service people at this moment. Interaction is limited for now because we are still exploring and refining our architecture and design, and deciding how to align it with our existing strategy. There has not been much progress in this regard, and it will take more time.

Which solution did I use previously and why did I switch?

Prior to working with Apache Kafka, there was no messaging queue system. For many projects, they were using the Azure Event Hub, but it was not serving the purpose. So, we started moving towards Kafka, and that's why we have procured Confluent Kafka.

Several months ago, I stopped working on Apache Kafka. I am now working on Confluent Kafka. It was not my decision to switch solutions.

My current organization has chosen Confluent Kafka for various reasons. One is that we have a large number of streaming requirements, and Confluent Kafka has one more layer on top of Apache Kafka for this transformation and for connecting with multiple other systems.

There are out-of-the-box features along with the KSQL features. For example, things like fetching the events are kind of query-based. So, that seems to be a good feature for our requirements. That is why we ultimately procured Confluent Kafka.

For some time, I have also worked with Solace, and it has an advantage. Given that my core strength is integration, I work with integration platforms such as MuleSoft, Azure Functions, and TIBCO. Based on our requirements, I found that the event-driven APA implementation with Solace was easier.

Solace also has a top-notch solution for portal management, where you register your producers, consumers, and preprocessing logic. All of these things are easy to do. This is an area where Kafka could use some enhancement.

How was the initial setup?

I don't think that the initial setup was a complex process.

Which other solutions did I evaluate?

MQ messaging systems are not my core strength but for any integration platform where we have a large number of APIs and events, to integrate with an IoT platform, for example, I found Kafka is better than ActiveMQ.

I'm not getting into MQTT or other things, but when you compare ActiveMQ and Kafka, Kafka has done better.

What other advice do I have?

I think that many people are using Apache Kafka just as a publish-subscribe model, but I feel that Kafka is capable of more than that, and Confluent Kafka even more so.

Confluent Kafka is offering features that are equal to those of a data lake. You can do lots with data, and huge data can be persisted. However, many people are not using that feature. Rather than make use of persistence logic, they are pushing the messages and consuming them. Maybe if people were using it for persistence, they would see the impact or real power of Kafka.

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Roger Sabourin
Senior Manager, Analyst Relations at a tech vendor with 201-500 employees
Real User

You're in luck, Solace's PubSub+ Event Portal for Kafka does all the things you're looking for, specifically for your Kafka environments, be they open source Kafka, Confluent or Amazon MSK.  Check it out, or request a free trial at https://solace.com/products/po...

Technical Director at Metrofibre Networx
Real User
Top 20
A reliable and stable stream-processing platform with a good customer support team
Pros and Cons
  • "As a software developer, I have found Apache Kafka's support to be the most valuable...The solution is easy to integrate with any of our systems."
  • "The solution should be easier to manage. It needs to improve its visualization feature in the next release."

What is our primary use case?

We have a camera monitoring security system in which we post messages onto the queue, and processing each message involves various steps, like checking for the number of clients, running it against police data, etc. So Apache Kafka serves a security application with many types of consumers. We set up a workflow system with different sites, which works well.

What is most valuable?

As a software developer, I have found Apache Kafka's support to be the most valuable. The support team sends available information regarding the library and how to use the plugins. The solution is easy to integrate with any of our systems. There are other alternatives, but this one seems to be the most popular and best supported.

What needs improvement?

The solution should be easier to manage. It needs to improve its visualization feature in the next release.

For how long have I used the solution?

I have been using it for three years.

What do I think about the stability of the solution?

It is a stable solution. We never faced any issues. I rate it a ten out of ten.

What do I think about the scalability of the solution?

It is a scalable solution. We set up a category with different consumers balancing the load, and it works as I expected.

How are customer service and support?

I did not contact the technical support as it was not required.

Which solution did I use previously and why did I switch?

We used Linksys for visualization along with Confluence, but they did not provide enough value. For us, Apache Kafka is the best solution because of its support and the third-party systems around which we build our subsystems, since we have a lot of development teams.

How was the initial setup?

The initial setup was straightforward because I have a lot of experience in this field, but it would be fine even for a junior person. There are many resources, and it is very well documented, as they are a premium service provider, which makes the setup easier.

The deployment takes a few days.

We set up a free cluster for this service because we use a lot of data. We use ZooKeeper to secure different products for instruction with the cluster. It was easy because it is a popular product and a lot of information is available. It can download data, around fifty gigabytes per day, and we can handle it all effectively. I never ran into any issues.

What's my experience with pricing, setup cost, and licensing?

It's a premium product, so it is not cost-effective for us.

What other advice do I have?

Apache Kafka is an out-of-the-box, reliable solution. In the fiber business, we need a reliable solution, and this solution is one hundred percent reliable. If it is set up correctly, it hardly has any issues, and because of its extensive user base, any issues that do arise are sorted out by the community. I rate it nine out of ten.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user