We performed a comparison between Apache Flink and Databricks based on real PeerSpot user reviews.
Find out in this report how the two Streaming Analytics solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"Apache Flink is meant for low latency applications. You take one event opposite if you want to maintain a certain state. When another event comes and you want to associate those events together, in-memory state management was a key feature for us."
"Apache Flink's best feature is its data streaming tool."
"The event processing function is the most useful or the most used function. The filter function and the mapping function are also very useful because we have a lot of data to transform. For example, we store a lot of information about a person, and when we want to retrieve this person's details, we need all the details. In the map function, we can actually map all persons based on their age group. That's why the mapping function is very useful. We can really get a lot of events, and then we keep on doing what we need to do."
"Another feature is how Flink handles its radiuses. It has something called the checkpointing concept. You're dealing with billions and billions of requests, so your system is going to fail in large storage systems. Flink handles this by using the concept of checkpointing and savepointing, where they write the aggregated state into some separate storage. So in case of failure, you can basically recall from that state and come back."
"The setup was not too difficult."
"The product helps us to create both simple and complex data processing tasks. Over time, it has facilitated integration and navigation across multiple data sources tailored to each client's needs. We use Apache Flink to control our clients' installations."
"Allows us to process batch data, stream to real-time and build pipelines."
"This is truly a real-time solution."
"Databricks covers end-to-end data analytics workflow in one platform, this is the best feature of the solution."
"It is fast, it's scalable, and it does the job it needs to do."
"The solution's features are fantastic and include interactive clusters that perform at top speed when compared to other solutions."
"The Delta Lake data type has been the most useful part of this solution. Delta Lake is an opensource data type and it was implemented and invented by Databricks."
"The load distribution capabilities are good, and you can perform data processing tasks very quickly."
"The setup was straightforward."
"The capacity of use of the different types of coding is valuable. Databricks also has good performance because it is running in spark extra storage, meaning the performance and the capacity use different kinds of codes."
"The main features of the solution are efficiency."
"In a future release, they could improve on making the error descriptions more clear."
"We have a machine learning team that works with Python, but Apache Flink does not have full support for the language."
"One way to improve Flink would be to enhance integration between different ecosystems. For example, there could be more integration with other big data vendors and platforms similar in scope to how Apache Flink works with Cloudera. Apache Flink is a part of the same ecosystem as Cloudera, and for batch processing it's actually very useful but for real-time processing there could be more development with regards to the big data capabilities amongst the various ecosystems out there."
"In terms of improvement, there should be better reporting. You can integrate with reporting solutions but Flink doesn't offer it themselves."
"Apache Flink's documentation should be available in more languages."
"There is room for improvement in the initial setup process."
"There is a learning curve. It takes time to learn."
"The state maintains checkpoints and they use RocksDB or S3. They are good but sometimes the performance is affected when you use RocksDB for checkpointing."
"The solution could be improved by adding a feature that would make it more user-friendly for our team. The feature is simple, but it would be useful. Currently, our team is more familiar with the language R, but Databricks requires the use of Jupyter Notebooks which primarily supports Python. We have tried using RStudio, but it is not a fully integrated solution. To fully utilize Databricks, we have to use the Jupyter interface. One feature that would make it easier for our team to adopt the Jupyter interface would be the ability to select a specific variable or line of code and execute it within a cell. This feature is available in other Jupyter Notebooks outside of Databricks and in our own IDE, but it is not currently available within Databricks. If this feature were added, it would make the transition to using Databricks much smoother for our team."
"There could be more support for automated machine learning in the database. I would like to see more ways to do analysis so that the reporting is more understandable."
"I have had some issues with some of the Spark clusters running on Databricks, where the Spark runtime and clusters go up and down, which is an area for improvement."
"It would be very helpful if Databricks could integrate with platforms in addition to Azure."
"Pricing is one of the things that could be improved."
"I would like to see the integration between Databricks and MLflow improved. It is quite hard to train multiple models in parallel in the distributed fashions. You hit rate limits on the clients very fast."
"I'm not the guy that I'm working with Databricks on a daily basis. I'm on the management team. However, my team tells me there are limitations with streaming events. The connectors work with a small set of platforms. For example, we can work with Kafka, but if we want to move to an event-driven solution from AWS, we cannot do it. We cannot connect to all the streaming analytics platforms, so we are limited in choosing the best one."
"The integration features could be more interesting, more involved."
Apache Flink is ranked 5th in Streaming Analytics with 15 reviews while Databricks is ranked 1st in Streaming Analytics with 77 reviews. Apache Flink is rated 7.6, while Databricks is rated 8.2. The top reviewer of Apache Flink writes "A great solution with an intricate system and allows for batch data processing". On the other hand, the top reviewer of Databricks writes "A nice interface with good features for turning off clusters to save on computing". Apache Flink is most compared with Amazon Kinesis, Spring Cloud Data Flow, Azure Stream Analytics, Apache Pulsar and Google Cloud Dataflow, whereas Databricks is most compared with Amazon SageMaker, Informatica PowerCenter, Microsoft Azure Machine Learning Studio, Dataiku Data Science Studio and Oracle Analytics Cloud. See our Apache Flink vs. Databricks report.
See our list of best Streaming Analytics vendors.
We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.