We performed a comparison between Cloudera DataFlow and Databricks based on real PeerSpot user reviews.
Find out in this report how the two Streaming Analytics solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"This solution is very scalable and robust."
"DataFlow's performance is okay."
"The initial setup was not so difficult."
"Databricks helps crunch petabytes of data in a very short period of time."
"We have the ability to scale, collaborate and do machine learning."
"The solution's features are fantastic and include interactive clusters that perform at top speed when compared to other solutions."
"Databricks gives us the ability to build a lakehouse framework and do everything implicit to this type of database structure. We also like the ability to stream events. Databricks covers a broad spectrum, from reporting and machine learning to streaming events. It's important for us to have all these features in one platform."
"The setup is quite easy."
"Databricks makes it really easy to use a number of technologies to do data analysis. In terms of languages, we can use Scala, Python, and SQL. Databricks enables you to run very large queries, at a massive scale, within really good timeframes."
"Databricks' most valuable features are the workspace and notebooks. Its integration, interface, and documentation are also good."
"The capacity of use of the different types of coding is valuable. Databricks also has good performance because it is running in spark extra storage, meaning the performance and the capacity use different kinds of codes."
"Although their workflow is pretty neat, it still requires a lot of transformation coding; especially when it comes to Python and other demanding programming languages."
"It is not easy to use the R language. Though I don't know if it's possible, I believe it is possible, but it is not the best language for machine learning."
"It's an outdated legacy product that doesn't meet the needs of modern data analysts and scientists."
"I would like to see the integration between Databricks and MLflow improved. It is quite hard to train multiple models in parallel in the distributed fashions. You hit rate limits on the clients very fast."
"The solution has some scalability and integration limitations when consolidating legacy systems."
"In the next release, I would like to see more optimization features."
"Databricks is an analytics platform. It should offer more data science. It should have more features for data scientists to work with."
"I would like more integration with SQL for using data in different workspaces."
"Pricing is one of the things that could be improved."
"The product should incorporate more learning aspects. It needs to have a free trial version that the team can practice."
"The solution could be improved by adding a feature that would make it more user-friendly for our team. The feature is simple, but it would be useful. Currently, our team is more familiar with the language R, but Databricks requires the use of Jupyter Notebooks which primarily supports Python. We have tried using RStudio, but it is not a fully integrated solution. To fully utilize Databricks, we have to use the Jupyter interface. One feature that would make it easier for our team to adopt the Jupyter interface would be the ability to select a specific variable or line of code and execute it within a cell. This feature is available in other Jupyter Notebooks outside of Databricks and in our own IDE, but it is not currently available within Databricks. If this feature were added, it would make the transition to using Databricks much smoother for our team."
Cloudera DataFlow is ranked 13th in Streaming Analytics with 3 reviews, while Databricks is ranked 1st in Streaming Analytics with 78 reviews. Cloudera DataFlow is rated 6.6, while Databricks is rated 8.2. The top reviewer of Cloudera DataFlow writes "A scalable and robust platform for analyzing data". On the other hand, the top reviewer of Databricks writes "A nice interface with good features for turning off clusters to save on computing".
Cloudera DataFlow is most compared with Confluent, Amazon MSK, Spring Cloud Data Flow, Informatica Data Engineering Streaming, and Hortonworks Data Platform, whereas Databricks is most compared with Amazon SageMaker, Informatica PowerCenter, Dataiku Data Science Studio, Microsoft Azure Machine Learning Studio, and Dremio. See our Cloudera DataFlow vs. Databricks report.
We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.