We performed a comparison between Cloudera DataFlow and Databricks based on real PeerSpot user reviews.
Find out in this report how the two Streaming Analytics solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"DataFlow's performance is okay."
"The initial setup was not so difficult"
"This solution is very scalable and robust."
"Databricks gives us the ability to build a lakehouse framework and do everything implicit to this type of database structure. We also like the ability to stream events. Databricks covers a broad spectrum, from reporting and machine learning to streaming events. It's important for us to have all these features in one platform."
"When we have a huge volume of data that we want to process with speed, velocity, and volume, we go through Databricks."
"This solution offers a lake house data concept that we have found exciting. We are able to have a large amount of data in a data lake and can manage all relational activities."
"The most valuable feature is the Spark cluster, which is very fast for heavy loads, big data processing, and PySpark."
"Databricks is hosted on the cloud. It is very easy to collaborate with other team members who are working on it. It is production-ready code, and scheduling the jobs is easy."
"It is fast, it's scalable, and it does the job it needs to do."
"The simplicity of development is the most valuable feature."
"The built-in optimization recommendations halved the speed of queries and allowed us to reach decision points and deliver insights very quickly."
"Although their workflow is pretty neat, it still requires a lot of transformation coding; especially when it comes to Python and other demanding programming languages."
"It is not easy to use the R language. Though I don't know if it's possible, I believe it is possible, but it is not the best language for machine learning."
"It's an outdated legacy product that doesn't meet the needs of modern data analysts and scientists."
"Databricks can improve by making the documentation better."
"In the future, I would like to see Data Lake support. That is something that I'm looking forward to."
"It would be nice to have more guidance on integrations with ETLs and other data quality tools."
"The product cannot be integrated with a popular coding IDE."
"I would like more integration with SQL for using data in different workspaces."
"The solution could be improved by integrating it with data packets. Right now, the load tables provide a function, like team collaboration. Still, it's unclear as to if there's a function to create different branches and/or more branches. Our team had used data packets before, however, I feel it's difficult to integrate the current with the previous data packets."
"In the next release, I would like to see more optimization features."
"There are no direct connectors — they are very limited."
Cloudera DataFlow is ranked 13th in Streaming Analytics with 3 reviews while Databricks is ranked 2nd in Streaming Analytics with 78 reviews. Cloudera DataFlow is rated 6.6, while Databricks is rated 8.2. The top reviewer of Cloudera DataFlow writes "A scalable and robust platform for analyzing data". On the other hand, the top reviewer of Databricks writes "A nice interface with good features for turning off clusters to save on computing". Cloudera DataFlow is most compared with Confluent, Amazon MSK, Informatica Data Engineering Streaming, Hortonworks Data Platform and Spring Cloud Data Flow, whereas Databricks is most compared with Amazon SageMaker, Informatica PowerCenter, Dataiku Data Science Studio, Microsoft Azure Machine Learning Studio and Dremio.
We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.