We compared Databricks and Google Cloud Dataflow across several parameters, based on our users' reviews.
Databricks excels in collaborative features, customer service, and pricing, with a focus on data insights. Google Cloud Dataflow stands out for scalability, real-time processing, ease of use, and ROI, with a focus on data transformation. Areas for improvement in Databricks include data visualization and pricing flexibility, while Google Cloud Dataflow could enhance integration, documentation, and error handling.
Features: Databricks stands out with its seamless integration with various platforms, collaborative capabilities, and advanced analytics. On the other hand, Google Cloud Dataflow offers scalability, easy setup, real-time processing, data transformation, and seamless integration with other Google Cloud services.
Pricing and ROI: The setup cost for Databricks is reported to be straightforward and hassle-free, while Google Cloud Dataflow offers a relatively low setup cost. This makes it easy and affordable for users to get started with the service. Databricks users report increased efficiency, productivity, and data analysis capabilities. Google Cloud Dataflow users mention improved scalability, reduced costs, and flexibility provided by the platform.
Room for Improvement: Databricks has room for improvement in data visualization, monitoring, external integration, documentation, and flexible pricing. Google Cloud Dataflow needs better integration, documentation, error handling, pipeline customization, and improved performance for large-scale data processing.
Deployment and customer support: User feedback indicates that the time required to set up a new tech solution varies for both Databricks and Google Cloud Dataflow. Some users mention spending three months on deployment and an additional week on setup for both products, while others report a week for both stages. Customers have praised the customer service and support offered by both Databricks and Google Cloud Dataflow. However, Databricks is highlighted for its efficient and effective support team, while Google Cloud Dataflow is commended for its extensive self-guidance resources.
The summary above is based on 56 interviews we conducted recently with Databricks and Google Cloud Dataflow users. To access the full review transcripts, download our report.
"Imageflow is a visual tool that helps make it easier for business people to understand complex workflows."
"Databricks integrates well with other solutions."
"Databricks allows me to automate the creation of a cluster, optimized for machine learning and construct AI machine learning models for the client."
"The capacity of use of the different types of coding is valuable. Databricks also has good performance because it is running in spark extra storage, meaning the performance and the capacity use different kinds of codes."
"The processing capacity is tremendous in the database."
"The load distribution capabilities are good, and you can perform data processing tasks very quickly."
"We are completely satisfied with the ease of connecting to different sources of data or pocket files in the search."
"Databricks has improved my organization by allowing us to transform data from sources to a different format and feed that to the analytics, business intelligence, and reporting teams. This tool makes it easy to do those kinds of things."
"The most valuable features of Google Cloud Dataflow are the integration, it's very simple if you have the complete stack, which we are using. It is overall very easy to use, user-friendly, and cost-effective if you know how to use it. The solution is very flexible for programmers, if you know how to do scripts or program in Python or any other language, it's extremely easy to use."
"Google Cloud Dataflow is useful for streaming and data pipelines."
"The service is relatively cheap compared to other batch-processing engines."
"The best feature of Google Cloud Dataflow is its practical connectedness."
"It is a scalable solution."
"The solution allows us to program in any language we desire."
"The product's installation process is easy...The tool's maintenance part is somewhat easy."
"The most valuable features of Google Cloud Dataflow are scalability and connectivity."
"I have had some issues with some of the Spark clusters running on Databricks, where the Spark runtime and clusters go up and down, which is an area for improvement."
"Pricing is one of the things that could be improved."
"There should be better integration with other platforms."
"The integration of data could be a bit better."
"Databricks doesn't offer the use of Python scripts by itself and is not connected to GitHub repositories or anything similar. This is something that is missing. If they could integrate with Git tools it would be an advantage."
"When I used the support, I had communication problems because of the language barrier with the agent. The accent was difficult to understand."
"The ability to customize our own pipelines would enhance the product, similar to what's possible using ML files in Microsoft Azure DevOps."
"The solution could be improved by adding a feature that would make it more user-friendly for our team. The feature is simple, but it would be useful. Currently, our team is more familiar with the language R, but Databricks requires the use of Jupyter Notebooks which primarily supports Python. We have tried using RStudio, but it is not a fully integrated solution. To fully utilize Databricks, we have to use the Jupyter interface. One feature that would make it easier for our team to adopt the Jupyter interface would be the ability to select a specific variable or line of code and execute it within a cell. This feature is available in other Jupyter Notebooks outside of Databricks and in our own IDE, but it is not currently available within Databricks. If this feature were added, it would make the transition to using Databricks much smoother for our team."
"The deployment time could also be reduced."
"Google Cloud Dataflow can improve by having full simple integration with Kafka topics. It's not that complicated, but it could improve a bit. The UI is easy to use but the experience could be better. There are other tools available that do a better job."
"There are certain challenges regarding the Google Cloud Composer which can be improved."
"The solution's setup process could be more accessible."
"The authentication part of the product is an area of concern where improvements are required."
"They should do a market survey and then make improvements."
"Google Cloud Dataflow should include a little cost optimization."
"When I deploy the product locally, a lot of errors pop up which are not always caught. The solution's error logging is bad. It can take a lot of time to debug the errors. It needs to have better logs."
Databricks is ranked 1st in Streaming Analytics with 77 reviews while Google Cloud Dataflow is ranked 7th in Streaming Analytics with 10 reviews. Databricks is rated 8.2, while Google Cloud Dataflow is rated 7.8. The top reviewer of Databricks writes "A nice interface with good features for turning off clusters to save on computing". On the other hand, the top reviewer of Google Cloud Dataflow writes "Easy to use for programmers, user-friendly, and scalable". Databricks is most compared with Amazon SageMaker, Informatica PowerCenter, Microsoft Azure Machine Learning Studio, Dataiku Data Science Studio and Microsoft Power BI, whereas Google Cloud Dataflow is most compared with Apache NiFi, Amazon Kinesis, Amazon MSK, Spring Cloud Data Flow and Apache Flink. See our Databricks vs. Google Cloud Dataflow report.
We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.