We performed a comparison between Apache Spark and Google Cloud Dataflow based on real PeerSpot user reviews.
"With Hadoop-related technologies, we can distribute the workload with multiple commodity hardware."
"The most valuable feature of Apache Spark is its flexibility."
"The most valuable feature of Apache Spark is its ease of use."
"The deployment of the product is easy."
"The most valuable feature of Apache Spark is its memory processing because it processes data over RAM rather than disk, which is much more efficient and fast."
"The product is useful for analytics."
"The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."
"The processing time is very much improved over the data warehouse solution that we were using."
"The solution allows us to program in any language we desire."
"It is a scalable solution."
"The most valuable feature of Google Cloud Dataflow is the integration, which is very simple if you have the complete stack, as we do. It is overall very easy to use, user-friendly, and cost-effective if you know how to use it. The solution is very flexible for programmers: if you know how to write scripts or program in Python or any other language, it's extremely easy to use."
"I don't need a server running all the time while using the tool. It is also easy to set up. The product offers a pay-as-you-go service."
"Google Cloud Dataflow is useful for streaming and data pipelines."
"The service is relatively cheap compared to other batch-processing engines."
"The best feature of Google Cloud Dataflow is its practical connectedness."
"The most valuable features of Google Cloud Dataflow are scalability and connectivity."
"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
"Apache Spark provides very good performance. The tuning phase is still tricky."
"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"I would like to see integration with data science platforms to optimize the processing capability for these tasks."
"Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing."
"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."
"The initial setup was not easy."
"The technical support has slight room for improvement."
"They should do a market survey and then make improvements."
"The solution's setup process could be more accessible."
"The authentication part of the product is an area of concern where improvements are required."
"Google Cloud Dataflow can improve by having full, simple integration with Kafka topics. It's not that complicated, but it could improve a bit. The UI is easy to use, but the experience could be better. There are other tools available that do a better job."
"Google Cloud Dataflow should include a little cost optimization."
"The deployment time could also be reduced."
"I would like Google Cloud Dataflow to be integrated with IT data flow and other related services to make it easier to use as it is a complex tool."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Google Cloud Dataflow is ranked 7th in Streaming Analytics with 10 reviews. Apache Spark is rated 8.4, while Google Cloud Dataflow is rated 7.8. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Google Cloud Dataflow writes "Easy to use for programmers, user-friendly, and scalable". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Cloudera Distribution for Hadoop, whereas Google Cloud Dataflow is most compared with Databricks, Apache NiFi, Amazon MSK, Amazon Kinesis and Talend Data Streams.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.