Apache Spark vs Google Cloud Dataflow Comparison 2024

Apache Spark

Read 60 Apache Spark reviews

2,498 views|1,884 comparisons

Google Cloud Dataflow

Read 10 Google Cloud Dataflow reviews

4,813 views|3,977 comparisons

Comparison Buyer's Guide


Buyer's Guide

Hadoop

April 2024

Executive Summary

We performed a comparison between Apache Spark and Google Cloud Dataflow based on real PeerSpot user reviews.

Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop.

To learn more, read our detailed Hadoop Report (Updated: April 2024).


768,578 professionals have used our research since 2012.

Featured Review

NitinKumar

Director of Engineering at Sigmoid

Easy to code, fast, open-source, very scalable, and great for big data

Spark has been at the forefront of data processing engines. I have used Apache Spark for multiple projects for different clients. It is an excellent...

Darasimi Ajewole

Software Engineer at Formplus

Helps to run batch-specific jobs, but notifications for error messages could be more detailed

Migrating our batch processing jobs to Google Cloud Dataflow led to a reduction in cost by 70%.

Quotes From Members

We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:

Pros

Apache Spark

"With Hadoop-related technologies, we can distribute the workload with multiple commodity hardware."

"The most valuable feature of Apache Spark is its flexibility."

"The most valuable feature of Apache Spark is its ease of use."

"The deployment of the product is easy."

"The most valuable feature of Apache Spark is its memory processing because it processes data over RAM rather than disk, which is much more efficient and fast."

"The product is useful for analytics."

"The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."

"The processing time is very much improved over the data warehouse solution that we were using."

Google Cloud Dataflow

"The solution allows us to program in any language we desire."

"It is a scalable solution."

"The most valuable features of Google Cloud Dataflow are the integration, it's very simple if you have the complete stack, which we are using. It is overall very easy to use, user-friendly, and cost-effective if you know how to use it. The solution is very flexible for programmers, if you know how to do scripts or program in Python or any other language, it's extremely easy to use."

"I don't need a server running all the time while using the tool. It is also easy to set up. The product offers a pay-as-you-go service."

"Google Cloud Dataflow is useful for streaming and data pipelines."

"The service is relatively cheap compared to other batch-processing engines."

"The best feature of Google Cloud Dataflow is its practical connectedness."

"The most valuable features of Google Cloud Dataflow are scalability and connectivity."

Cons

Apache Spark

"It requires overcoming a significant learning curve due to its robust and feature-rich nature."

"Apache Spark provides very good performance. The tuning phase is still tricky."

"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."

"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."

"I would like to see integration with data science platforms to optimize the processing capability for these tasks."

"Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing."

"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."

"The initial setup was not easy."

Google Cloud Dataflow

"The technical support has slight room for improvement."

"They should do a market survey and then make improvements."

"The solution's setup process could be more accessible."

"The authentication part of the product is an area of concern where improvements are required."

"Google Cloud Dataflow can improve by having full simple integration with Kafka topics. It's not that complicated, but it could improve a bit. The UI is easy to use but the experience could be better. There are other tools available that do a better job."

"Google Cloud Dataflow should include a little cost optimization."

"The deployment time could also be reduced."

"I would like Google Cloud Dataflow to be integrated with IT data flow and other related services to make it easier to use as it is a complex tool."

Pricing and Cost Advice

Apache Spark

"Since we are using the Apache Spark version, not the Databricks version, it is an Apache license version, the support and resolution of the bug are actually late or delayed. The Apache license is free."

"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."

"We are using the free version of the solution."

"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."

"Apache Spark is an expensive solution."

"Spark is an open-source solution, so there are no licensing costs."

"The cloud model can be expensive, as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."

"It is an open-source solution, so it is free of charge."

Google Cloud Dataflow

"The price of the solution depends on many factors, such as how they pay for tools in the company and its size."

"Google Cloud is slightly cheaper than AWS."

"The tool is cheap."

"Google Cloud Dataflow is a cheap solution."

"The solution is cost-effective."

"On a scale from one to ten, where one is cheap, and ten is expensive, I rate Google Cloud Dataflow's pricing a four out of ten."

"On a scale from one to ten, where one is cheap, and ten is expensive, I rate the solution's pricing a seven to eight out of ten."

"The solution is not very expensive."


Questions from the Community

What do you like most about Apache Spark?

Top Answer: We use Spark to process data from different data sources.

What is your experience regarding pricing and costs for Apache Spark?

Top Answer: The solution is moderately priced.

What needs improvement with Apache Spark?

Top Answer: In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, and do the transformation in a subsecond.

What do you like most about Google Cloud Dataflow?

Top Answer: The product's installation process is easy... The tool's maintenance part is somewhat easy.

What is your experience regarding pricing and costs for Google Cloud Data...

Top Answer: The solution is not very expensive.

What needs improvement with Google Cloud Dataflow?

Top Answer: The authentication part of the product is an area of concern where improvements are required. For some common users, the solution's authentication part is difficult to use. The scalability of the…

Ranking

Apache Spark

1st out of 22 in Hadoop
Views: 2,498
Comparisons: 1,884
Reviews: 60
Average Words per Review: 432
Rating: 8.7

Google Cloud Dataflow

7th out of 38 in Streaming Analytics
Views: 4,813
Comparisons: 3,977
Reviews: 10
Average Words per Review: 308
Rating: 7.7

Comparisons

Spring Boot vs. Apache Spark

Compared 31% of the time.

AWS Batch vs. Apache Spark

Compared 10% of the time.

Spark SQL vs. Apache Spark

Compared 10% of the time.

SAP HANA vs. Apache Spark

Compared 8% of the time.

Cloudera Distribution for Hadoop vs. Apache Spark

Compared 6% of the time.


Databricks vs. Google Cloud Dataflow

Compared 28% of the time.

Apache NiFi vs. Google Cloud Dataflow

Compared 15% of the time.

Amazon MSK vs. Google Cloud Dataflow

Compared 11% of the time.

Amazon Kinesis vs. Google Cloud Dataflow

Compared 11% of the time.

Talend Data Streams vs. Google Cloud Dataflow

Compared 1% of the time.


Also Known As

Google Dataflow


Overview

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
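The contrast described above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses no Spark or Hadoop API, the names `mapreduce_pass` and `working_set_passes` and the temporary JSON files are hypothetical, and a local list stands in for a dataset that would really be distributed over a cluster.

```python
import json
import os
import tempfile
from functools import reduce


def mapreduce_pass(in_path, map_fn, reduce_fn, initial, out_path):
    """One MapReduce-style pass: read from disk, map, reduce, store on disk."""
    with open(in_path) as f:
        records = json.load(f)                    # read input data from disk
    mapped = [map_fn(r) for r in records]         # map a function across the data
    result = reduce(reduce_fn, mapped, initial)   # reduce the results of the map
    with open(out_path, "w") as f:
        json.dump(result, f)                      # store the reduction result on disk
    return result


def working_set_passes(records, jobs):
    """Working-set style: load once, keep in memory, reuse across jobs."""
    cached = list(records)  # hypothetical stand-in for an in-memory dataset
    return [reduce(red, map(fn, cached), init) for fn, red, init in jobs]


words = ["spark", "dataflow", "rdd"]
jobs = [
    (len, lambda a, b: a + b, 0),  # total number of characters
    (len, max, 0),                 # length of the longest word
]

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.json")
    with open(src, "w") as f:
        json.dump(words, f)
    # Every job re-reads the input from disk and writes its own output file.
    disk_results = [
        mapreduce_pass(src, fn, red, init, os.path.join(tmp, f"out{i}.json"))
        for i, (fn, red, init) in enumerate(jobs)
    ]

# The same two jobs share one in-memory copy of the data.
memory_results = working_set_passes(words, jobs)
```

Both approaches compute the same answers; the difference the sketch is meant to show is that the first touches disk on every pass, while the second reuses one in-memory working set across jobs.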

Google Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Cloud Dataflow frees you from operational tasks like resource management and performance optimization.
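One way to picture a unified model for batch and continuous computation is a single transform that runs unchanged over a bounded input and an unbounded one. The sketch below is plain Python and does not use any Dataflow SDK; the `pipeline` and `event_stream` functions and the event fields are hypothetical and exist only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def pipeline(events: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical transform: drop invalid events, convert cents to dollars."""
    for e in events:
        if e["cents"] >= 0:
            yield {"user": e["user"], "dollars": e["cents"] / 100}


# Batch computation: a bounded, fully materialized input.
batch = [{"user": "a", "cents": 250}, {"user": "b", "cents": -1}]
batch_out = list(pipeline(batch))


def event_stream() -> Iterator[dict]:
    """Unbounded input: yields events indefinitely."""
    n = 0
    while True:
        yield {"user": f"u{n}", "cents": n * 100}
        n += 1


# Continuous computation: the same pipeline consumes the stream lazily.
# Only the first three results are taken here so the example terminates.
stream_out = list(islice(pipeline(event_stream()), 3))
```

The transform itself never needs to know whether its input ends, which is the property the sketch is meant to illustrate.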

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

Google Cloud Dataflow: Absolutdata, Backflip Studios, Bluecore, Claritics, Crystalloids, Energyworx, GenieConnect, Leanplum, Nomanini, Redbus, Streak, TabTale

Top Industries

REVIEWERS

Computer Software Company: 30%
Financial Services Firm: 15%
University: 9%
Marketing Services Firm: 6%

VISITORS READING REVIEWS

Financial Services Firm: 24%
Computer Software Company: 13%
Manufacturing Company: 7%
Comms Service Provider: 6%

VISITORS READING REVIEWS

Financial Services Firm: 14%
Computer Software Company: 12%
Retailer: 11%
Manufacturing Company: 10%

Company Size

REVIEWERS

Small Business: 40%
Midsize Enterprise: 19%
Large Enterprise: 40%

VISITORS READING REVIEWS

Small Business: 17%
Midsize Enterprise: 12%
Large Enterprise: 71%

REVIEWERS

Small Business: 27%
Midsize Enterprise: 18%
Large Enterprise: 55%

VISITORS READING REVIEWS

Small Business: 17%
Midsize Enterprise: 12%
Large Enterprise: 72%
