Apache Spark vs Google Cloud Dataflow Comparison 2024

Apache Spark

Read 60 Apache Spark reviews

2,430 views|1,869 comparisons

Google Cloud Dataflow

Read 10 Google Cloud Dataflow reviews

4,763 views|3,959 comparisons

Comparison Buyer's Guide

Buyer's Guide

Hadoop

April 2024

Executive Summary

We performed a comparison between Apache Spark and Google Cloud Dataflow based on real PeerSpot user reviews.

770,458 professionals have used our research since 2012.

Featured Review

Atal Upadhyay

AVP at MIDDAY INFOMEDIA LIMITED

Allows us to consume data from any data source and has a remarkable processing power

Our experience with using Spark for machine learning and big data analytics allows us to consume data from any data source, including freely...

Darasimi Ajewole

Software Engineer at Formplus

Helps to run batch-specific jobs, but notifications for error messages could be more detailed

Migrating our batch processing jobs to Google Cloud Dataflow led to a reduction in cost by 70%.

Quotes From Members

We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:

Pros

Apache Spark

"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."

"The scalability has been the most valuable aspect of the solution."

"With Hadoop-related technologies, we can distribute the workload with multiple commodity hardware."

"We use Spark to process data from different data sources."

"The deployment of the product is easy."

"The most valuable feature of Apache Spark is its memory processing because it processes data over RAM rather than disk, which is much more efficient and fast."

"I found the solution stable. We haven't had any problems with it."

"The product's deployment phase is easy."

Google Cloud Dataflow

"The solution allows us to program in any language we desire."

"The most valuable features of Google Cloud Dataflow are the integration, it's very simple if you have the complete stack, which we are using. It is overall very easy to use, user-friendly, and cost-effective if you know how to use it. The solution is very flexible for programmers, if you know how to do scripts or program in Python or any other language, it's extremely easy to use."

"It is a scalable solution."

"The service is relatively cheap compared to other batch-processing engines."

"The best feature of Google Cloud Dataflow is its practical connectedness."

"The product's installation process is easy... The tool's maintenance part is somewhat easy."

"I don't need a server running all the time while using the tool. It is also easy to set up. The product offers a pay-as-you-go service."

"Google Cloud Dataflow is useful for streaming and data pipelines."

Cons

Apache Spark

"Apache Spark should add some resource management improvements to the algorithms."

"Technical expertise from an engineer is required to deploy and run high-tech tools, like Informatica, on Apache Spark, making it an area where improvements are required to make the process easier for users."

"I would like to see integration with data science platforms to optimize the processing capability for these tasks."

"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."

"If you have a Spark session in the background, sometimes it's very hard to kill these sessions because of D allocation."

"Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."

"It should support more programming languages."

"The product could improve the user interface and make it easier for new users."

Google Cloud Dataflow

"The solution's setup process could be more accessible."

"They should do a market survey and then make improvements."

"The technical support has slight room for improvement."

"Google Cloud Dataflow should include a little cost optimization."

"The deployment time could also be reduced."

"Google Cloud Dataflow can improve by having full simple integration with Kafka topics. It's not that complicated, but it could improve a bit. The UI is easy to use but the experience could be better. There are other tools available that do a better job."

"I would like Google Cloud Dataflow to be integrated with IT data flow and other related services to make it easier to use as it is a complex tool."

"When I deploy the product locally, a lot of errors pop up which are not always caught. The solution's error logging is bad. It can take a lot of time to debug the errors. It needs to have better logs."

Pricing and Cost Advice

Apache Spark

"Since we are using the Apache Spark version, not the Databricks version, it is an Apache license version, the support and resolution of the bug are actually late or delayed. The Apache license is free."

"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."

"We are using the free version of the solution."

"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."

"Apache Spark is an expensive solution."

"Spark is an open-source solution, so there are no licensing costs."

"On the cloud model can be expensive as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."

"It is an open-source solution, it is free of charge."

Google Cloud Dataflow

"The price of the solution depends on many factors, such as how they pay for tools in the company and its size."

"Google Cloud is slightly cheaper than AWS."

"The tool is cheap."

"Google Cloud Dataflow is a cheap solution."

"The solution is cost-effective."

"On a scale from one to ten, where one is cheap, and ten is expensive, I rate Google Cloud Dataflow's pricing a four out of ten."

"On a scale from one to ten, where one is cheap, and ten is expensive, I rate the solution's pricing a seven to eight out of ten."

"The solution is not very expensive."


Questions from the Community

What do you like most about Apache Spark?

Top Answer: We use Spark to process data from different data sources.

What is your experience regarding pricing and costs for Apache Spark?

Top Answer: The solution is moderately priced.

What needs improvement with Apache Spark?

Top Answer: In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, and do the transformation in a subsecond.

What do you like most about Google Cloud Dataflow?

Top Answer: The product's installation process is easy... The tool's maintenance part is somewhat easy.

What is your experience regarding pricing and costs for Google Cloud Data...

Top Answer: The solution is not very expensive.

What needs improvement with Google Cloud Dataflow?

Top Answer: The authentication part of the product is an area of concern where improvements are required. For some common users, the solution's authentication part is difficult to use. The scalability of the…

Ranking

Apache Spark

Ranking: 1st out of 22 in Hadoop
Views: 2,430
Comparisons: 1,869
Reviews: 60
Average Words per Review: 444
Rating: 8.7

Google Cloud Dataflow

Ranking: 7th out of 38 in Streaming Analytics
Views: 4,763
Comparisons: 3,959
Reviews: 10
Average Words per Review: 308
Rating: 7.7

Comparisons

Spring Boot vs. Apache Spark

Compared 31% of the time.

AWS Batch vs. Apache Spark

Compared 10% of the time.

Spark SQL vs. Apache Spark

Compared 9% of the time.

SAP HANA vs. Apache Spark

Compared 8% of the time.

Cloudera Distribution for Hadoop vs. Apache Spark

Compared 6% of the time.

Databricks vs. Google Cloud Dataflow

Compared 30% of the time.

Apache NiFi vs. Google Cloud Dataflow

Compared 14% of the time.

Amazon MSK vs. Google Cloud Dataflow

Compared 12% of the time.

Amazon Kinesis vs. Google Cloud Dataflow

Compared 10% of the time.

IBM Streams vs. Google Cloud Dataflow

Compared 1% of the time.

Also Known As

Google Dataflow

Overview

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
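The linear dataflow described above can be sketched in plain Python on a single machine. This is only an illustration under stated assumptions, not Spark or Hadoop code: `mapreduce_pass`, `demo`, and the JSON files are hypothetical names invented for the example, and a real cluster would distribute each step across machines.

```python
import json
import os
import tempfile
from functools import reduce


def mapreduce_pass(in_path, out_path, map_fn, reduce_fn, initial):
    """One MapReduce-style pass: read from disk, map, reduce, write to disk."""
    with open(in_path) as f:
        records = json.load(f)                      # read input data from disk
    mapped = [map_fn(r) for r in records]           # map a function across the data
    result = reduce(reduce_fn, mapped, initial)     # reduce the results of the map
    with open(out_path, "w") as f:
        json.dump(result, f)                        # store the reduction result on disk
    return result


def demo():
    workdir = tempfile.mkdtemp()
    in_path = os.path.join(workdir, "input.json")   # hypothetical input file
    with open(in_path, "w") as f:
        json.dump([1, 2, 3, 4], f)

    def square(x):
        return x * x

    # MapReduce style: every pass re-reads the input from disk and repeats the map.
    total = mapreduce_pass(in_path, os.path.join(workdir, "sum.json"),
                           square, lambda a, b: a + b, 0)
    largest = mapreduce_pass(in_path, os.path.join(workdir, "max.json"),
                             square, max, 0)

    # Working-set style: load and map once, keep the result in memory, reuse it.
    with open(in_path) as f:
        working_set = [square(x) for x in json.load(f)]
    return total, largest, sum(working_set), max(working_set)
```

Both styles compute the same answers; the difference the overview points at is that the second style keeps the mapped data in memory between computations instead of going back to disk for each one.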

Google Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Cloud Dataflow frees you from operational tasks like resource management and performance optimization.
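As a rough illustration of what a "unified" model means, the sketch below applies one transform to both a bounded collection (batch) and an iterator that yields records one at a time (standing in for continuous computation). It is plain Python and deliberately does not use the Dataflow service or any SDK; `parse_and_filter`, `batch_source`, and `streaming_source` are hypothetical names made up for this example.

```python
from typing import Iterable, Iterator


def parse_and_filter(records: Iterable[str]) -> Iterator[int]:
    """A single transform: parse each record and drop negative values."""
    for record in records:
        value = int(record.strip())
        if value >= 0:
            yield value


def batch_source() -> list:
    # Bounded input: the whole dataset is available up front.
    return ["3", "-1", " 7 "]


def streaming_source() -> Iterator[str]:
    # Stand-in for an unbounded source: records arrive one at a time.
    for record in ["3", "-1", " 7 "]:
        yield record


batch_result = list(parse_and_filter(batch_source()))
stream_result = list(parse_and_filter(streaming_source()))
```

The transform is written once and does not need to know whether its input is bounded, which is the property the overview describes; a managed service would additionally handle the workers and scaling that this toy version ignores.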

Sample Customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

Absolutdata, Backflip Studios, Bluecore, Claritics, Crystalloids, Energyworx, GenieConnect, Leanplum, Nomanini, Redbus, Streak, TabTale

Top Industries

REVIEWERS

Computer Software Company: 30%

Financial Services Firm: 15%

University: 9%

Marketing Services Firm: 6%

VISITORS READING REVIEWS

Financial Services Firm: 25%

Computer Software Company: 13%

Manufacturing Company: 7%

Comms Service Provider: 6%

VISITORS READING REVIEWS

Financial Services Firm: 14%

Computer Software Company: 12%

Retailer: 11%

Manufacturing Company: 10%

Company Size

REVIEWERS

Small Business: 40%

Midsize Enterprise: 18%

Large Enterprise: 42%

VISITORS READING REVIEWS

Small Business: 17%

Midsize Enterprise: 12%

Large Enterprise: 71%

REVIEWERS

Small Business: 27%

Midsize Enterprise: 18%

Large Enterprise: 55%

VISITORS READING REVIEWS

Small Business: 17%

Midsize Enterprise: 12%

Large Enterprise: 72%
