Compare Apache Spark vs. Cloudera Distribution for Hadoop

Apache Spark is ranked 1st in Hadoop with 11 reviews while Cloudera Distribution for Hadoop is ranked 2nd in Hadoop with 9 reviews. Apache Spark is rated 8.0, while Cloudera Distribution for Hadoop is rated 8.0. The top reviewer of Apache Spark writes "Good Streaming features enable to enter data and analysis within Spark Stream". On the other hand, the top reviewer of Cloudera Distribution for Hadoop writes "Open-source solution for intelligent data management and analysis". Apache Spark is most compared with Spring Boot, Azure Stream Analytics and AWS Lambda, whereas Cloudera Distribution for Hadoop is most compared with Amazon EMR, Cassandra and Apache Spark. See our Apache Spark vs. Cloudera Distribution for Hadoop report.
Find out what your peers are saying about Apache Spark vs. Cloudera Distribution for Hadoop and other solutions. Updated: March 2020.
407,401 professionals have used our research since 2012.
Quotes From Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:

Pros
The processing time is very much improved over the data warehouse solution that we were using.
The main feature that we find valuable is that it is very fast.
The features we find most valuable are the machine learning, data learning, and Spark Analytics.
I feel the streaming is its best feature.
The solution is very stable.
The most valuable feature of this solution is its capacity for processing large amounts of data.
I found the solution stable. We haven't had any problems with it.
The scalability has been the most valuable aspect of the solution.


The most valuable feature is Kubernetes.
We also really like the Cloudera community. You can have any question and will have your answer within a few hours.
The most valuable feature is Impala, the querying engine, which is very fast.
In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues.
Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis.
The search function is the most valuable aspect of the solution.
We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that.
The features I find most valuable is that the solution is that it is easy to install and to work with. It starts with the installation and from there on the management is very simple and centralized.


Cons
I would like to see integration with data science platforms to optimize the processing capability for these tasks.
We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time.
We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data.
When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources.
The solution needs to optimize shuffling between workers.
When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data.
It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster.
The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive.


The price of this solution could be lowered.
Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. It's not an option actually for us to have a big data environment.
There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon.
The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it.
The solution does not support multiple languages very well and this means users need to create work-arounds to implement some solutions.
The user infrastructure and user interface needs to be improved, as well as the performance. The GUI needs to be better.
We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there is a lot of things that need to improve.
I would like to see an improvement in how the solution helps me to handle the whole cluster.


                           Apache Spark               Cloudera Distribution for Hadoop
Ranking                    1st out of 24 in Hadoop    2nd out of 24 in Hadoop
Views                      10,978                     6,419
Comparisons                9,231                      4,690
Reviews                    11                         8
Average Words per Review   305                        460
Avg. Rating                8.0                        7.9
Top Comparisons
Compared 36% of the time.
Compared 9% of the time.
Overview

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
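
The contrast described above can be sketched in plain Python. This is only a single-machine illustration, not Spark's or Hadoop's API: `mapreduce_pass`, `WorkingSet`, and `run_demo` are hypothetical names invented for this example, and the sketch leaves out distribution and fault tolerance. It shows two MapReduce-style jobs that each read the input from disk and store a result on disk, next to a read-only in-memory collection that is loaded once and reused.

```python
import json
import os
import tempfile
from functools import reduce


def mapreduce_pass(in_path, out_path, map_fn, reduce_fn, initial):
    """One MapReduce-style pass: read from disk, map, reduce, store on disk."""
    with open(in_path) as f:
        records = [line.rstrip("\n") for line in f]
    mapped = map(map_fn, records)
    result = reduce(reduce_fn, mapped, initial)
    with open(out_path, "w") as f:
        json.dump(result, f)
    return result


class WorkingSet:
    """Hypothetical read-only in-memory collection (a stand-in for an RDD)."""

    def __init__(self, items):
        # Stored as a tuple so the collection cannot be modified in place.
        self._items = tuple(items)

    def map(self, fn):
        # Transformations return a new WorkingSet instead of mutating this one.
        return WorkingSet(fn(x) for x in self._items)

    def reduce(self, fn, initial):
        return reduce(fn, self._items, initial)


def run_demo():
    lines = ["spark hadoop", "spark", "hadoop hdfs spark"]

    def word_count(line):
        return len(line.split())

    def add(a, b):
        return a + b

    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "input.txt")
        with open(in_path, "w") as f:
            f.write("\n".join(lines))

        # MapReduce style: every job re-reads the input and writes its result.
        total = mapreduce_pass(in_path, os.path.join(tmp, "total.json"),
                               word_count, add, 0)
        longest = mapreduce_pass(in_path, os.path.join(tmp, "longest.json"),
                                 word_count, max, 0)

        # Working-set style: read the input once and keep it in memory.
        with open(in_path) as f:
            working_set = WorkingSet(line.rstrip("\n") for line in f)

    # Both computations reuse the same in-memory intermediate result.
    counts = working_set.map(word_count)
    return total, longest, counts.reduce(add, 0), counts.reduce(max, 0)


if __name__ == "__main__":
    print(run_demo())
```

Both styles produce the same answers here. The difference is that the second one reuses `counts` across two computations without going back to disk, which is the kind of reuse the working-set description refers to.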

Cloudera Distribution for Hadoop is the world's most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls. More enterprises have downloaded CDH than all other such distributions combined.
Sample Customers
Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Cloudera Distribution for Hadoop: 37signals, Adconion, adgooroo, Aggregate Knowledge, AMD, Apollo Group, Blackberry, Box, BT, CSC
Top Industries

Apache Spark
REVIEWERS
Software R&D Company: 29%
Financial Services Firm: 29%
Non Profit: 14%
Marketing Services Firm: 14%
VISITORS READING REVIEWS
Software R&D Company: 35%
Media Company: 12%
Comms Service Provider: 11%
Financial Services Firm: 8%

Cloudera Distribution for Hadoop
REVIEWERS
Financial Services Firm: 40%
Marketing Services Firm: 20%
Media Company: 10%
Manufacturing Company: 10%
VISITORS READING REVIEWS
Software R&D Company: 36%
Comms Service Provider: 10%
Insurance Company: 8%
Financial Services Firm: 7%
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.