We performed a comparison between Apache Spark and Cloudera Distribution for Hadoop based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"I appreciate everything about the solution, not just one or two specific features. The solution is highly stable. I rate it a perfect ten. The solution is highly scalable. I rate it a perfect ten. The initial setup was straightforward. I recommend using the solution. Overall, I rate the solution a perfect ten."
"The good performance. The nice graphical management console. The long list of ML algorithms."
"The most valuable feature is the Fault Tolerance and easy binding with other processes like Machine Learning, graph analytics."
"We use it for ETL purposes as well as for implementing the full transformation pipelines."
"The product's deployment phase is easy."
"We use Spark to process data from different data sources."
"Apache Spark provides a very high-quality implementation of distributed data processing."
"The product is completely secure."
"I don't see any performance issues."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that."
"Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
"The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation and from there on the management is very simple and centralized."
"It is helpful to gather and process data."
"Customer service and support were able to fix whatever the issue was."
"The most valuable feature is Impala, the querying engine, which is very fast."
"Technical expertise from an engineer is required to deploy and run high-tech tools, like Informatica, on Apache Spark, making it an area where improvements are required to make the process easier for users."
"Needs to provide an internal scheduler to schedule Spark jobs with monitoring capability."
"It should support more programming languages."
"Apart from the restrictions that come with its in-memory implementation, it has been improved significantly up to version 3.0, which is currently in use."
"It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster."
"At times during the deployment process, the tool goes down, making it look less robust. To take care of the issues in the deployment process, users need to do manual interventions occasionally."
"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
"The solution needs to optimize shuffling between workers."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"The areas of improvement depend on the scale of the project. For banking customers, security features and an essential budget for commercial licenses would be the top priority. Data regulation could be the most crucial for a project with extensive data or an extra use case."
"The governance aspect of the solution should be improved."
"The dashboard could be improved."
"The initial setup of Cloudera is difficult."
"The price of this solution could be lowered."
"Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. It's not an option actually for us to have a big data environment."
"Cloudera Distribution for Hadoop is not always completely stable in some cases, which can be a concern for big data solutions."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Cloudera Distribution for Hadoop is ranked 2nd in Hadoop with 47 reviews. Apache Spark is rated 8.4, while Cloudera Distribution for Hadoop is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Cloudera Distribution for Hadoop writes "Good end-to-end security features and we like that it's cloud independent". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and AWS Lambda, whereas Cloudera Distribution for Hadoop is most compared with Amazon EMR, HPE Ezmeral Data Fabric, MongoDB, Cassandra and InfluxDB.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.