We performed a comparison between Apache Spark and Cloudera Distribution for Hadoop based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"I appreciate everything about the solution, not just one or two specific features. The solution is highly stable. I rate it a perfect ten. The solution is highly scalable. I rate it a perfect ten. The initial setup was straightforward. I recommend using the solution. Overall, I rate the solution a perfect ten."
"The solution is very stable."
"It is highly scalable, allowing you to efficiently work with extensive datasets that might be problematic to handle using traditional tools that are memory-constrained."
"With Hadoop-related technologies, we can distribute the workload with multiple commodity hardware."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"The main feature that we find valuable is that it is very fast."
"The solution has been very stable."
"The solution is reliable and stable, it fits our requirements."
"The solution is stable."
"The product provides better data processing features than other tools."
"The data science aspect of the solution is valuable."
"The product is completely secure."
"The tool can be deployed using different container technologies, which makes it very scalable."
"We had a data warehouse before all the data. We can process a lot more data structures."
"CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
"Needs to provide an internal scheduler to schedule Spark jobs with monitoring capability."
"The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive."
"The initial setup was not easy."
"Dynamic DataFrame options are not yet available."
"The migration of data between different versions could be improved."
"If you have a Spark session in the background, sometimes it's very hard to kill these sessions because of deallocation."
"More ML based algorithms should be added to it, to make it algorithmic-rich for developers."
"The solution’s integration with other platforms should be improved."
"The solution does not support multiple languages very well and this means users need to create work-arounds to implement some solutions."
"The pricing needs to improve."
"Cloudera Distribution for Hadoop has a limited feature list and a lot of costs involved."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve."
"The price of this solution could be lowered."
"It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform."
"Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. It's not an option actually for us to have a big data environment."
"The governance aspect of the solution should be improved."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Cloudera Distribution for Hadoop is ranked 2nd in Hadoop with 47 reviews. Apache Spark is rated 8.4, while Cloudera Distribution for Hadoop is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Cloudera Distribution for Hadoop writes "Good end-to-end security features and we like that it's cloud independent". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Azure Stream Analytics, whereas Cloudera Distribution for Hadoop is most compared with Amazon EMR, HPE Ezmeral Data Fabric, Cassandra, ScyllaDB and MongoDB.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.