We performed a comparison between Apache Spark and Cloudera Distribution for Hadoop based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The solution has been very stable."
"It is highly scalable, allowing you to efficiently work with extensive datasets that might be problematic to handle using traditional tools that are memory-constrained."
"I appreciate everything about the solution, not just one or two specific features. The solution is highly stable. I rate it a perfect ten. The solution is highly scalable. I rate it a perfect ten. The initial setup was straightforward. I recommend using the solution. Overall, I rate the solution a perfect ten."
"It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance."
"The most valuable features are fault tolerance and easy binding with other processes, such as machine learning and graph analytics."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more."
"The most valuable feature of this solution is its capacity for processing large amounts of data."
"The most valuable feature is Kubernetes."
"We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that."
"The product as a whole is good."
"The solution's most valuable feature is the enterprise data platform."
"The product is completely secure."
"I don't see any performance issues."
"Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
"I would like to see integration with data science platforms to optimize the processing capability for these tasks."
"It should support more programming languages."
"The product could improve the user interface and make it easier for new users."
"The migration of data between different versions could be improved."
"The solution needs to optimize shuffling between workers."
"This solution currently cannot support or distribute neural network related models, or deep learning related algorithms. We would like this functionality to be developed."
"One limitation is that not all machine learning libraries and models support it."
"They could improve the issues related to programming language for the platform."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve."
"They should focus on upgrading their technical capabilities in the market."
"The Cloudera training has deteriorated significantly."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"The competitors provide better functionalities."
"The initial setup of Cloudera is difficult."
"Cloudera Distribution for Hadoop has a limited feature list and a lot of costs involved."
"The dashboard could be improved."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Cloudera Distribution for Hadoop is ranked 2nd in Hadoop with 47 reviews. Apache Spark is rated 8.4, while Cloudera Distribution for Hadoop is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Cloudera Distribution for Hadoop writes "Good end-to-end security features and we like that it's cloud independent". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and AWS Lambda, whereas Cloudera Distribution for Hadoop is most compared with Amazon EMR, HPE Ezmeral Data Fabric, MongoDB, Cassandra and ScyllaDB.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.