We performed a comparison between Apache Spark and Cloudera Distribution for Hadoop based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"I like that it can handle multiple tasks parallelly. I also like the automation feature. JavaScript also helps with the parallel streaming of the library."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"With Spark, we parallelize our operations, efficiently accessing both historical and real-time data."
"The main feature that we find valuable is that it is very fast."
"We use Spark to process data from different data sources."
"The most valuable feature of Apache Spark is its ease of use."
"It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance."
"The product is completely secure."
"We had a data warehouse before all the data. We can process a lot more data structures."
"With a cluster available, you can manage the security layer using the shared SDX - it provides flexibility."
"The most valuable feature is Kubernetes."
"The scalability of Cloudera Distribution for Hadoop is excellent."
"Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
"The solution is stable."
"It is helpful to gather and process data."
"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
"Needs to provide an internal schedule to schedule Spark jobs with monitoring capability."
"Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."
"The solution needs to optimize shuffling between workers."
"The setup I worked on was really complex."
"When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
"The migration of data between different versions could be improved."
"The product could improve the user interface and make it easier for new users."
"The user infrastructure and user interface needs to be improved, as well as the performance. The GUI needs to be better."
"While the deployed product is generally functional, there are instances where it presents difficulties."
"The competitors provide better functionalities."
"The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."
"The procedure for operations could be simplified."
"The pricing needs to improve."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"The security of this solution could be improved. There should also be a way to basically have a blockchain enabled storage with the HDFS."
Apache Spark is ranked 1st in Hadoop with 60 reviews, while Cloudera Distribution for Hadoop is ranked 2nd in Hadoop with 47 reviews. Apache Spark is rated 8.4, while Cloudera Distribution for Hadoop is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". The top reviewer of Cloudera Distribution for Hadoop, on the other hand, writes "Good end-to-end security features and we like that it's cloud independent". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA, and Azure Stream Analytics, whereas Cloudera Distribution for Hadoop is most compared with Amazon EMR, HPE Ezmeral Data Fabric, Cassandra, ScyllaDB, and MongoDB.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.