We performed a comparison between Apache Spark and Cloudera Distribution for Hadoop based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"It is useful for handling large amounts of data. It is very useful for scientific purposes."
"Features include machine learning, real time streaming, and data processing."
"Apache Spark can do large volume interactive data analysis."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"The product’s most valuable features are lazy evaluation and workload distribution."
"One of the key features is that Apache Spark is a distributed computing framework. You can help multiple slaves and distribute the workload between them."
"It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance."
"One of Apache Spark's most valuable features is that it supports in-memory processing, the execution of jobs compared to traditional tools is very fast."
"It has the best proxy, security, and support features compared to open-source products."
"Customer service and support were able to fix whatever the issue was."
"The product provides better data processing features than other tools."
"The most valuable feature is Impala, the querying engine, which is very fast."
"With a cluster available, you can manage the security layer using the shared SDX - it provides flexibility."
"CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
"The features I find most valuable is that the solution is that it is easy to install and to work with. It starts with the installation and from there on the management is very simple and centralized."
"I don't see any performance issues."
"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."
"The solution must improve its performance."
"This solution currently cannot support or distribute neural network related models, or deep learning related algorithms. We would like this functionality to be developed."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."
"At the initial stage, the product provides no container logs to check the activity."
"Needs to provide an internal schedule to schedule spark jobs with monitoring capability."
"We are building our own queries on Spark, and it can be improved in terms of query handling."
"The areas of improvement depend on the scale of the project. For banking customers, security features and an essential budget for commercial licenses would be the top priority. Data regulation could be the most crucial for a project with extensive data or an extra use case."
"While the deployed product is generally functional, there are instances where it presents difficulties."
"The tool's ability to be deployed on a cloud model is an area of concern where improvements are required."
"It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform."
"The pricing needs to improve."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there is a lot of things that need to improve."
"The solution is not fit for on-premise distributions."
"Currently, we are using many other tools such as Spark and Blade Job to improve the performance."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Cloudera Distribution for Hadoop is ranked 2nd in Hadoop with 47 reviews. Apache Spark is rated 8.4, while Cloudera Distribution for Hadoop is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Cloudera Distribution for Hadoop writes "Good end-to-end security features and we like that it's cloud independent". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and AWS Lambda, whereas Cloudera Distribution for Hadoop is most compared with Amazon EMR, HPE Ezmeral Data Fabric, MongoDB, Cassandra and InfluxDB.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.