We performed a comparison between Apache Spark, Cloudera Distribution for Hadoop, and Hortonworks Data Platform based on real PeerSpot user reviews.
"The most valuable feature of Apache Spark is its flexibility."
"The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly."
"The processing time is very much improved over the data warehouse solution that we were using."
"One of the key features is that Apache Spark is a distributed computing framework. You can have multiple slaves and distribute the workload between them."
"Provides a lot of good documentation compared to other solutions."
"The product is useful for analytics."
"The data processing framework is good."
"Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
"We had a data warehouse before this. Now we can process a lot more data and more data structures."
"The features I find most valuable are that the solution is easy to install and to work with. It starts with the installation, and from there on the management is very simple and centralized."
"The search function is the most valuable aspect of the solution."
"The solution is stable."
"The solution's most valuable feature is the enterprise data platform."
"The most valuable feature is Kubernetes."
"The product as a whole is good."
"Very good end-to-end security features."
"Ranger for security; with Ranger we can manage users’ permissions/access controls very easily."
"Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request."
"It is a scalable platform."
"Distributed computing, secure containerization, and governance capabilities are the most valuable features."
"We use it for data science activities."
"The product offers a fairly easy setup process."
"The Hortonworks solution is so stable. It is working as a production system, without any error, without any downtime. If I have downtime, it is mostly caused by the hardware of the computers."
"Hortonworks should not be expensive at all to those looking into using it."
"There were some problems related to the product's compatibility with a few Python libraries."
"It's not easy to install."
"Spark could be improved by adding support for other open-source storage layers than Delta Lake."
"Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."
"The solution must improve its performance."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."
"The solution’s integration with other platforms should be improved."
"The tool's ability to be deployed on a cloud model is an area of concern where improvements are required."
"The solution does not support multiple languages very well and this means users need to create work-arounds to implement some solutions."
"The competitors provide better functionalities."
"The initial setup of Cloudera is difficult."
"It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform."
"The one thing that we struggled with predominantly was support. Because it was relatively new, support was always a big issue, and I think it's still a bit of an ongoing concern with the team currently managing it."
"The user infrastructure and user interface need to be improved, as well as the performance. The GUI needs to be better."
"The solution is not fit for on-premise distributions."
"The version control of the software is also an issue."
"Hive performance. If Hive performance increased, Hadoop would replace (not everywhere) traditional databases."
"Since Cloudera acquired HDP, it's been bundled with CDH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS."
"Security and workload management need improvement."
"It's at end of life and no longer will there be improvements."
"I work a lot with banking, IT and communications customers. Hortonworks must improve or must upgrade their services for these sectors."
"I would like to see more support for containers such as Docker and OpenShift."
"It would also be nice if there were less coding involved."