We performed a comparison between Amazon EMR, Cloudera Distribution for Hadoop, and Hortonworks Data Platform based on real PeerSpot user reviews.
"Amazon EMR's most valuable features are processing speed and data storage capacity."
"The solution is pretty simple to set up."
"One of the valuable features about this solution is that it's managed services, so it's pretty stable, and scalable as much as you wish. It has all the necessary distributions. With some additional work, it's also possible to change to a Spark version with the latest version of EMR. It also has Hudi, so we are leveraging Apache Hudi on EMR for change data capture, so then it comes out-of-the-box in EMR."
"In Amazon EMR it is easy to rebuild anything, easy to upgrade and has good fault tolerance."
"The project management is very streamlined."
"We are using applications, such as Splunk, Livy, Hadoop, and Spark. We are using all of these applications in Amazon EMR and they're helping us a lot."
"The initial setup is pretty straightforward."
"Amazon EMR is a good solution that can be used to manage big data."
"The product provides better data processing features than other tools."
"The main advantage is the storage is less expensive."
"Cloudera is a very manageable solution with good support."
"The file system is a valuable feature."
"CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
"We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization."
"The most valuable feature is Kubernetes."
"I don't see any performance issues."
"The product offers a fairly easy setup process."
"Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request."
"We use it for data science activities."
"The scalability is the key reason why we are on this platform."
"Ranger for security; with Ranger we can manage users' permissions/access controls very easily."
"Distributed computing, secure containerization, and governance capabilities are the most valuable features."
"The upgrades and patches must come from Hortonworks."
"Ambari Web UI: user-friendly."
"There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange."
"The product's features for storing data in static clusters could be better."
"The problem for us is it starts very slow."
"Amazon EMR is continuously improving, but it could add something like out-of-the-box CI/CD or integration with Prometheus and Grafana."
"The dashboard management could be better. Right now, it's lacking a bit."
"As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data."
"There is room for improvement in pricing."
"Amazon EMR can improve by adding some features, such as megastore services and HiveServer2. Additionally, the user interface could be better, similar to what Apache service provides, cross-platform services."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"The user infrastructure and user interface need to be improved, as well as the performance. The GUI needs to be better."
"The governance aspect of the solution should be improved."
"This is a very expensive solution."
"It could be faster and more user-friendly."
"The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."
"The Cloudera training has deteriorated significantly."
"Cloudera's support is extremely bad and cannot be relied on."
"The version control of the software is also an issue."
"Deleting any service requires a lot of clean up, unlike Cloudera."
"It's at end of life and no longer will there be improvements."
"I work a lot with banking, IT and communications customers. Hortonworks must improve or must upgrade their services for these sectors."
"Security and workload management need improvement."
"It would also be nice if there were less coding involved."
"Hive performance. If Hive performance increased, Hadoop would replace (not everywhere) traditional databases."
"The cost of the solution is high and there is room for improvement."