We performed a comparison between Amazon EMR, Apache Spark, and Hortonworks Data Platform based on real PeerSpot user reviews.
"It has a variety of options and support systems."
"One of the valuable features about this solution is that it's managed services, so it's pretty stable, and scalable as much as you wish. It has all the necessary distributions. With some additional work, it's also possible to change to a Spark version with the latest version of EMR. It also has Hudi, so we are leveraging Apache Hudi on EMR for change data capture, so then it comes out-of-the-box in EMR."
"Amazon EMR is a good solution that can be used to manage big data."
"Amazon EMR's most valuable features are processing speed and data storage capacity."
"The project management is very streamlined."
"The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions."
"The solution is pretty simple to set up."
"We are using applications, such as Splunk, Livy, Hadoop, and Spark. We are using all of these applications in Amazon EMR and they're helping us a lot."
"The most valuable feature of Apache Spark is its ease of use."
"The main feature that we find valuable is that it is very fast."
"One of Apache Spark's most valuable features is that it supports in-memory processing, so jobs execute very fast compared to traditional tools."
"Provides a lot of good documentation compared to other solutions."
"Apache Spark provides a very high-quality implementation of distributed data processing."
"The fault tolerant feature is provided."
"There's a lot of functionality."
"The product's deployment phase is easy."
"Ranger for security; with Ranger we can manage users' permissions/access controls very easily."
"Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request."
"The data platform is pretty neat. The workflow is also really good."
"The Hortonworks solution is so stable. It is working as a production system, without any error, without any downtime. If I have downtime, it is mostly caused by the hardware of the computers."
"The scalability is the key reason why we are on this platform."
"The upgrades and patches must come from Hortonworks."
"Distributed computing, secure containerization, and governance capabilities are the most valuable features."
"The product offers a fairly easy setup process."
"There is room for improvement in pricing."
"Modules and strategies should be better handled and notified early in advance."
"We don't have much control. If we have multiple users and they want to scale up, the cost will increase, and we don't know how we can restrict that."
"The initial setup was time-consuming."
"The problem for us is it starts very slow."
"The product must add some of the latest technologies to provide more flexibility to the users."
"The most complicated thing is configuring the cluster and ensuring it's running correctly."
"Amazon EMR is continuously improving, but it could maybe add something like CI/CD out-of-the-box or integration with Prometheus and Grafana."
"Spark could be improved by adding support for other open-source storage layers than Delta Lake."
"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
"Apache Spark can improve the use case scenarios from the website. There is not any information on how you can use the solution across the relational databases toward multiple databases."
"When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
"It's not easy to install."
"When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources."
"The logging for the observability platform could be better."
"The setup I worked on was really complex."
"It's at end of life and no longer will there be improvements."
"The cost of the solution is high and there is room for improvement."
"It would also be nice if there were less coding involved."
"More information could be there to simplify the process of running the product."
"Hive performance. If Hive performance increased, Hadoop would replace (not everywhere) traditional databases."
"The version control of the software is also an issue."
"Security and workload management need improvement."
"I work a lot with banking, IT and communications customers. Hortonworks must improve or must upgrade their services for these sectors."