We performed a comparison between Amazon EMR and Apache Spark based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The project management is very streamlined."
"The solution helps us manage huge volumes of data."
"It allows users to access the data through a web interface."
"The solution is scalable."
"The solution is pretty simple to set up."
"We are using applications, such as Splunk, Livy, Hadoop, and Spark. We are using all of these applications in Amazon EMR and they're helping us a lot."
"The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions."
"When we grade big jobs from on-prem to the cloud, we do it in EMR with Spark."
"It is useful for handling large amounts of data. It is very useful for scientific purposes."
"With Spark, we parallelize our operations, efficiently accessing both historical and real-time data."
"There's a lot of functionality."
"The solution has been very stable."
"I appreciate everything about the solution, not just one or two specific features. The solution is highly stable. I rate it a perfect ten. The solution is highly scalable. I rate it a perfect ten. The initial setup was straightforward. I recommend using the solution. Overall, I rate the solution a perfect ten."
"The most valuable feature of this solution is its capacity for processing large amounts of data."
"The product is useful for analytics."
"The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."
"The product must add some of the latest technologies to provide more flexibility to the users."
"Modules and strategies should be better handled and notified early in advance."
"Amazon EMR can improve by adding some features, such as megastore services and HiveServer2. Additionally, the user interface could be better, similar to what Apache service provides, cross-platform services."
"The most complicated thing is configuring to the cluster and ensure it's running correctly."
"There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange."
"Amazon EMR is continuously improving, but maybe something like CI/CD out-of-the-box or integration with Prometheus Grafana."
"There is no need to pay extra for third-party software."
"We don't have much control. If we have multiple users, if they want to scale up, the cost will go and increase and we don't know how we can restrict that price part."
"Apache Spark should add some resource management improvements to the algorithms."
"The initial setup was not easy."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"The solution must improve its performance."
"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."
Amazon EMR is ranked 3rd in Hadoop with 20 reviews, while Apache Spark is ranked 1st in Hadoop with 60 reviews. Amazon EMR is rated 7.8, while Apache Spark is rated 8.4. The top reviewer of Amazon EMR writes "Provides efficient data processing features and has good scalability". On the other hand, the top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". Amazon EMR is most compared with Snowflake, Cloudera Distribution for Hadoop, Azure Data Factory, Amazon Redshift, and Microsoft Azure Synapse Analytics, whereas Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA, and AWS Fargate.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.