We performed a comparison between Amazon EMR and Apache Spark based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"Amazon EMR's most valuable features are processing speed and data storage capacity."
"The solution is scalable."
"It allows users to access the data through a web interface."
"This is the best tool for hosts and it's really flexible and scalable."
"The initial setup is pretty straightforward."
"The initial setup is straightforward."
"In Amazon EMR it is easy to rebuild anything, easy to upgrade and has good fault tolerance."
"When we grade big jobs from on-prem to the cloud, we do it in EMR with Spark."
"The most valuable feature of Apache Spark is its flexibility."
"We use Spark to process data from different data sources."
"It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance."
"The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."
"Apache Spark provides a very high-quality implementation of distributed data processing."
"ETL and streaming capabilities."
"This solution provides a clear and convenient syntax for our analytical tasks."
"The main feature that we find valuable is that it is very fast."
"Amazon EMR is continuously improving, but maybe something like CI/CD out-of-the-box or integration with Prometheus Grafana."
"Amazon EMR can improve by adding some features, such as megastore services and HiveServer2. Additionally, the user interface could be better, similar to what Apache service provides, cross-platform services."
"There is room for improvement in pricing."
"Modules and strategies should be better handled and notified early in advance."
"There is no need to pay extra for third-party software."
"The product's features for storing data in static clusters could be better."
"As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data."
"The problem for us is it starts very slow."
"When using Spark, users may need to write their own parallelization logic, which requires additional effort and expertise."
"Apache Spark provides very good performance The tuning phase is still tricky."
"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."
"Apache Spark should add some resource management improvements to the algorithms."
"The solution must improve its performance."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"Apache Spark can improve the use case scenarios from the website. There is not any information on how you can use the solution across the relational databases toward multiple databases."
"The solution needs to optimize shuffling between workers."
Amazon EMR is ranked 3rd in Hadoop with 20 reviews, while Apache Spark is ranked 1st in Hadoop with 60 reviews. Amazon EMR is rated 7.8, while Apache Spark is rated 8.4. The top reviewer of Amazon EMR writes "Provides efficient data processing features and has good scalability". On the other hand, the top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". Amazon EMR is most compared with Snowflake, Cloudera Distribution for Hadoop, Azure Data Factory, Amazon Redshift and Microsoft Azure Synapse Analytics, whereas Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and AWS Fargate.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.