- A highly scalable platform for processing vast amounts of data very efficiently.
- Can be set to auto-scale up/down, as and when required.
- Highly reliable platform with almost no downtime.
- Cost-effective.
- Can be used through a web management console or a web services API.
- The web console makes it very easy to run simple jobs.
- Can be integrated very easily with Hadoop clusters and the HDFS distributed file system.
- Uses Amazon's own Elastic Compute Cloud (Amazon EC2) and Simple Storage Service (Amazon S3) to provide dynamic cloud compute and storage capacity.
Room for Improvement:
- Setting up jobs for log file analysis, financial analysis, etc. is noticeably harder than setting up jobs for operations like data mining, web indexing, and machine learning.
- For novice users, there is a bit of a steep learning curve, but things become much easier once you have the basics under your belt.
- One thing that is lacking is good web support. Though the web interface looks pretty decent, some basic features are missing. For example, you will find it a bit difficult to customize particular map and reduce tasks, and a web indexing job can call for a lot of such customization. Doing this involves extensive use of the underlying HDFS file system.
I've been using Amazon Elastic MapReduce for more than a year and have found it to be a very useful tool. I was a bit hesitant to try it when I first started, but previous experience with a similar tool helped me grasp things faster.
The bottom line is that if you have used similar tools in the past, you are good to go. And if you are new to the concept of distributed tasks, it would be wise to spend a little time getting acquainted with MapReduce first. This is my personal experience.
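For anyone new to the idea, here is a minimal sketch of the map, shuffle, and reduce steps as a word count in plain Python. It is not Elastic MapReduce code and it runs on a single machine; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, made up purely for illustration. On a real cluster, the map and reduce steps would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on one machine (illustrative only)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(sample))
```

The point of the split is that each line can be mapped independently and each key can be reduced independently, which is what lets the work be spread across a cluster.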