Amazon Redshift vs Apache Hadoop comparison

Amazon Redshift: 8,203 views | 6,066 comparisons | 87% willing to recommend
Apache Hadoop: 2,630 views | 2,223 comparisons | 89% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Amazon Redshift and Apache Hadoop based on real PeerSpot user reviews.

Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Amazon Redshift vs. Apache Hadoop Report (Updated: March 2024).
767,847 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
Amazon Redshift:
  • "The most valuable feature of Amazon Redshift is its ability to handle really large sets of data."
  • "If the analyst knows SQL, which is comfortable and easy to use to go between all of these tool stacks, I think it's reliable. It's a secure and reliable data warehouse."
  • "I have primarily used the Redshift Spectrum feature and found it most valuable."
  • "I like the cost-benefit ratio, meaning that it is as easy to use as it is powerful and well-performing."
  • "The most valuable feature of Redshift is its cluster."
  • "The most valuable feature is that the solution is fully embedded in the AWS stack."
  • "Though Amazon Redshift is good, it depends on what kind of business you're trying to do, what type of analytics you need, and how much data you have."
  • "In terms of valuable features, I like the columnar storage that Redshift provides. The storage is one of the key features that we're looking for. Also, the data updates and the latency between the data-refreshes."

Apache Hadoop:
  • "It's open-source, so it's very cost-effective."
  • "Initially, with RDBMS alone, we had a lot of work and few servers running on-premise and on cloud for the PoC and incubation. With the use of Hadoop and ecosystem components and tools, and managing it in Amazon EC2, we have created a Big Data 'lab' which helps us to centralize all our work and solutions into a single repository. This has cut down the time in terms of maintenance, development and, especially, data processing challenges."
  • "The most valuable features are powerful tools for ingestion, as data is in multiple systems."
  • "Two valuable features are its scalability and parallel processing. There are jobs that cannot be done unless you have massively parallel processing."
  • "The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
  • "Data ingestion: It has rapid speed, if Apache Accumulo is used."
  • "Hadoop is extensible - it's elastic."
  • "I liked that Apache Hadoop was powerful, had a lot of tools, and the fact that it was free and community-developed."

Cons
Amazon Redshift:
  • "In the next release, a pivot function would be a big help. It could save a lot of time creating a query or process to handle operations."
  • "Pricing is one of the things that it could improve. It should be more competitive."
  • "Amazon should provide more cloud-native tools that can integrate with Redshift like Microsoft's development tools for Azure."
  • "This solution lacks integration with non-AWS sources."
  • "Redshift's GUI could be more user-friendly. It's easier to perform queries and all that stuff in Azure Synapse Analytics."
  • "The OLAP slice and dice features need to be improved."
  • "Should be made available across zones, like other Multi-AZ solutions."
  • "It takes a lot of time to ingest and update the data."

Apache Hadoop:
  • "The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
  • "There is a lack of virtualization and presentation layers, so you can't take it and implement it like a radio solution."
  • "The stability of the solution needs improvement."
  • "What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe it's because I'm new to it. Sometimes it feels so tough to use, but it could be because of two aspects: one is my incompetency, for example, I don't know about all the features of Apache Hadoop, or maybe it's because of the limitations of the platform. For example, my team is maintaining the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, an advanced level of coding or programming needs to be done in the back end, so it's not user-friendly."
  • "In certain cases, the configurations for dealing with data skewness do not make any sense."
  • "The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop."
  • "Real-time data processing is weak. This solution is very difficult to run and implement."
  • "I think more of the solution needs to be focused around the panel processing and retrieval of data."

Pricing and Cost Advice
  Amazon Redshift:
  • "Redshift is very cost effective for a cloud based solution if you need to scale it a lot. For smaller data sizes, I would think about using other products."
  • "If you want a fixed price, and to not worry about every query, but you need to manage your nodes personally, use Redshift."
  • "BI is sold to our customer base as a part of the initial sales bundle. A customer may elect to opt for a white labeled site for an up-charge."
  • "One of my customers went with Google Big Query over Redshift because it was significantly cheaper for their project."
  • "Per hour pricing is helpful to keep the costs of a pilot down, but long-term retention is expensive."
  • "It's around $200 US dollars. There are some data transfer costs but it's minimal, around $20."
  • "The best part about this solution is the cost."
  • "The part that I like best is that you only pay for what you are using."

  Apache Hadoop:
  • "Do take into consideration that data storage and compute capacity scale differently, and hence purchasing a 'boxed' or 'all-in-one' solution (software and hardware) might not be the best idea."
  • "There are no licensing costs involved, hence money is saved on the software infrastructure."
  • "This is a low cost and powerful solution."
  • "The price of Apache Hadoop could be less expensive."
  • "If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs."
  • "We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
  • "The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
  • "We just use the free version."

    Questions from the Community
    Top Answer: Amazon Redshift is very fast, has a very good response time, and is very user-friendly. The initial setup is very straightforward. This solution can merge and integrate well with many different…
    Top Answer: Redshift Spectrum is the most valuable feature.
    Top Answer: Hadoop File System is compatible with almost all the query engines.
    Top Answer: The tool provides functionalities to deal with data skewness or a diverse set of data. There are some configurations that it usually provides. In certain cases, the configurations for dealing with…
    Ranking
                                 Amazon Redshift    Apache Hadoop
    Ranking                      4th                5th out of 34 in Data Warehouse
    Views                        8,203              2,630
    Comparisons                  6,066              2,223
    Reviews                      23                 11
    Average Words per Review     480                532
    Rating                       7.7                8.0
    Overview

    What is Amazon Redshift?

    Amazon Redshift is a fully managed, petabyte-scale cloud-based data warehouse service. Users can begin with a small number of gigabytes of data and scale up to a petabyte or more as needed. This lets them use their own data to gain new insights into how to improve business processes and client relations.

    To create a data warehouse, users first launch a set of nodes, called an Amazon Redshift cluster. Once the cluster has been provisioned, users can upload data sets and then run data analysis queries. Amazon Redshift delivers fast query performance regardless of data set size, using the same SQL-based tools and BI applications that most users already work with.

    The Amazon Redshift service performs all of the work of setting up, operating, and scaling a data warehouse. These tasks include provisioning capacity, monitoring and backing up the cluster, and applying patches and upgrades to the Amazon Redshift engine.
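
As a rough illustration of the provisioning step described above, the following Python sketch assembles the parameters for a cluster request. It assumes the request would be sent with the `create_cluster` operation of the AWS SDK for Python (boto3); the function name `build_cluster_request` and all identifier, node type, and credential values are hypothetical placeholders, not taken from the reviews. The sketch only builds the request and does not contact AWS.

```python
from typing import Any, Dict


def build_cluster_request(cluster_id: str, node_type: str, node_count: int,
                          admin_user: str, admin_password: str) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's Redshift `create_cluster` call.

    Hypothetical helper: it validates the node count and picks the cluster
    type, but sends nothing to AWS.
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    request: Dict[str, Any] = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "MasterUsername": admin_user,
        "MasterUserPassword": admin_password,
        "ClusterType": "single-node" if node_count == 1 else "multi-node",
    }
    # NumberOfNodes is only supplied for multi-node clusters.
    if node_count > 1:
        request["NumberOfNodes"] = node_count
    return request


# Example values are placeholders chosen for illustration only.
request = build_cluster_request("demo-warehouse", "ra3.xlplus", 2,
                                "admin_user", "REPLACE_ME")
print(request["ClusterType"], request["NumberOfNodes"])

# With boto3 installed and AWS credentials configured, the request could
# then be submitted roughly like this (not executed here):
#   import boto3
#   boto3.client("redshift").create_cluster(**request)
```

Keeping the request construction separate from the network call makes it possible to review the cluster size and type before anything is provisioned or billed.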

    Amazon Redshift Functionalities

    Amazon Redshift has many valuable key functionalities. Some of its most useful functionalities include:

    • Cluster administration: An Amazon Redshift cluster is a group of nodes that contains a leader node and one or more compute nodes. The number of compute nodes needed depends on the data size, the number of queries to run, and the query execution performance desired.
    • Cluster snapshots: Snapshots are backups of a cluster from an exact point in time. Amazon Redshift offers two types of snapshots: manual and automated. Amazon Redshift stores these snapshots internally in the Amazon Simple Storage Service (Amazon S3) over an encrypted SSL connection. Whenever a snapshot restore is needed, Amazon Redshift creates a new cluster and imports data from the specified snapshot.
    • Cluster access: Amazon Redshift provides several features to help define connectivity rules, encrypt data and connections, and control overall access to your cluster.
    • IAM credentials and AWS accounts: By default, an Amazon Redshift cluster is accessible only by the AWS account that created it, which keeps the cluster locked down. Inside the AWS account, users work with AWS Identity and Access Management (IAM) to create additional user accounts and manage permissions, granting specified users the access needed to control cluster operations.
    • Encryption: Users can choose to encrypt a cluster for additional security when the cluster is provisioned. When encryption is enabled, Amazon Redshift stores all the data in user-created tables in an encrypted format. To manage Amazon Redshift encryption keys, users work with AWS Key Management Service (AWS KMS).
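
The snapshot and encryption items above can be sketched in the same hedged way. The Python helpers below only build argument dictionaries. They assume the boto3 Redshift operations `create_cluster_snapshot` and `restore_from_cluster_snapshot`, plus the `Encrypted` and `KmsKeyId` arguments of `create_cluster`. The helper names, identifiers, and key alias are hypothetical.

```python
from typing import Any, Dict, Optional


def encryption_options(kms_key_id: Optional[str] = None) -> Dict[str, Any]:
    """Extra `create_cluster` arguments that enable encryption at rest.

    If no KMS key is given, only the `Encrypted` flag is set.
    """
    options: Dict[str, Any] = {"Encrypted": True}
    if kms_key_id is not None:
        options["KmsKeyId"] = kms_key_id
    return options


def manual_snapshot_request(cluster_id: str, snapshot_id: str) -> Dict[str, str]:
    """Arguments for `create_cluster_snapshot`, which takes a manual snapshot."""
    return {"ClusterIdentifier": cluster_id, "SnapshotIdentifier": snapshot_id}


def restore_request(new_cluster_id: str, snapshot_id: str) -> Dict[str, str]:
    """Arguments for `restore_from_cluster_snapshot`.

    A restore creates a new cluster, so the identifier names that new cluster.
    """
    return {"ClusterIdentifier": new_cluster_id, "SnapshotIdentifier": snapshot_id}


# Placeholder identifiers for illustration only.
print(encryption_options("alias/example-key"))
print(manual_snapshot_request("demo-warehouse", "demo-warehouse-before-load"))
print(restore_request("demo-warehouse-restored", "demo-warehouse-before-load"))
```

The restore helper takes the identifier of the new cluster, not the original one, because a restore produces a separate cluster populated from the snapshot.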

    Reviews from Real Users

    “Redshift's versioning and data security are the two most critical features. When migrating into the cloud, it's vital to secure the data. The encryption and security are there.” - Kundan A., Senior Consultant at Dynamic Elements AS

    “With the cloud version whenever you want to deploy, you can scale up, and down, and it has a data warehousing capability. Redshift has many features. They have enriched and elaborate documentation that is helpful.” - Aishwarya K., Solution Architect at Capgemini

    What is Apache Hadoop?

    The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
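
MapReduce is one of the programming models Hadoop provides. The following Python sketch imitates its map, shuffle, and reduce phases in a single process with a word count. It is a simplified local illustration under that assumption and does not use any Hadoop API; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "big cluster"]))
```

On a real cluster, the map and reduce functions run in parallel on many machines, and the framework handles the grouping step and re-runs failed tasks. Here everything runs sequentially in memory.
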
    Sample Customers
    Amazon Redshift: Liberty Mutual Insurance, 4Cite Marketing, BrandVerity, DNA Plc, Sirocco Systems, Gainsight, Blue 449
    Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
    Top Industries
    Amazon Redshift, reviewers:
    • Computer Software Company: 32%
    • Comms Service Provider: 14%
    • Manufacturing Company: 11%
    • Retailer: 11%
    Amazon Redshift, visitors reading reviews:
    • Educational Organization: 50%
    • Financial Services Firm: 9%
    • Computer Software Company: 7%
    • Manufacturing Company: 4%
    Apache Hadoop, reviewers:
    • Financial Services Firm: 38%
    • Comms Service Provider: 25%
    • Hospitality Company: 6%
    • Consumer Goods Company: 6%
    Apache Hadoop, visitors reading reviews:
    • Financial Services Firm: 27%
    • Computer Software Company: 10%
    • Comms Service Provider: 6%
    • University: 6%
    Company Size
    Amazon Redshift, reviewers:
    • Small Business: 40%
    • Midsize Enterprise: 24%
    • Large Enterprise: 37%
    Amazon Redshift, visitors reading reviews:
    • Small Business: 10%
    • Midsize Enterprise: 54%
    • Large Enterprise: 36%
    Apache Hadoop, reviewers:
    • Small Business: 34%
    • Midsize Enterprise: 23%
    • Large Enterprise: 43%
    Apache Hadoop, visitors reading reviews:
    • Small Business: 15%
    • Midsize Enterprise: 10%
    • Large Enterprise: 75%

    Amazon Redshift is ranked 4th in Cloud Data Warehouse with 58 reviews, while Apache Hadoop is ranked 5th in Data Warehouse with 32 reviews. Both Amazon Redshift and Apache Hadoop are rated 7.8. The top reviewer of Amazon Redshift writes "Provides one place where we can store data, and allows us to easily connect to other services with AWS". The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". Amazon Redshift is most compared with AWS Lake Formation, Snowflake, Teradata and Vertica, whereas Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake and Vertica.


    We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.