Amazon Redshift vs Apache Hadoop comparison

Amazon Redshift: 7,785 views | 5,798 comparisons | 87% willing to recommend
Apache Hadoop: 2,467 views | 2,109 comparisons | 87% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Amazon Redshift and Apache Hadoop based on real PeerSpot user reviews.

Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Amazon Redshift vs. Apache Hadoop Report (Updated: May 2024).
770,141 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros

Amazon Redshift
  • "The most valuable features are that it's easy to set up and easy to connect the many tools that connect to it."
  • "I have primarily used the Redshift Spectrum feature and found it most valuable."
  • "If the analyst knows SQL, which is comfortable and easy to use to go between all of these tool stacks, I think it's reliable. It's a secure and reliable data warehouse."
  • "This service can merge and integrate well with all databases."
  • "Amazon Redshift offers a relatively flexible structure...I rate the technical support a nine out of ten."
  • "The ability to reload data multiple times at different times."
  • "The product is relatively easy to use because there is no indexing and no partitions."
  • "Redshift's Excel features are handy. Redshift spectrum allows you to directly query the data on an Excel sheet. Now, SQL Server also allows this, but Redshift has many more features."

Apache Hadoop
  • "It's open-source, so it's very cost-effective."
  • "The most valuable features are powerful tools for ingestion, as data is in multiple systems."
  • "It's primarily open source. You can handle huge data volumes and create your own views, workflows, and tables. I can also use it for real-time data streaming."
  • "Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial."
  • "I liked that Apache Hadoop was powerful, had a lot of tools, and the fact that it was free and community-developed."
  • "Data ingestion: It has rapid speed, if Apache Accumulo is used."
  • "What comes with the standard setup is what we mostly use, but Ambari is the most important."
  • "Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."

Cons

Amazon Redshift
  • "It would be nice if we could turn off an instance. However, it would retain the instance in history, thus allowing us to restart without beginning from scratch."
  • "It takes a lot of time to ingest and update the data."
  • "One area where Amazon Redshift could improve is in adopting the compute-separate, data-separate architecture, which Delta, Snowflake are adopting, and a few others in the cloud data warehouse spectrum."
  • "The customer support could be more responsive."
  • "There is some missing functionality and sometimes it's so difficult to work in. We need to convert these functionalities using VACUUM inside Amazon Redshift and then it causes some complexity."
  • "This solution lacks integration with non-AWS sources."
  • "The initial setup is a complex process, especially for someone who is not familiar with nodes and configuring terms like RPUs."
  • "It lacks a few features which can be very useful, such as stored procedures."

Apache Hadoop
  • "The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop."
  • "The upgrade path should be improved because it is not as easy as it should be."
  • "It could be more user-friendly."
  • "Hadoop's security could be better."
  • "The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
  • "It would be good to have more advanced analytics tools."
  • "The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data."
  • "I would like to see more direct integration of visualization applications."

Pricing and Cost Advice

Amazon Redshift
  • "Redshift is very cost effective for a cloud based solution if you need to scale it a lot. For smaller data sizes, I would think about using other products."
  • "If you want a fixed price, and not to worry about every query, but you need to manage your nodes personally, use Redshift."
  • "BI is sold to our customer base as a part of the initial sales bundle. A customer may elect to opt for a white labeled site for an up-charge."
  • "One of my customers went with Google Big Query over Redshift because it was significantly cheaper for their project."
  • "Per hour pricing is helpful to keep the costs of a pilot down, but long-term retention is expensive."
  • "It's around $200 US dollars. There are some data transfer costs but it's minimal, around $20."
  • "The best part about this solution is the cost."
  • "The part that I like best is that you only pay for what you are using."

Apache Hadoop
  • "Do take into consideration that data storage and compute capacity scale differently and hence purchasing a 'boxed' / 'all-in-one' solution (software and hardware) might not be the best idea."
  • "There are no licensing costs involved, hence money is saved on the software infrastructure."
  • "This is a low cost and powerful solution."
  • "The price of Apache Hadoop could be less expensive."
  • "If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs."
  • "We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
  • "The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
  • "We just use the free version."

    Questions from the Community
    Top Answer: Amazon Redshift is very fast, has a very good response time, and is very user-friendly. The initial setup is very straightforward. This solution can merge and integrate well with many different…
    Top Answer: The tool's most valuable feature is its parallel processing capability. It can handle massive amounts of data, even when pushing hundreds of terabytes, and its scaling capabilities are good.
    Top Answer: Tools like Apache Hadoop are knowledge-intensive in nature. Unlike other tools in the market currently, we cannot understand knowledge-intensive products straight away. To use Apache Hadoop, a person…
    Ranking
                                 Amazon Redshift    Apache Hadoop
    Ranking                      4th                5th out of 35 in Data Warehouse
    Views                        7,785              2,467
    Comparisons                  5,798              2,109
    Reviews                      25                 11
    Average Words per Review     497                573
    Rating                       7.7                7.9
    Overview

    What is Amazon Redshift?

    Amazon Redshift is a fully managed, petabyte-scale, cloud-based data warehouse service. Users can begin with a small number of gigabytes of data and scale up to a petabyte or more as needed, which lets them use their own data to gain new insights into how to improve business processes and client relations.

    Users start building a data warehouse by launching an Amazon Redshift cluster, which is a set of nodes. Once the cluster has been provisioned, users can upload data sets and begin running data analysis queries. Amazon Redshift is designed to deliver fast query performance regardless of data set size, using the same SQL-based tools and BI applications that most users already work with.

    The Amazon Redshift service performs all of the work of setting up, operating, and scaling a data warehouse. These tasks include provisioning capacity, monitoring and backing up the cluster, and applying patches and upgrades to the Amazon Redshift engine.

    Amazon Redshift Functionalities

    Some of Amazon Redshift's most useful functionalities include:

    • Cluster administration: An Amazon Redshift cluster is a group of nodes consisting of a leader node and one or more compute nodes. The number of compute nodes needed depends on the data size, the number of queries, and the query execution performance desired.
    • Cluster snapshots: Snapshots are point-in-time backups of a cluster. Amazon Redshift offers two types of snapshots: manual and automated. Snapshots are stored internally in Amazon Simple Storage Service (Amazon S3) over an encrypted SSL connection. When a snapshot restore is needed, Amazon Redshift creates a new cluster and imports data from the specified snapshot.
    • Cluster access: Amazon Redshift provides several features to define connectivity rules, encrypt data and connections, and control overall access to your cluster.
    • IAM credentials and AWS accounts: By default, an Amazon Redshift cluster is accessible only to the AWS account that created it. Within that account, users use the AWS Identity and Access Management (IAM) service to create additional user accounts and manage permissions, granting specified users the access they need to control cluster operations.
    • Encryption: Users can choose to encrypt a cluster for additional security when the cluster is provisioned. When encryption is enabled, Amazon Redshift stores all data in user-created tables in an encrypted format. Users manage Amazon Redshift encryption keys through AWS Key Management Service (AWS KMS).
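
    The cluster, snapshot, and encryption options described above can be sketched as the set of parameters a provisioning request would carry. The following is a minimal illustration, not an official recipe: the helper name build_cluster_params, the default node type, and the sample values are assumptions made for this example. The dictionary keys are intended to match the keyword arguments of boto3's Redshift create_cluster call, but the function itself only builds the dictionary and contacts no AWS service.

```python
from typing import Any, Dict, Optional


def build_cluster_params(
    cluster_id: str,
    compute_nodes: int,
    master_username: str,
    master_password: str,
    node_type: str = "ra3.xlplus",       # assumed default, pick per workload
    encrypted: bool = True,
    kms_key_id: Optional[str] = None,    # None lets AWS use its default key
    snapshot_retention_days: int = 1,    # automated snapshot retention
) -> Dict[str, Any]:
    """Assemble keyword arguments for a Redshift cluster request (hypothetical helper)."""
    if compute_nodes < 1:
        raise ValueError("a cluster needs at least one node")

    params: Dict[str, Any] = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "MasterUsername": master_username,
        "MasterUserPassword": master_password,
        "Encrypted": encrypted,
        "AutomatedSnapshotRetentionPeriod": snapshot_retention_days,
    }

    if compute_nodes == 1:
        # A single node acts as both leader and compute node.
        params["ClusterType"] = "single-node"
    else:
        # The leader node is added by the service; only compute nodes are counted.
        params["ClusterType"] = "multi-node"
        params["NumberOfNodes"] = compute_nodes

    if encrypted and kms_key_id is not None:
        params["KmsKeyId"] = kms_key_id

    return params


if __name__ == "__main__":
    params = build_cluster_params(
        "demo-cluster", 2, "admin", "Example-Passw0rd", kms_key_id="alias/demo-key"
    )
    print(params["ClusterType"], params["NumberOfNodes"], params["Encrypted"])
```

    With boto3 installed and AWS credentials configured, the resulting dictionary could be passed as boto3.client("redshift").create_cluster(**params). Keeping the parameter assembly separate from the network call makes the sizing and encryption choices easy to review before anything is provisioned.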

    Reviews from Real Users

    “Redshift's versioning and data security are the two most critical features. When migrating into the cloud, it's vital to secure the data. The encryption and security are there.” - Kundan A., Senior Consultant at Dynamic Elements AS

    “With the cloud version whenever you want to deploy, you can scale up, and down, and it has a data warehousing capability. Redshift has many features. They have enriched and elaborate documentation that is helpful.” - Aishwarya K., Solution Architect at Capgemini

    What is Apache Hadoop?

    The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
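
    One of the "simple programming models" Hadoop is known for is MapReduce. The sketch below imitates its three stages (map, shuffle, reduce) in a single Python process with a word count. It does not use any Hadoop API, and the function names are illustrative assumptions. On a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the shuffle and any node failures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

    Because each map call sees only one line and each reduce call sees only one key, the work can be split across machines without the steps needing to share state, which is what lets the model scale from a single server to a large cluster.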
    Sample Customers
    Amazon Redshift: Liberty Mutual Insurance, 4Cite Marketing, BrandVerity, DNA Plc, Sirocco Systems, Gainsight, Blue 449
    Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
    Top Industries
    Amazon Redshift, reviewers: Computer Software Company 32%, Comms Service Provider 14%, Manufacturing Company 11%, Retailer 11%
    Amazon Redshift, visitors reading reviews: Educational Organization 50%, Financial Services Firm 9%, Computer Software Company 7%, Manufacturing Company 4%
    Apache Hadoop, reviewers: Financial Services Firm 38%, Comms Service Provider 25%, Hospitality Company 6%, Consumer Goods Company 6%
    Apache Hadoop, visitors reading reviews: Financial Services Firm 28%, Computer Software Company 10%, Comms Service Provider 6%, University 6%
    Company Size
    Amazon Redshift, reviewers: Small Business 40%, Midsize Enterprise 24%, Large Enterprise 37%
    Amazon Redshift, visitors reading reviews: Small Business 10%, Midsize Enterprise 55%, Large Enterprise 35%
    Apache Hadoop, reviewers: Small Business 34%, Midsize Enterprise 20%, Large Enterprise 46%
    Apache Hadoop, visitors reading reviews: Small Business 14%, Midsize Enterprise 11%, Large Enterprise 74%

    Amazon Redshift is ranked 4th in Cloud Data Warehouse with 59 reviews, while Apache Hadoop is ranked 5th in Data Warehouse with 33 reviews. Both are rated 7.8. The top reviewer of Amazon Redshift writes "Provides one place where we can store data, and allows us to easily connect to other services with AWS". The top reviewer of Apache Hadoop writes "Handles huge data volumes and create your own workflows and tables but you need to have deeper knowledge". Amazon Redshift is most compared with Snowflake, Teradata, AWS Lake Formation, Vertica and Microsoft Azure Synapse Analytics, whereas Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake and Dremio.


    We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.