Amazon Redshift vs Apache Hadoop comparison

Amazon Redshift: 7,544 views | 5,537 comparisons | 87% willing to recommend
Apache Hadoop: 2,387 views | 2,021 comparisons | 87% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Amazon Redshift and Apache Hadoop based on real PeerSpot user reviews.

Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Amazon Redshift vs. Apache Hadoop Report (Updated: May 2024).
772,679 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
  • "This service can merge and integrate well with all databases."
  • "The initial setup of this solution is straightforward."
  • "It allows for the storage of huge amounts of data."
  • "You can copy JSON to the column and have it analyzed using simple functions."
  • "Easy to build out our snowflake design and load data."
  • "The stability of Amazon Redshift is good."
  • "In terms of valuable features, I like the columnar storage that Redshift provides. The storage is one of the key features that we're looking for. Also, the data updates and the latency between the data-refreshes."
  • "The most valuable feature is its scalability."

More Amazon Redshift Pros →

  • "One valuable feature is that we can download data."
  • "We selected Apache Hadoop because it is not dependent on third-party vendors."
  • "It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database."
  • "What I like about Apache Hadoop is that it's for big data, in particular big data analysis, and it's the easier solution. I like the data processing feature for AI/ML use cases the most because some solutions allow me to collect data from relational databases, while Hadoop provides me with more options for newer technologies."
  • "The most valuable feature is the database."
  • "The best thing about this solution is that it is very powerful and very cheap."
  • "Apache Hadoop is crucial in projects that save and retrieve data daily. Its valuable features are scalability and stability. It is easy to integrate with the existing infrastructure."
  • "Most valuable features are HDFS and Kafka: Ingestion of huge volumes and variety of unstructured/semi-structured data is feasible, and it helps us to quickly onboard a new Big Data analytics prospect."

More Apache Hadoop Pros →

Cons
  • "It lacks a few features which can be very useful, such as stored procedures."
  • "There is some missing functionality and sometimes it's so difficult to work in. We need to convert these functionalities using VACUUM inside Amazon Redshift and then it causes some complexity."
  • "The initial setup is a complex process, especially for someone who is not familiar with nodes and configuring terms like RPUs."
  • "It takes a lot of time to ingest and update the data."
  • "The product must become a bit more serverless."
  • "This solution lacks integration with non-AWS sources."
  • "The speed of the solution and its portability needs improvement."
  • "Infinite storage is available in Snowflake and is not available in Redshift."

More Amazon Redshift Cons →

  • "The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."
  • "The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
  • "From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."
  • "The key shortcoming is its inability to handle queries when there is insufficient memory. This limitation can be bypassed by processing the data in chunks."
  • "It would be good to have more advanced analytics tools."
  • "I would like to see more direct integration of visualization applications."
  • "It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake."
  • "Real-time data processing is weak. This solution is very difficult to run and implement."

More Apache Hadoop Cons →

Pricing and Cost Advice
  • "Redshift is very cost effective for a cloud based solution if you need to scale it a lot. For smaller data sizes, I would think about using other products."
  • "If you want a fixed price, and not to worry about every query, but you need to manage your nodes personally, use Redshift."
  • "BI is sold to our customer base as a part of the initial sales bundle. A customer may elect to opt for a white labeled site for an up-charge."
  • "One of my customers went with Google Big Query over Redshift because it was significantly cheaper for their project."
  • "Per hour pricing is helpful to keep the costs of a pilot down, but long-term retention is expensive."
  • "It's around $200 US dollars. There are some data transfer costs but it's minimal, around $20."
  • "The best part about this solution is the cost."
  • "The part that I like best is that you only pay for what you are using."
  • More Amazon Redshift Pricing and Cost Advice →

  • "Do take into consideration that data storage and compute capacity scale differently and hence purchasing a 'boxed' / 'all-in-one' solution (software and hardware) might not be the best idea."
  • "There are no licensing costs involved, hence money is saved on the software infrastructure."
  • "This is a low cost and powerful solution."
  • "The price of Apache Hadoop could be less expensive."
  • "If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs."
  • "We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
  • "The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
  • "We just use the free version."
  • More Apache Hadoop Pricing and Cost Advice →

    Questions from the Community
    Top Answer: Amazon Redshift is very fast, has a very good response time, and is very user-friendly. The initial setup is very straightforward. This solution can merge and integrate well with many different…
    Top Answer: The tool's most valuable feature is its parallel processing capability. It can handle massive amounts of data, even when pushing hundreds of terabytes, and its scaling capabilities are good.
    Top Answer: It's primarily open source. You can handle huge data volumes and create your own views, workflows, and tables. I can also use it for real-time data streaming.
    Top Answer: Since it is an open-source product, there won't be much support. So, you have to have deeper knowledge. You need to improvise based on that.
                                Amazon Redshift    Apache Hadoop
    Ranking                     4th                6th out of 35 in Data Warehouse
    Views                       7,544              2,387
    Comparisons                 5,537              2,021
    Reviews                     24                 13
    Average Words per Review    504                530
    Rating                      7.8                7.8
    Overview

    What is Amazon Redshift?

    Amazon Redshift is a fully managed, petabyte-scale, cloud-based data warehouse service. Users can start with a small amount of data, measured in gigabytes, and scale up to a petabyte or more as needed. This lets them use their own data to gain new insights into how to improve business processes and client relations.

    Users start building a data warehouse by launching an Amazon Redshift cluster, which is a set of nodes. Once the cluster has been provisioned, users can upload data sets and then run data analysis queries. Amazon Redshift is designed to deliver fast query performance as data volumes grow, using the same SQL-based tools and BI applications that most users already work with.

    The Amazon Redshift service performs all of the work of setting up, operating, and scaling a data warehouse. These tasks include provisioning capacity, monitoring and backing up the cluster, and applying patches and upgrades to the Amazon Redshift engine.

    Amazon Redshift Functionalities

    Some of the most useful Amazon Redshift functionalities include:

    • Cluster administration: An Amazon Redshift cluster is a group of nodes that contains a leader node and one or more compute nodes. The number of compute nodes needed depends on the data size, the number of queries, and the query execution performance desired.
    • Cluster snapshots: Snapshots are backups of a cluster from an exact point in time. Amazon Redshift offers two types of snapshots: manual and automated. Amazon Redshift stores these snapshots internally in Amazon Simple Storage Service (Amazon S3) over an SSL connection. When a restore from a snapshot is needed, Amazon Redshift creates a new cluster and imports the data from the specified snapshot.
    • Cluster access: Amazon Redshift provides several features to define connectivity rules, encrypt data and connections, and control overall access to the cluster.
    • IAM credentials and AWS accounts: By default, an Amazon Redshift cluster is accessible only to the AWS account that created it, which keeps the cluster secure from the start. Within that account, users use AWS Identity and Access Management (IAM) to create additional user accounts and manage permissions, granting specified users the access they need to manage the cluster.
    • Encryption: Users can choose to encrypt a cluster for added security. When encryption is enabled, Amazon Redshift stores all the data in user-created tables in an encrypted format. To manage Amazon Redshift encryption keys, users use AWS Key Management Service (AWS KMS).
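
    The cluster and encryption options above can be sketched as a small helper that assembles launch settings. This is a minimal illustration, not an official procedure: the helper name `build_cluster_params`, the identifiers, the node type, and the key ID are hypothetical placeholders, and the dictionary keys are assumed to follow the keyword arguments of the `create_cluster` call in the AWS SDK for Python (boto3). Nothing here contacts AWS.

```python
from typing import Any, Dict, Optional


def build_cluster_params(
    cluster_id: str,
    node_type: str,
    compute_nodes: int,
    admin_user: str,
    admin_password: str,
    kms_key_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble launch settings for a cluster (hypothetical helper)."""
    if compute_nodes < 1:
        raise ValueError("a cluster needs at least one node")

    params: Dict[str, Any] = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "MasterUsername": admin_user,
        "MasterUserPassword": admin_password,
        # More than one node means a leader node plus separate compute nodes.
        "ClusterType": "multi-node" if compute_nodes > 1 else "single-node",
    }
    if compute_nodes > 1:
        params["NumberOfNodes"] = compute_nodes

    # Encryption is optional; when a KMS key is given, request an encrypted cluster.
    if kms_key_id is not None:
        params["Encrypted"] = True
        params["KmsKeyId"] = kms_key_id
    return params


# Placeholder values only; replace with real settings before use.
example = build_cluster_params(
    cluster_id="example-cluster",
    node_type="ra3.xlplus",
    compute_nodes=2,
    admin_user="adminuser",
    admin_password="Example-Passw0rd",
    kms_key_id="example-kms-key-id",
)
```

    With boto3 installed and AWS credentials configured, a dictionary like `example` could be passed as `boto3.client("redshift").create_cluster(**example)`. That call is not run here because it would provision billable resources.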

    Reviews from Real Users

    “Redshift's versioning and data security are the two most critical features. When migrating into the cloud, it's vital to secure the data. The encryption and security are there.” - Kundan A., Senior Consultant at Dynamic Elements AS

    “With the cloud version whenever you want to deploy, you can scale up, and down, and it has a data warehousing capability. Redshift has many features. They have enriched and elaborate documentation that is helpful.” - Aishwarya K., Solution Architect at Capgemini

    What is Apache Hadoop?

    The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
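
    One of the "simple programming models" associated with Hadoop is MapReduce. The sketch below imitates its three stages (map, shuffle, reduce) in a single Python process to show the idea with a word count. It is an illustration only: it does not use any Hadoop API, the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key before reducing."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "big cluster"])
```

    Because each line is mapped independently and each key is reduced independently, both stages can be spread across nodes, which is what lets the same model scale from one server to many.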
    Sample Customers
    Amazon Redshift: Liberty Mutual Insurance, 4Cite Marketing, BrandVerity, DNA Plc, Sirocco Systems, Gainsight, Blue 449
    Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
    Top Industries
    Amazon Redshift
    Reviewers: Computer Software Company 34%, Comms Service Provider 14%, Retailer 10%, Manufacturing Company 10%
    Visitors reading reviews: Educational Organization 51%, Financial Services Firm 9%, Computer Software Company 7%, Manufacturing Company 4%
    Apache Hadoop
    Reviewers: Financial Services Firm 35%, Comms Service Provider 24%, Hospitality Company 6%, Consumer Goods Company 6%
    Visitors reading reviews: Financial Services Firm 29%, Computer Software Company 11%, University 6%, Manufacturing Company 5%
    Company Size
    Amazon Redshift
    Reviewers: Small Business 38%, Midsize Enterprise 25%, Large Enterprise 37%
    Visitors reading reviews: Small Business 10%, Midsize Enterprise 55%, Large Enterprise 35%
    Apache Hadoop
    Reviewers: Small Business 33%, Midsize Enterprise 19%, Large Enterprise 47%
    Visitors reading reviews: Small Business 15%, Midsize Enterprise 11%, Large Enterprise 74%

    Amazon Redshift is ranked 4th in Cloud Data Warehouse with 61 reviews, while Apache Hadoop is ranked 6th in Data Warehouse with 34 reviews. Both are rated 7.8. The top reviewer of Amazon Redshift writes "Provides one place where we can store data, and allows us to easily connect to other services with AWS". The top reviewer of Apache Hadoop writes "Handles huge data volumes and create your own workflows and tables but you need to have deeper knowledge". Amazon Redshift is most compared with Teradata, Vertica, Snowflake, Microsoft Azure Synapse Analytics and AWS Lake Formation, whereas Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake and Oracle Big Data Appliance.


    We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.