Apache Hadoop vs Snowflake comparison

Apache Hadoop: 2,765 views | 2,378 comparisons
Snowflake: 12,615 views | 7,271 comparisons
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Apache Hadoop and Snowflake based on real PeerSpot user reviews.

Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Apache Hadoop vs. Snowflake Report (Updated: March 2024).
765,234 professionals have used our research since 2012.
Q&A Highlights
Question: What is the biggest difference between Apache Hadoop and Snowflake?
Answer: Snowflake handles interactive querying as a consumption pattern much better than Hadoop and its related query engines (Impala, Presto, Drill, etc.). Heavy data science query workloads can be expensive on Snowflake, and Hadoop can provide a more cost-efficient solution for them. Hadoop also remains relevant as a back-end data processing engine, as an alternative to running data transformation in Snowflake, which costs more and has limited procedural language capabilities (JavaScript-based stored procedures). Snowflake fares much better than Hadoop in terms of administrative complexity.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros

Apache Hadoop:
  • "Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."
  • "High throughput and low latency. We start with data mashing on Hive and finally use this for KPI visualization."
  • "Initially, with RDBMS alone, we had a lot of work and few servers running on-premise and on cloud for the PoC and incubation. With the use of Hadoop and ecosystem components and tools, and managing it in Amazon EC2, we have created a Big Data "lab" which helps us to centralize all our work and solutions into a single repository. This has cut down the time in terms of maintenance, development and, especially, data processing challenges."
  • "The most important feature is its ability to handle large volumes. Some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume. So, its processing power and speed are most valuable."
  • "The most valuable feature is the database."
  • "It's good for storing historical data and handling analytics on a huge amount of data."
  • "Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial."
  • "Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."

Snowflake:
  • "The cloning functionality has been the most valuable. I have been able to completely copy databases. The data sharing concept is also useful. As compared to, for example, SAP, Snowflake is a lot more open, and it allows a lot more connectivity for other providers than an SAP ecosystem."
  • "The solution's computing time is less."
  • "The speed of data loading and being able to quickly create the environment are most valuable."
  • "It requires no maintenance on our part. They handle all that. The speed is phenomenal. The pricing isn't really anything more than what you would be paying for a SQL server license or another tool to execute the same thing. We have zero maintenance on our side to do anything and the speed at which it performs queries and loads the data is amazing. It handles unstructured data extremely well, too. So, if the data is in a JSON array or an XML, it handles that super well."
  • "I like the ability to work with a managed service on the cloud and that is easy to start with."
  • "The most valuable features are the clustering, LS50, being able to change the size, the pay per use feature, the flexibility with many different sources and analytic applications."
  • "As long as you don't need to worry about the storage or cost, this solution would be one of the best ones on the market for scalability purposes."
  • "Its performance is a big advantage. When you run a query, its performance is very good. The inbound and outbound share features are also very useful for sharing a particular database. By using these features, you can allow others to access the Snowflake database and query it, which is another advantage of this solution. It has good security, and we can easily integrate it. We can connect it with multiple source systems."

Cons

Apache Hadoop:
  • "The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
  • "The solution is very expensive."
  • "I think more of the solution needs to be focused around the panel processing and retrieval of data."
  • "In certain cases, the configurations for dealing with data skewness do not make any sense."
  • "From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."
  • "The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."
  • "Hadoop's security could be better."
  • "The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop."

Snowflake:
  • "Their strategy is just to leverage what you've got and put Snowflake in the middle. It does work well with other tools. You have to buy a separate reporting tool and a separate data loading tool, whereas, in some platforms, these tools are baked in. In the long-term, they'll need to add more direct partnerships to the ecosystem so that it's not like adding on tools around Snowflake to make it work. They can also consider including Snowflake native reporting tools versus partnering with other reporting tools. It would kind of change where they sit in the market."
  • "For the Snowflake database, there should be some third-party features for the ETL. It would also be good to be able to use some kind of controls to get the data either from another database or a flat file. Its price should be improved. It should be cheaper than Microsoft."
  • "Sometimes it can be tricky to manage multiple environments if you're purely using Snowflake as your scripting and pipeline environment."
  • "Snowflake needs to improve its programming part. Though the tool has Snowpath, it doesn’t support all features like its competitor, Databricks. Snowflake doesn’t support external data ingestion capabilities. You need to have third-party tools for that. Also, the tool needs to incorporate data integration features in its future releases."
  • "Snowflake has support for stored procedures, but it is not that powerful."
  • "I would like to see a client version of the GUI."
  • "Snowflake could improve migration. It should be made easier. It would be beneficial if it could offer some OLTP features. One of our customers was using Oracle for both data warehousing and OLTP workloads, and they were able to migrate their data warehousing workloads to Snowflake without major issues. However, for some of their OLTP requirements, such as needing a response time of fewer than 10 milliseconds for certain queries, Snowflake is currently unable to provide that."
  • "Snowflake could improve if they had an Operational Data Store (ODS) space."

Pricing and Cost Advice
Apache Hadoop:
  • "Do take into consideration that data storage and compute capacity scale differently, and hence purchasing a "boxed" / "all-in-one" solution (software and hardware) might not be the best idea."
  • "There are no licensing costs involved, hence money is saved on the software infrastructure."
  • "This is a low cost and powerful solution."
  • "The price of Apache Hadoop could be less expensive."
  • "If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs."
  • "We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
  • "The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
  • "We just use the free version."

Snowflake:
  • "Pricing can be confusing for customers."
  • "The whole licensing system is based on credit points. You can also make a license agreement with the company so that you buy credit points and then you use them. What you do not use in one year can be carried over to the next year."
  • "You pay based on the data that you are storing in the data warehouse and there are no maintenance costs."
  • "It is not cheap."
  • "The pricing for Snowflake is competitive."
  • "On average, with the number of queries that we run, we pay approximately $200 USD per month."
  • "Pricing is approximately $US 50 per DB. Terabyte is around $US 50 per month."
  • "The price of Snowflake is very reasonable."
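
The credit-based licensing that reviewers describe above can be sketched as simple arithmetic. This is a rough illustration, not Snowflake's billing logic: the credits-per-hour table reflects the commonly documented doubling per warehouse size, the price per credit is a hypothetical figure, and the carry-over rule is taken only from the reviewer's quote (actual contract terms may differ). All function names are made up for this example.

```python
from typing import List, Tuple

# Credits consumed per running hour, doubling with each warehouse size step.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def credits_used(size: str, hours_running: float) -> float:
    """Credits consumed by one warehouse of the given size."""
    return CREDITS_PER_HOUR[size] * hours_running


def yearly_balance(purchased: float, carried_over: float,
                   usage: List[Tuple[str, float]]) -> float:
    """Credits left at year end (assumed to carry over, per the reviewer quote)."""
    consumed = sum(credits_used(size, hours) for size, hours in usage)
    return purchased + carried_over - consumed


def compute_cost(credits: float, price_per_credit: float) -> float:
    """Compute spend for a number of credits at a (hypothetical) unit price."""
    return credits * price_per_credit


if __name__ == "__main__":
    usage = [("XS", 100), ("L", 50)]       # (warehouse size, hours running)
    left = yearly_balance(1000, 0, usage)  # credits remaining after the year
    print(left, compute_cost(1000 - left, price_per_credit=3.0))
```

Because compute is billed only while a warehouse runs, the same model shows why reviewers disagree on price: a small, rarely used warehouse costs little, while a large one running around the clock consumes credits quickly.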

    Answers from the Community
    Miriam Tover
    it_user1274238 (Director at a tech services company with 10,001+ employees), User

    Apache Hadoop is for data lake use cases. But getting data out of Hadoop for meaningful analytics does take quite a bit of work, using Spark, Hive, Presto, and so on. The way I look at Snowflake and Hadoop is that they complement each other. For a data lake you can use Hadoop, and for a data warehouse companies can use Snowflake. Depending on the size of the company, you can turn Snowflake into a data lake too. Snowflake is SQL-friendly, and you don't need to jump through hoops to get data in and out of it.

    Questions from the Community
    Top Answer: Hadoop File System is compatible with almost all the query engines.
    Top Answer: The tool provides functionalities to deal with data skewness or a diverse set of data. There are some configurations that it usually provides. In certain cases, the configurations for dealing with…
    Top Answer: The best thing about Snowflake is its flexibility in changing warehouse sizes or computational power.
    Top Answer: The real-time streaming feature is limited with Snowflake and could be improved. Currently, Snowflake doesn't support unstructured data. With Snowflake, you need to be very particular about the type…
    Ranking
                                  Apache Hadoop                      Snowflake
    Rank                          5th out of 33 in Data Warehouse    1st out of 33 in Data Warehouse
    Views                         2,765                              12,615
    Comparisons                   2,378                              7,271
    Reviews                       10                                 40
    Average Words per Review      539                                455
    Rating                        8.0                                8.4
    Also Known As
    Snowflake: Snowflake Computing
    Overview
    The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
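
The "simple programming models" mentioned in the overview are most often associated with MapReduce. The following is a minimal single-process sketch of that pattern in plain Python, assuming a word-count task; it does not use any Hadoop API, and the function names are invented for illustration. On a real cluster, the map and reduce steps would run in parallel on many machines, with the framework handling the shuffle and any node failures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for each word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data lake"]))
```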

    Snowflake is a cloud-based data warehousing solution used for storing and processing data, generating reports and dashboards, and serving as a BI reporting source. Reviewers also use it to optimize costs, work with financial data, and migrate data from on-premises systems to the cloud. The solution is often used as a centralized data warehouse, combining data from multiple sources.

    Snowflake has helped organizations improve query performance, store and process JSON and XML, consolidate multiple databases into one unified table, power company-wide dashboards, increase productivity, reduce processing time, and simplify maintenance, backed by good technical support.

    Its platform is made up of three components:

    1. Cloud services - Snowflake uses ANSI SQL to empower users to optimize their data and manage their infrastructure, while Snowflake handles the security and encryption of stored data.
    2. Query processing - Snowflake's compute layer is made up of virtual cloud data warehouses that let you analyze data through queries. The warehouses do not compete with one another for computing resources, and one warehouse does not affect the performance of another.
    3. Database storage - Snowflake automatically manages all parts of the data storage process, including file size, compression, organization, structure, metadata, and statistics.

    Snowflake has many valuable features. Some of the most useful ones include:

    • Snowflake architecture provides nearly unlimited scalability and high speed because it uses a single elastic performance engine. The solution also supports unlimited concurrent users and workloads, from interactive to batch.
    • Snowflake makes automation easy and enables enterprises to automate data management, security, governance, availability, and data resiliency.
    • With seamless cross-cloud and cross-region connections, Snowflake eliminates ETL and data silos. Anyone who needs access to shared secure data can get a single copy via the data cloud. In addition, Snowflake makes remote collaboration and decision-making fast and easy via a single shared data source.
    • Snowflake’s Data Marketplace offers third-party data, which allows you to connect with Snowflake customers to extend workflows with data services and third-party applications.

    There are many benefits to implementing Snowflake. It helps optimize costs, reduce downtime, improve operational efficiency, and automate data replication for fast recovery, and it is built for high reliability and availability.

      Below are quotes from interviews we conducted with users currently using the Snowflake solution:

      Sreenivasan R., Director of Data Architecture and Engineering at Decision Minds, says, "Data sharing is a good feature. It is a majorly used feature. The elastic computing is another big feature. Separating computing and storage gives you flexibility. It doesn't require much DBA involvement because it doesn't need any performance tuning. We are not doing any performance tuning, and the entire burden of performance and SQL tuning is on Snowflake. Its usability is very good. I don't need to ramp up any user, and its onboarding is easier. You just onboard the user, and you are done with it. There are simple SQL and UI, and people are able to use this solution easily. Ease of use is a big thing in Snowflake."

      A director of business operations at a logistics company mentions, "It requires no maintenance on our part. They handle all that. The speed is phenomenal. The pricing isn't really anything more than what you would be paying for a SQL server license or another tool to execute the same thing. We have zero maintenance on our side to do anything and the speed at which it performs queries and loads the data is amazing. It handles unstructured data extremely well, too. So, if the data is in a JSON array or an XML, it handles that super well."

      A Solution Architect at a wholesaler/distributor comments, "The ability to share the data and the ability to scale up and down easily are the most valuable features. The concept of data sharing and data plumbing made it very easy to provide and share data. The ability to refresh your Dev or QA just by doing a clone is also valuable. It has the dynamic scale up and scale down feature. Development and deployment are much easier as compared to other platforms where you have to go through a lot of stuff. With a tool like DBT, you can do modeling and transformation within a single tool and deploy to Snowflake. It provides continuous deployment and continuous integration abilities. There is a separation of storage and compute, so you only get charged for your usage. You only pay for what you use. When we share the data downstream with business partners, we can specifically create compute for them, and we can charge back the business."

      Sample Customers
      Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
      Snowflake: Accordant Media, Adobe, Kixeye Inc., Revana, SOASTA, White Ops
      Top Industries
      Apache Hadoop (REVIEWERS): Financial Services Firm 40%, Comms Service Provider 27%, Hospitality Company 7%, Consumer Goods Company 7%
      Apache Hadoop (VISITORS READING REVIEWS): Financial Services Firm 27%, Computer Software Company 10%, Comms Service Provider 6%, University 6%
      Snowflake (REVIEWERS): Computer Software Company 29%, Financial Services Firm 20%, Healthcare Company 6%, Manufacturing Company 6%
      Snowflake (VISITORS READING REVIEWS): Educational Organization 26%, Financial Services Firm 13%, Computer Software Company 10%, Manufacturing Company 6%
      Company Size
      Apache Hadoop (REVIEWERS): Small Business 35%, Midsize Enterprise 24%, Large Enterprise 41%
      Apache Hadoop (VISITORS READING REVIEWS): Small Business 15%, Midsize Enterprise 10%, Large Enterprise 75%
      Snowflake (REVIEWERS): Small Business 24%, Midsize Enterprise 20%, Large Enterprise 55%
      Snowflake (VISITORS READING REVIEWS): Small Business 15%, Midsize Enterprise 33%, Large Enterprise 52%

      Apache Hadoop is ranked 5th in Data Warehouse with 31 reviews while Snowflake is ranked 1st in Data Warehouse with 92 reviews. Apache Hadoop is rated 7.8, while Snowflake is rated 8.4. The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". On the other hand, the top reviewer of Snowflake writes "Good usability, good data sharing and elastic compute features, and requires less DBA involvement". Apache Hadoop is most compared with Microsoft Azure Synapse Analytics, Azure Data Factory, Oracle Exadata, Teradata and BigQuery, whereas Snowflake is most compared with BigQuery, Azure Data Factory, Teradata, Vertica and Teradata Cloud Data Warehouse. See our Apache Hadoop vs. Snowflake report.

      See our list of best Data Warehouse vendors and best Cloud Data Warehouse vendors.

      We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.