We performed a comparison between Apache Hadoop and Snowflake based on real PeerSpot user reviews.
Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."
"The most important feature is its ability to handle large volumes. Some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume. So, its processing power and speed are most valuable."
"Most valuable features are HDFS and Kafka: Ingestion of huge volumes and variety of unstructured/semi-structured data is feasible, and it helps us to quickly onboard a new Big Data analytics prospect."
"The most valuable features are powerful tools for ingestion, as data is in multiple systems."
"The most valuable feature is scalability and the possibility to work with major information and open source capability."
"Hadoop is extensible — it's elastic."
"I liked that Apache Hadoop was powerful, had a lot of tools, and the fact that it was free and community-developed."
"It's primarily open source. You can handle huge data volumes and create your own views, workflows, and tables. I can also use it for real-time data streaming."
"Can be leveraged with respect to better performance, auto tuning and competition."
"The most valuable feature has been the Snowflake data sharing and dynamic data masking."
"The ability to share the data and the ability to scale up and down easily are the most valuable features. The concept of data sharing and data plumbing made it very easy to provide and share data. The ability to refresh your Dev or QA just by doing a clone is also valuable. It has the dynamic scale up and scale down feature. Development and deployment are much easier as compared to other platforms where you have to go through a lot of stuff. With a tool like DBT, you can do modeling and transformation within a single tool and deploy to Snowflake. It provides continuous deployment and continuous integration abilities. There is a separation of storage and compute, so you only get charged for your usage. You only pay for what you use. When we share the data downstream with business partners, we can specifically create compute for them, and we can charge back the business."
"The overall ecosystem was easy to manage. Given that we weren't a very highly technical group, it was preferable to other things we looked at because it could do all of the cloud tunings. It can tune your data warehouse to an appropriate size for controlled billing, resume and sleep functions, and all such things. It was much more simple than doing native Azure or AWS development. It was stable, and their support was also perfect. It was also very easy to deploy. It was one of those rare times where they did exactly what they said they could do."
"It requires no maintenance on our part. They handle all that. The speed is phenomenal. The pricing isn't really anything more than what you would be paying for a SQL server license or another tool to execute the same thing. We have zero maintenance on our side to do anything and the speed at which it performs queries and loads the data is amazing. It handles unstructured data extremely well, too. So, if the data is in a JSON array or an XML, it handles that super well."
"As long as you don't need to worry about the storage or cost, this solution would be one of the best ones on the market for scalability purposes."
"I like the fact that we don't need a DBA. It automatically scales stuff."
"The speed of data loading and being able to quickly create the environment are most valuable."
"I think more of the solution needs to be focused around the panel processing and retrieval of data."
"The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."
"It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake."
"Since it is an open-source product, there won't be much support."
"In certain cases, the configurations for dealing with data skewness do not make any sense."
"It requires a great deal of learning curve to understand. The overall Hadoop ecosystem has a large number of sub-products. There is ZooKeeper, and there are a whole lot of other things that are connected. In many cases, their functionalities are overlapping, and for a newcomer or our clients, it is very difficult to decide which of them to buy and which of them they don't really need. They require a consulting organization for it, which is good for organizations such as ours because that's what we do, but it is not easy for the end customers to gain so much knowledge and optimally use it."
"Real-time data processing is weak. This solution is very difficult to run and implement."
"The solution is very expensive."
"The aspect of it that was more complicated was stored procedures. It does not support SQL language-based stored procedures. You have to write in JavaScript. If they supported SQL language and stored procedures, it would make migration from on-prem much simpler. In most cases, if an on-prem solution has stored procedures, they're usually written in SQL. They're not written as what most on-prem DBMS would refer to as an external stored procedure, which is what these feel like to most people because they're written in a language outside of SQL."
"The product's performance could be improved."
"Product activation queries can't be changed while executing."
"Portability is a big hurdle right now for our clients. Porting all of your existing SQL ecosystem, such as stored procedures, to Snowflake is a major pain point. Currently, Snowflake stored procedures use JavaScript, but they should support SQL-based stored procedures. It would be a huge advantage if you can write your stored procedures using SQL. It seems that they are working on this feature, and they are yet to release it. I remember seeing some notes saying that they were going to do that in the future, but the sooner this feature comes out, it would be better for Snowflake because there are a lot of clients with whom I'm interacting, and their main hurdle is to take their existing Oracle or SQL Server stored procedures and move them into Snowflake. For this, you need to learn JavaScript and how it works, which is not easy and becomes a little tricky. If it supports SQL-based procedures, then you can just cut-paste the SQL code, run it, and easily fix small issues."
"To ensure the proper functioning of Snowflake as an MDS, it relies heavily on other partner tools."
"For the Snowflake database, there should be some third-party features for the ETL. It would also be good to be able to use some kind of controls to get the data either from another database or a flat file. Its price should be improved. It should be cheaper than Microsoft."
"They do have a native connector to connect with integration tools for loading data, but it would be much better to have the functionality built-in."
"We would like to have an on-premises deployment option that has the same features, including scalability."
Apache Hadoop is ranked 5th in Data Warehouse with 34 reviews while Snowflake is ranked 1st in Data Warehouse with 94 reviews. Apache Hadoop is rated 7.8, while Snowflake is rated 8.4. The top reviewer of Apache Hadoop writes "Handles huge data volumes and create your own workflows and tables but you need to have deeper knowledge". On the other hand, the top reviewer of Snowflake writes "Good usability, good data sharing and elastic compute features, and requires less DBA involvement". Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Teradata and BigQuery, whereas Snowflake is most compared with BigQuery, Azure Data Factory, Teradata, Vertica and Teradata Cloud Data Warehouse. See our Apache Hadoop vs. Snowflake report.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.
Apache Hadoop is for data lake use cases. But getting data out of Hadoop for meaningful analytics does indeed take quite a bit of work, using Spark, Hive, Presto, and so on. The way I look at Snowflake and Hadoop is that they complement each other. For a data lake you can use Hadoop, and for the data warehouse companies can use Snowflake. Depending on the size of the company, you can turn Snowflake into a data lake too. Snowflake is SQL friendly, and you don't need to jump through hoops to get data in and out of Snowflake.