What are the major differences between AWS Redshift and Snowflake in terms of performance, simplicity, scalability, stability and pricing?
Snowflake has a fundamentally different architecture in that compute and storage are completely separated, allowing you to scale each dynamically and independently.
This is what got me into Snowflake. I have been using it for almost 8 months now, and it's awesome.
I compared the performance of Redshift, BigQuery, and Snowflake, although only for one specific use case.
In Redshift, data redistribution occurred when querying under various conditions, and performance was not good. Snowflake, by contrast, holds data in small units called micro-partitions and also manages the data column by column. As a result, operations like data redistribution were minimal and performance was high.
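To make the micro-partition idea a bit more concrete, here is a toy Python sketch. It is not Snowflake's implementation; the class and function names are made up for illustration. It only shows why keeping data in small column-wise units with a min/max range per column lets a query skip most of the data without moving anything around.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """Toy stand-in for a micro-partition: a few rows stored column by column."""
    columns: dict  # column name -> list of values

    def min_max(self, column):
        # A real system would keep this range as precomputed metadata;
        # here it is computed on the fly to keep the sketch short.
        values = self.columns[column]
        return min(values), max(values)


def scan_equal(partitions, column, target, select):
    """Return `select` values for rows where `column` == target,
    plus the number of partitions that actually had to be read."""
    results = []
    partitions_read = 0
    for part in partitions:
        low, high = part.min_max(column)
        if target < low or target > high:
            continue  # skipped based on the min/max range alone
        partitions_read += 1
        for key, value in zip(part.columns[column], part.columns[select]):
            if key == target:
                results.append(value)
    return results, partitions_read


# Hypothetical data: three small partitions of a "sales by day" table.
partitions = [
    MicroPartition({"day": [1, 2, 3], "sales": [10, 20, 30]}),
    MicroPartition({"day": [4, 5, 6], "sales": [40, 50, 60]}),
    MicroPartition({"day": [7, 8, 9], "sales": [70, 80, 90]}),
]

rows, read = scan_equal(partitions, "day", 5, "sales")
print(rows, read)  # [50] 1
```

Only one of the three partitions is read for this query, and only the two columns involved are touched.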
Snowflake can also start multiple clusters against the same database, and its architecture avoids conflicts even when those clusters access the same data.
I recommend that you try it.
I am glad that you are already using it.
Wow, that is a loaded question, and it is hard to answer without sounding like a sales pitch (I will try). For one, Snowflake has a fundamentally different architecture in that compute and storage are completely separated, allowing you to scale each dynamically and independently. That is, as you load data the space just expands - no need to add more clusters, extents, files, etc. Likewise, if you need more compute power, you can resize the compute clusters on the fly using a drop-down in the UI to add more nodes while the process is running. No need to put the database in read-only mode, export the data, then import it into a bigger cluster. This is only possible because of the separation of compute and storage. Most other architectures (based on legacy systems) have compute and storage more tightly coupled. Along with this architecture, you can create multiple independent compute clusters of differing sizes and assign each to different work groups or workloads (with access to the same single data store - no data replication required). Each group then gets its own dedicated compute resource, so what one group does will not impact the performance of the others.
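Besides the UI drop-down, the same setup can be scripted. Here is a minimal Python sketch that only builds the SQL statements for creating one compute cluster (a "warehouse" in Snowflake terms) per workload and resizing one of them. The warehouse names, sizes, and helper functions are hypothetical, and the statements should be checked against the current Snowflake documentation before being run through whatever client you use.

```python
# Subset of Snowflake warehouse sizes, used here for simple validation.
VALID_SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def _check(name, size):
    """Reject unsafe identifiers and unknown sizes; return the normalized size."""
    if not name.isidentifier():
        raise ValueError(f"unsafe warehouse name: {name!r}")
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unknown warehouse size: {size!r}")
    return size


def create_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=300):
    """SQL for an independent compute cluster that pauses itself when idle."""
    size = _check(name, size)
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {int(auto_suspend_seconds)} AUTO_RESUME = TRUE"
    )


def resize_warehouse_sql(name, size):
    """SQL to resize a warehouse; storage is not touched."""
    size = _check(name, size)
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{size}'"


# Hypothetical workloads, each with its own dedicated compute.
statements = [
    create_warehouse_sql("etl_wh", "MEDIUM"),
    create_warehouse_sql("bi_wh", "SMALL"),
    resize_warehouse_sql("etl_wh", "LARGE"),  # scale up for a heavy load
]

for statement in statements:
    print(statement + ";")
```

Both warehouses query the same stored data, so the resize involves no export or import step.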
This is all new code - not a refactoring of any other RDBMS code base - so the founders were able to create features that take advantage of the elasticity of the cloud. In addition, Snowflake can ingest JSON data natively into a relational table using a new data type designed to hold semi-structured data (which allows true schema-on-read using SQL).
Very stable - over 400 customers to date, some with over 1 PB of data and hundreds of users. It also has built-in security: 256-bit AES encryption of all data in motion and at rest by default, with no impact on query performance.
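If "schema on read" is unfamiliar, here is a small plain-Python illustration of the idea (it does not talk to Snowflake). Raw JSON is stored as-is, and the fields of interest are picked out only at query time, so records with different shapes can sit side by side. The sample records and the `get_path` helper are invented for this example. In Snowflake the rough equivalent would be a `VARIANT` column queried with path notation such as `payload:user.name::string`, where `payload` is a hypothetical column name.

```python
import json

# Raw JSON "loaded" without declaring a schema up front.
# Note the second record lacks user.plan and has an extra field.
raw_rows = [
    '{"user": {"name": "Ana", "plan": "pro"}, "clicks": 3}',
    '{"user": {"name": "Bo"}, "clicks": 5, "referrer": "ad"}',
]


def get_path(document, path, default=None):
    """Follow a dotted path such as "user.name"; return default if absent."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


documents = [json.loads(row) for row in raw_rows]

# The "schema" is applied here, at read time, by choosing which paths to pull.
names = [get_path(doc, "user.name") for doc in documents]
plans = [get_path(doc, "user.plan") for doc in documents]
total_clicks = sum(get_path(doc, "clicks", 0) for doc in documents)

print(names)         # ['Ana', 'Bo']
print(plans)         # ['pro', None]
print(total_clicks)  # 8
```

A missing field simply comes back as empty instead of breaking the load, which is the practical benefit over forcing every record into fixed columns first.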
Pricing (I am not in sales!) - the pricing is public and on the website: https://www.snowflake.net/product/pricing/
You can check my post about my favorite features for more details: https://www.snowflake.net/top-10-cool-things-i-like-about-snowflake/
As for comparisons to Redshift, you would have to talk with some real customers who have done POCs with both of us. You might also look at some of the customer videos and case studies, but none of them really call out where we replaced Redshift, as that tends to be kept confidential. https://www.snowflake.net/our-customers/
It seems that Snowflake is becoming very popular nowadays. I have attended some training and live sessions run by Snowflake.
I would like to hear your opinion on the key criteria we should consider when choosing Snowflake over other cloud-based data lakes, such as cost, ease of use, maintenance, etc.
And what would the success criteria be?