
Amazon Redshift Overview

Amazon Redshift is the #4-ranked solution among the top Cloud Data Warehouse tools. IT Central Station users give Amazon Redshift an average rating of 8 out of 10. Amazon Redshift is most commonly compared to Snowflake. The top industry researching this solution is Computer Software Company, accounting for 30% of all views.
What is Amazon Redshift?

Amazon Redshift is a fast, powerful, fully managed, petabyte-scale data warehouse service in the cloud. Customers can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of the cost of most other data warehousing solutions.

Traditional data warehouses require significant time and resources to administer, especially for large datasets. In addition, the financial cost associated with building, maintaining, and growing self-managed, on-premise data warehouses is very high. Amazon Redshift not only significantly lowers the cost of a data warehouse, but also makes it easy to analyze large amounts of data very quickly.

Amazon Redshift gives you fast querying capabilities over structured data using familiar SQL-based clients and business intelligence (BI) tools using standard ODBC and JDBC connections. Queries are distributed and parallelized across multiple physical resources. You can easily scale an Amazon Redshift data warehouse up or down with a few clicks in the AWS Management Console or with a single API call. Amazon Redshift automatically patches and backs up your data warehouse, storing the backups for a user-defined retention period. Amazon Redshift uses replication and continuous backups to enhance availability and improve data durability and can automatically recover from component and node failures. In addition, Amazon Redshift supports Amazon Virtual Private Cloud (Amazon VPC), SSL, AES-256 encryption and Hardware Security Modules (HSMs) to protect your data in transit and at rest.

As with all Amazon Web Services, there are no up-front investments required, and you pay only for the resources you use. Amazon Redshift lets you pay as you go. You can even try Amazon Redshift for free.

Amazon Redshift Customers

Liberty Mutual Insurance, 4Cite Marketing, BrandVerity, DNA Plc, Sirocco Systems, Gainsight, Blue 449

Archived Amazon Redshift Reviews (more than two years old)

SV
Head of Analytics at a tech services company with 10,001+ employees
Real User
A solution with a straightforward setup, improved stability and good flexibility

What is our primary use case?

We primarily use the solution for analytics.

What is most valuable?

The solution's flexibility is its most valuable feature. It's also easy to scale and has relatively painless pricing. 

What needs improvement?

The speed of the solution and its portability needs improvement.

For how long have I used the solution?

I've been using the solution since 2006.

What do I think about the stability of the solution?

The solution is getting more stable with each new version. It's stable now, but it can always continue to improve.

What do I think about the scalability of the solution?

The solution can scale easily.

How are customer service and technical support?

I've never been in touch with technical support.

How was the initial setup?

The initial setup was straightforward.

What other advice do I have?

We use public and hybrid cloud deployment models. We are Amazon partners.

The solution itself is very popular. Many people use it these days.

I recently went to an AWS summit in Zurich. I was very impressed by AWS and the presentation. Their solutions are very good.

I'd rate this solution eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Mediha Šiljić
Chief Information Officer at Sensilab
Real User
Easy to set up and easy to connect the many tools that connect to it

What is our primary use case?

We are using the private cloud model of this solution. Our primary use case is for a data warehouse for BI.

What is most valuable?

The most valuable features are that it's easy to set up and easy to connect the many tools that connect to it.

What needs improvement?

Compatibility with other products, for example, Microsoft and Google, is a bit difficult because each one of them wants to be isolated with their solutions. That's a big problem now.

For how long have I used the solution?

I have been using Redshift for around eight to nine months.

What do I think about the stability of the solution?

It is stable.

What do I think about the scalability of the solution?

Scalability is okay. It's easily scalable. We don't have any plans to increase usage at the moment. We currently have two users directly using this solution. Indirectly we have around 50 users.

We require two staff members for maintenance and others are just consuming data from it.

How are customer service and technical support?

There hasn't been a need to contact technical support at this point. We haven't had any technical issues. 

How was the initial setup?

The initial setup was straightforward. The deployment took a few hours. 

What about the implementation team?

We integrated it ourselves. 

What was our ROI?

We have seen ROI. It's been useful.

What's my experience with pricing, setup cost, and licensing?

It's around $200 US dollars. There are some data transfer costs but it's minimal, around $20.

What other advice do I have?

I would rate it a ten out of ten. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Nir Wasserman
BI Manager at jfrog
Real User
Allows you to write complex queries and perform row-by-row processes

What is our primary use case?

We use it to build a data warehouse and a centralized location for all of our data sources, allowing for in-depth analysis by using SQL queries.

How has it helped my organization?

  • Allows for the storage of huge amounts of data.
  • Assists users in performing ad hoc analysis on many sources together.

What is most valuable?

  • Window functions, such as LEAD and LAG.
  • Allows you to write complex queries and perform row-by-row processes (see the sketch below).
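
By way of illustration, here is a minimal sketch of the kind of LEAD/LAG query described above; the table and column names are assumptions, not taken from this review:

    -- Hypothetical events table; compare each row with its neighbours per user.
    SELECT
        user_id,
        event_time,
        event_type,
        LAG(event_type)  OVER (PARTITION BY user_id ORDER BY event_time) AS previous_event,
        LEAD(event_type) OVER (PARTITION BY user_id ORDER BY event_time) AS next_event,
        DATEDIFF(minute,
                 LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time),
                 event_time) AS minutes_since_previous
    FROM analytics.events;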

What needs improvement?

In the next release, a pivot function would be a big help. It could save a lot of time creating a query or process to handle operations.

For how long have I used the solution?

Three to five years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user869871
Principal Consultant at Inawisdom
User
Easy to load and reload data. Retaining data long-term should be cheaper.

What is our primary use case?

Storing and querying data in near real time. Loading millions of raw CSV records. Running data comparisons and queries, then shutting it all down all within a few hours, once a week.

How has it helped my organization?

Easy to load and reload data.

What is most valuable?

Fast load times  Flexibility in column definitions The ability to reload data multiple times at different times.

What needs improvement?

It would be nice if it was a bit cheaper to retain data long-term.  Should be made available across zones, like other Multi-AZ solutions.

For how long have I used the solution?

Less than one year.

What's my experience with pricing, setup cost, and licensing?

Per hour pricing is helpful to keep the costs of a pilot down, but long-term…

What is our primary use case?

  • Storing and querying data in near real time.
  • Loading millions of raw CSV records.
  • Running data comparisons and queries, then shutting it all down all within a few hours, once a week.

How has it helped my organization?

Easy to load and reload data.

What is most valuable?

  • Fast load times 
  • Flexibility in column definitions
  • The ability to reload data multiple times at different times.

What needs improvement?

  • It would be nice if it was a bit cheaper to retain data long-term. 
  • Should be made available across zones, like other Multi-AZ solutions.

For how long have I used the solution?

Less than one year.

What's my experience with pricing, setup cost, and licensing?

Per hour pricing is helpful to keep the costs of a pilot down, but long-term retention is expensive.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user705738
Senior Solutions Engineer, West at a tech vendor with 5,001-10,000 employees
Vendor
It helped my customers migrate off on-premise platforms

Pros and Cons

  • "Redshift COPY command, because much of my work involved helping customers migrate large amounts of data into Redshift."
  • "Migrating data from other data sources can be challenging when you are working with multibyte character sets."

What is most valuable?

Redshift COPY command, because much of my work involved helping customers migrate large amounts of data into Redshift.
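
For context, a minimal sketch of the kind of bulk load the COPY command handles; the bucket, IAM role, and table are placeholders, not details from this review:

    -- Bulk-load gzipped CSV files exported from a source system into Redshift.
    COPY analytics.orders
    FROM 's3://example-bucket/source-export/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    GZIP
    IGNOREHEADER 1
    TIMEFORMAT 'auto';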

How has it helped my organization?

It helped my customers migrate off on-premise platforms such as Teradata to Redshift, at a fraction of the cost.

What needs improvement?

There are challenges with dealing with character set mismatches. Migrating data from other data sources can be challenging when you are working with multibyte character sets.
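
One hedged example of COPY options that can take the edge off those character-set problems (table and paths are hypothetical):

    COPY analytics.customers
    FROM 's3://example-bucket/source-export/customers/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    ACCEPTINVCHARS AS '?'  -- replace invalid UTF-8 bytes in VARCHAR columns instead of failing
    MAXERROR 100;          -- tolerate a limited number of unparseable rows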

For how long have I used the solution?

Two years.

What do I think about the stability of the solution?

No.

What do I think about the scalability of the solution?

I personally haven’t hit scalability issues but at dinner a year ago with a few of my existing customers (all Fortune 500 companies), I was told there are scalability issues once you get to 32-nodes.

One of my previous customers told me they were migrating off Redshift because they hit the ceiling and had scalability issues. They told me the responsiveness they were getting was inferior to alternative solutions once your Redshift gets to a specific size.

How are customer service and technical support?

I never utilized AWS technical support.

Which solution did I use previously and why did I switch?

I’ve helped customers migrate off Teradata, SQL Server, Oracle Exadata, Greenplum, and ParAccel Matrix to Redshift. Some due to cost savings, others because of the EOL of the product.

How was the initial setup?

Setup of Redshift infrastructure is pretty straightforward. I’ve been told that setting up partitions can be tricky in order to ensure good performance.

What's my experience with pricing, setup cost, and licensing?

I have nothing to add here as I wasn’t involved in this part of the process. However, one of my customers went with Google Big Query over Redshift because it was significantly cheaper for their project.

Which other solutions did I evaluate?

I only provided advice to my customers, but some looked at Azure SQL DW, Greenplum, Netezza, and Google Big Query as possible alternatives.

What other advice do I have?

Be careful with vendor lock-in! You cannot move your Redshift environment to a different cloud provider or to an on-premise solution.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Manager Data Services at a logistics company with 201-500 employees
Real User
Top 20
Gives us the ability to increase space requirements

Pros and Cons

  • "Easy to build out our snowflake design and load data."
  • "It would be nice if we could turn off an instance. However, it would retain the instance in history, thus allowing us to restart without beginning from scratch."

What is most valuable?

  • Easy to build out our snowflake design and load data
  • Ability to dynamically increase space requirements
  • Good speed
  • Extremely reliable

How has it helped my organization?

We use Redshift as our primary BI data repository. It provides BI for all four of our product lines, which run on Redshift. It is low cost and highly reliable!

What needs improvement?

It would be nice if we could turn off an instance. However, it would retain the instance in history, thus allowing us to restart without beginning from scratch.

For how long have I used the solution?

We have been using this a little over a year.

What was my experience with deployment of the solution?

It is pretty easy to learn and use. If you've worked with SQL before, you won't have any problems. Also, it works very well with Talend, which is our ETL tool.

What do I think about the stability of the solution?

No issues of stability at this point.

What do I think about the scalability of the solution?

No issues of scalability at this point.

How are customer service and technical support?

Customer Service:

I don't know that we've ever had to contact AWS for Redshift assistance. Everything is straightforward.

Technical Support:

Not applicable.

Which solution did I use previously and why did I switch?

This was our first foray into a data warehouse and customer facing BI.

How was the initial setup?

We went into the project with a design in mind and Redshift provided a great working platform. Our ETL tool helped us out in building the metadata and Birst built out many of the necessary tables.

What about the implementation team?

The primary implementation was done in-house, with assistance from Birst during the Birst installation.

What was our ROI?

The corporate expectation is to break-even after the first year of selling BI to our customer base. We'll know this by end of Q2, 2018. On the positive side, the addition of BI has helped close multiple sales, so we're beating the competitors with our BI tools (running on Redshift).

What's my experience with pricing, setup cost, and licensing?

BI is sold to our customer base as a part of the initial sales bundle. A customer may elect to opt for a white labeled site for an up-charge.

Which other solutions did I evaluate?

We pretty much went with Redshift, as the company migrated everything to AWS. We might have looked at other database options, but we did not put much time into it.

What other advice do I have?

Plan out your DB design in advance and test your theories by running a small instance first. Use a good ETL tool, like Talend, so updates can be scheduled easily. Don't try to write these from scratch. Redshift has been a great DB for us to date. We haven't seen any slowdowns or outages!

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Nir Wasserman
BI Manager at jfrog
Real User
You can copy JSON to the column and have it analyzed using simple functions

Pros and Cons

  • "You can copy JSON to the column and have it analyzed using simple functions."
  • "It lacks a few features which can be very useful, such as stored procedures"

What is most valuable?

The features I find valuable in Redshift are, first, JSON format support: you can copy JSON into a column and analyze it using simple functions. Second is the PARALLEL OFF/ON option, where you can choose whether to unload to split files or into one file.
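
A rough sketch of both points, with hypothetical table, column, bucket, and role names:

    -- Load JSON documents and query a field with a simple function.
    COPY analytics.raw_events
    FROM 's3://example-bucket/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS JSON 'auto';

    SELECT JSON_EXTRACT_PATH_TEXT(payload, 'user', 'id') AS user_id
    FROM analytics.raw_events;

    -- Unload to a single file instead of one file per slice.
    UNLOAD ('SELECT * FROM analytics.raw_events')
    TO 's3://example-bucket/unload/raw_events_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    PARALLEL OFF;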

How has it helped my organization?

Since we have lots of data sources and high volumes, we needed a unified and organized DB that can handle these amounts and will be our single source of truth for the organization. Therefore, Redshift is the best solution.

What needs improvement?

It lacks a few features which could be very useful, such as stored procedures. Also, one needs to run Vacuum in order to manage this DB; it would be nice not to have to worry about that and have it managed for us.

For how long have I used the solution?

Three years.

What do I think about the stability of the solution?

We have encountered stability issues. Sometimes, for some reason, Redshift is down (not due to maintenance).

What do I think about the scalability of the solution?

No, because we know how to use Redshift. We have clusters with both HDD and SSD nodes and keep the maximum amount of data in each, so it scales well.

How are customer service and technical support?

Great. They are available and very helpful.

How was the initial setup?

Initial setup is very straightforward, very easy. No need of any side help.

What's my experience with pricing, setup cost, and licensing?

If you want to think about every query you make but want to know that your nodes are fully managed, then use BigQuery Data Analytics. If you want a fixed price and to not worry about every query, but you need to manage your nodes personally, use Redshift.

Which other solutions did I evaluate?

I did not. We did consider using BigQuery Data Analytics, but eventually we decided to use Redshift.

What other advice do I have?

My rating would be 8.5. This is a great product, but one still needs to know how to manage clusters and nodes in order to make the DB scalable and reliable.

Its greatest benefit is that it is built on PostgreSQL, so any data specialist with SQL experience can handle Redshift.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user689532
Full Stack Engineer at a tech services company with 11-50 employees
Consultant
Valuable features are performance, data compression, and scalability. Query compilation time needs a lot of improvement.

Pros and Cons

  • "The valuable features are performance, data compression, and scalability."
  • "Query compilation time needs a lot of improvement for cases where you are generating queries dynamically."

What is most valuable?

The valuable features are performance, data compression, and scalability.

What needs improvement?

Query compilation time needs a lot of improvement for cases where you are generating queries dynamically. Also, it would help tremendously to have some more user-friendly, query optimization helper tools.

For how long have I used the solution?

We have been using the solution for 24 months now.

What do I think about the stability of the solution?

We have not faced any stability related issues so far.

What do I think about the scalability of the solution?

The time it takes to scale the cluster up or down is not trivial and it can take a while. In case you need to do this fast, you will need to think about other solutions.

How are customer service and technical support?

Apart from the official documentation, we haven't had the need to reach out to technical support yet. The quality of the documentation is very good. There are a lot of very useful articles from the community.

Which solution did I use previously and why did I switch?

Previously, we were using AWS RDS for our use case. We found that we had outgrown it. Our data grew in size and we wanted to still have performance queries.

How was the initial setup?

The initial setup of the cluster was pretty straightforward. The following step, setting the right table configuration, was not so straightforward, though. It required an understanding of how the product works. Sort and distribution keys are required concepts to know about.
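
To make the point concrete, here is a minimal, hypothetical example of the kind of table configuration involved:

    CREATE TABLE analytics.page_views (
        view_id    BIGINT,
        user_id    BIGINT,
        viewed_at  TIMESTAMP,
        url        VARCHAR(2048)
    )
    DISTSTYLE KEY
    DISTKEY (user_id)      -- co-locate rows for the most common join column
    SORTKEY (viewed_at);   -- keep time-range scans cheap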

What's my experience with pricing, setup cost, and licensing?

Redshift is very cost effective for a cloud based solution if you need to scale it a lot. For smaller data sizes, I would think about using other products.

Which other solutions did I evaluate?

We were thinking about using a self-managed PostgreSQL. We chose Redshift because we didn't need to manage it ourselves and because it integrates with the rest of the AWS services more fluently.

We are currently evaluating Druid.

What other advice do I have?

It is very important to understand how Redshift is designed to work. The database schema design is not trivial and requires an in-depth knowledge about it, especially if your use-case requires it to perform well.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Padmanesh NC
Big Data Solution Architect - Spatial Data Specialist at Sciera, Inc.
Real User
Top 5
It processes petabytes of data and supports many file formats. Restoring huge snapshots takes too long.

What is most valuable?

Scalability: The ability to load a huge number of datasets (I have experience with petabytes of data) and process them. Storage is not limited; we can increase it whenever we want.

Performance: The distributed architecture of Redshift processes the workload across the cluster's compute nodes and coordinates the work in the leader node, making processing much faster.

Flexibility: Users can increase the node size and configuration depending on their needs. There is no need to wait for hardware to be in place whenever the dataset grows; Redshift provides the option to increase the node or cluster size whenever required.

Multi-format accessibility: The Redshift engine can read the following file formats and compression types: CSV, DELIMITER, FIXEDWIDTH, AVRO, JSON, BZIP2, GZIP, and LZOP. Users can choose what is best for their requirements.

VPC configuration: VPC configuration secures the datasets we keep inside the Redshift cluster and doesn't allow any third-party inbound or outbound traffic through the firewall.

Python UDF calls: Users can create their own user-defined functions in Python, import them into Redshift, and use them to process the dataset.
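
A small sketch of the Python UDF feature; the function itself is a made-up example:

    CREATE OR REPLACE FUNCTION f_domain_from_email (email VARCHAR)
    RETURNS VARCHAR
    STABLE
    AS $$
        # Runs inside Redshift's Python UDF environment.
        if email is None or '@' not in email:
            return None
        return email.split('@')[-1].lower()
    $$ LANGUAGE plpythonu;

    SELECT f_domain_from_email('user@example.com');  -- returns 'example.com'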

How has it helped my organization?

We were using MySQL & MongoDB for our regular operations, but when we grew, we were forced to handle a huge number of datasets. It could be petabytes of data in and out on a regular basis. We struggled a lot to complete the operations in a timely manner. With Amazon Redshift, we gained a lot in terms of timing, as well as project completion.

Some of the scoring mechanism really works well in the distributed architecture of Amazon Redshift.

What needs improvement?

Of course, every product has pluses and minuses. From that perspective, Amazon Redshift has some issues with snapshot restoring when we handle huge datasets. When our snapshot size is really huge, like 20 TB+, we are forced to wait a long time to get it restored. This is reasonable, as they need to transfer the entire dataset to the cluster.

My thought on this issue is that Amazon has their own data centers and they connect each region's storage through Direct Connect. The input and output network data transfer might not be a complex thing. For example, over a 10 Gbps network they could transfer 1 TB in under 15 minutes, but that's not happening now: restoring 1 TB of data takes more than 30-40 minutes.

For how long have I used the solution?

I have used it for the last 3.5 Years.

I am using Amazon Redshift for big data mapping and data aggregation.

We are using most of their products. Specifically, we are using their dedicated data-centre service (Direct Connect). We are using Amazon products such as Amazon EC2, S3, SQS, EMR, ML, CloudWatch, Redshift, DynamoDB, etc., for more than 10-12 years.

What do I think about the stability of the solution?

I have encountered stability issues. A few weeks ago, I encountered an issue with a hardware failure and a database health status failure. When we face these kinds of issues, we can't do anything from our side until the Amazon technical team finds the issue and rectifies it, and that takes time. If we are in a rush to deliver something for a client and encounter these issues, we are really stuck.

What do I think about the scalability of the solution?

Of course. When the amount of data we handle in the cluster grows, we need to increase the cluster or node size. As the node or cluster size grows, so does the time needed to synchronize the data (metadata) with the node manager, so the initial startup time increases when we start the cluster.

How are customer service and technical support?

Customer Service:

Customer service is good, but I often couldn't reach them by a direct call; I could only catch them through their web UI rather than by making a direct call.

Technical Support:

Technical support is really great, but it's paid support. The Basic Support plan doesn't have the option for technical support; it only provides billing support.

Which solution did I use previously and why did I switch?

I have experience working in Hadoop as well. When I compare the two (Redshift & Hadoop), Redshift is more user friendly in terms of configuration and maintenance.

How was the initial setup?

The initial setup of Amazon Redshift is so simple and straightforward. We do not need to read or understand any of the technical documentation. Simply said, it’s a plug-and-play service or platform.

What about the implementation team?

We implemented it in-house.

What was our ROI?

I can't directly quantify the ROI, because we are not using only Redshift. We use multiple products to increase our revenue and decrease time spent, so it's difficult to calculate the ROI of Redshift usage alone.

What's my experience with pricing, setup cost, and licensing?

Pricing and licensing are very important. In terms of pricing, it's a bit high, as they are using standard hardware. My advice to users is: start the cluster only when you require it. At the end of the workday, we can just snapshot the clusters and shut them down, and then restore those snapshots when we need them back. That way, we are charged only for usage rather than spending money on idle time.

Which other solutions did I evaluate?

I evaluated Hadoop and Spark, along with Redshift. I have no negative comments about those other products. Redshift is flexible in terms of configuration, maintenance and security, especially VPC configuration, which secures our data a lot.

What other advice do I have?

Use this product for huge data mapping or aggregation. Use Redshift through a VPC to keep your data very secure over the long term.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user396519
Director at a tech company with 1,001-5,000 employees
Vendor
Columnar-storage databases leverage the Massively Parallel Processing (MPP) capabilities of the data warehouse architecture.

What is most valuable?

  • Performance: Very fast query performance due to columnar-storage databases that leverage the Massively Parallel Processing (MPP) capabilities of its data warehouse architecture.
  • Petabyte-scale data warehouse, without any loss in performance, at low cost: One of our existing customers stores more than 500 terabytes of data in an AWS Redshift database and the warehouse performance was good. We want to highlight that even if the warehouse size increases to petabytes, Redshift would still work fine, there wouldn't be any performance issues, and the cost would remain low.

How has it helped my organization?

The end users were able to have access to real-time analytics.

What needs improvement?

We would really like to see a few more connectors included that would enable connecting with other databases and services. We have faced some difficulties pulling data from Teradata and storing it in Redshift. There is no direct connector available between Teradata and Redshift.

For how long have I used the solution?

We have been working with this product for the past 24 months.

What do I think about the stability of the solution?

We have not faced any stability-related issues so far.

What do I think about the scalability of the solution?

We did not encounter any scalability issues in the last 24 months that we have been working with Redshift.

How are customer service and technical support?

We actually had to reach out to technical support a few times and they were really helpful and solved our problems. We would give it 4/5.

Which solution did I use previously and why did I switch?

We were using an on-premise MySQL data warehouse. To reduce the cost and improve scalability, we switched to a cloud version of data warehouse databases.

How was the initial setup?

Initial setup and configuration was pretty straightforward. First, we needed to create a Redshift cluster. Once the cluster was created, we created a database schema based on our need in the Redshift cluster.

What's my experience with pricing, setup cost, and licensing?

AWS Redshift is one of the fastest and most cost-effective cloud-based databases. They charge $3,330 per TB per year for the ds2.8xlarge instances, which have 244 GB RAM, a 36-core CPU, 10 Gbps networking, and 16 TB of HDD storage.

What other advice do I have?

You need to design the database structure with best sort and distribution keys, along with primary and foreign keys.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user576450
Data Science Lead at a tech services company with 51-200 employees
Consultant
The PostgreSQL interface is good because you can play with big data with just SQL.

What is most valuable?

The valuable features are:

  • PostgreSQL Interface
  • Scalability
  • Pricing/Maintenance/Setup

The PostgreSQL interface is good because you can play with big data with just SQL. This is one of the reasons why they made Hive.

However, Hive’s SQL is still not as standard as what Redshift provides:
http://docs.aws.amazon.com/red...

How has it helped my organization?

Redshift has been the data warehouse in at least three of my previous companies. The impact is huge to anyone who uses data in any way.

What needs improvement?

I would like to see improvements in the database integrations. Currently, Amazon does not provide real-time/near real-time integration with other products like RDS or DynamoDB out-of-the-box.

We need to either build the integrations ourselves, or rely on third-party services which are not always the best.

For how long have I used the solution?

We have been using this solution for over three years.

What do I think about the stability of the solution?

There were stability issues in the beginning. However, the product has improved quite a lot in the last two years in terms of stability.

What do I think about the scalability of the solution?

Redshift can scale up to a petabyte with a few simple clicks.

How are customer service and technical support?

Technical support is good, but similar to any other Amazon Web Service, you have to pay for a good level of technical support.

Which solution did I use previously and why did I switch?

We did not have a previous solution. Redshift worked for us the first time we tried. The pricing could not be beaten by anything else in the market at that time.

How was the initial setup?

The installation was straightforward and only required a few clicks.

What's my experience with pricing, setup cost, and licensing?

Pricing was quite a strong point of Redshift when it was first released. Nowadays, quite a number of other services are very competitive in pricing, such as BigQuery.

What other advice do I have?

Redshift, like any other big data technology, isn’t a silver bullet for everything. The most important thing is to understand your data and your requirements before you make any decision to use any technology.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user576444
Rails Developer at a recruiting/HR firm with 51-200 employees
Vendor
It's based on PostgreSQL, is a managed solution, and has low price per terabyte per year.

What is most valuable?

  • It is based on PostgreSQL.
  • It’s managed. Meaning, AWS takes care of handling infrastructure, deployments, encryption, and uptime for you.
  • It’s cheap when you consider the price per terabyte per year.
  • It’s integrated into the AWS stack.

How has it helped my organization?

At my previous company that does mobile analytics as its core product, we moved all the analytics backend from MongoDB to Redshift. Where I currently work, we use it as our main data lake/data warehouse.

What needs improvement?

While it's probably the best product of its category (managed SQL-based data warehouse at scale), it has a few shortcomings, although very few.

The main issue people complain about, and I agree with the claim, is that it's hard to load your data into it. You need to first export your data on S3 as CSV, JSON or AVRO. Then you can load it into Redshift. And even then, you have to make sure your data is properly formatted. (You can use the COPY options TRUNCATECOLUMNS to load fields that are too big, and MAXERROR to allow a given number of errors while loading.) In general, ETL and data cleaning is a hurdle in data engineering, and Redshift suffers from it.
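
Putting those COPY options together in one hedged sketch (table, bucket, and role names are placeholders):

    COPY analytics.events
    FROM 's3://example-bucket/staging/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS JSON 'auto'
    TRUNCATECOLUMNS  -- silently truncate values that exceed the column width
    MAXERROR 50;     -- allow up to 50 bad records before failing the load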

For how long have I used the solution?

I have used Redshift for three years.

What do I think about the stability of the solution?

I once had an issue because my data contained a Unicode NULL character in a VARCHAR field ("\u0000"). The AWS support has been very quick and helpful to respond. Other than that, I have had no issues whatsoever.

What do I think about the scalability of the solution?

No scalability issues whatsoever.

How are customer service and technical support?

Technical support is very good.

Which solution did I use previously and why did I switch?

At my previous company, we switched from MongoDB to Redshift. The main reason was price and performance. At my current company, we started a data warehouse (greenfield project). The choice was between Google BigQuery and AWS Redshift. The main criteria was that Redshift was PostgreSQL-based and supports CTE and Window functions (PostgreSQL features).

How was the initial setup?

The big part when using Redshift is setting up the ETLs and doing the data cleaning. It was very hard when moving from MongoDB, because I had to re-discover our data schema (that had no spec). With that said, in both cases (moving from MongoDB and starting from scratch), I had a prototype up in about a day. By that I mean that I had the most important parts of my data loaded into Redshift and I could query it.

What's my experience with pricing, setup cost, and licensing?

The pricing page is explicit. Choose what suits your needs in terms of storage and performance.

Which other solutions did I evaluate?

For setting up a data warehouse, BigQuery was a serious contender. BigQuery is simpler to set up and scale. It's also more of a black box: you worry less about what's inside and how it scales, and you get charged for what you consume (which is both a pro and a con). With Redshift, you choose in advance the type of machine you want, like EC2 (resizing your cluster is easy).

What other advice do I have?

If you evaluate Redshift, chances are that you should evaluate BigQuery too. So take the time to weigh the pros and cons of each (plenty has been written online about that).

Take a look at the reserved instances pricing. It is very advantageous if you know you will stick with Redshift for some time.

Take the time to learn PostgreSQL (e.g., https://www.pgexercises.com/). Redshift, while based on PostgreSQL 8.0, supports a good number of advanced Postgres features.

Do not be afraid of joins. PostgreSQL performs very well in this regard.

If you need performance, have a look at the suggested optimizations in the official documentation (such as setting up the correct distkeys, sortkeys, and compression schemes).

Understand that Redshift has no indexes.

Understand that Redshift is an analytical database with columnar storage, and that it does not enforce constraints.

Redshift plays very well with a PostgreSQL instance in RDS linked to it via DBLINK (see this guide: https://aws.amazon.com/blogs/big-data/join-amazon-redshift-and-amazon-rds-postgresql-with-dblink/). I've used this in production at my current company, and this is tremendously useful. You can have your raw data in Redshift and aggregate it directly into RDS. To do this, insert into RDS what you select from Redshift through the dblink.
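
A rough sketch of that pattern, run on the RDS PostgreSQL side; the connection string, tables, and columns are placeholders:

    -- On the RDS PostgreSQL instance.
    CREATE EXTENSION IF NOT EXISTS dblink;

    -- Aggregate raw data in Redshift and insert the result into a local RDS table.
    INSERT INTO daily_sales_summary (sale_date, total_amount)
    SELECT sale_date, total_amount
    FROM dblink(
        'host=my-cluster.abc123.us-east-1.redshift.amazonaws.com port=5439 dbname=analytics user=dblink_user password=...',
        'SELECT sale_date, SUM(amount) FROM sales GROUP BY sale_date'
    ) AS remote(sale_date DATE, total_amount NUMERIC);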

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user576456
Manager BI Development at a comms service provider with 1,001-5,000 employees
Vendor
The fact that it stores data using a columnar approach allows us to use columns in join conditions.

What is most valuable?

Redshift gives extremely fast responses for queries involving large tables. This is the most important feature I look for in data warehouse solutions. Often you come across use cases where it is not possible to distribute data on a certain column, yet you need this column in join conditions. Redshift stores data using a columnar approach, which is useful for data aggregation.

All this at an extremely low price makes it possible for small to medium sized organizations to use Redshift’s power to get business insights.

How has it helped my organization?

One of my clients required large amounts of data but had a low budget. Amazon Redshift was the perfect choice for my client. We joined two tables containing billions of rows each and got results back in 27 seconds with a relatively small cluster of nodes.

What needs improvement?

Amazon should bring more SQL functions that are required in data warehouse implementations. It lacks SQL functions for complex data processing. A very small example is recursive queries. However, Amazon is developing the product at a fast pace and bringing new features with every release.

For how long have I used the solution?

I’ve been using Redshift for more than two years. I created one traditional data warehouse with 3-tier architecture and one big data solution.

What do I think about the stability of the solution?

We have not really had stability problems. The product is mature and can be utilized for production systems.

What do I think about the scalability of the solution?

Since Redshift is on the AWS cloud, scalability is not an issue. With a few clicks, the cluster size can be increased or reduced. This is especially useful when you expect a large amount of data processing temporarily. For example, on Black Friday retail organizations expect large amounts of data flow/processing. Redshift can be scaled up for a few days to accommodate the surge of data and then scaled back to the normal cluster size to save OPEX.

How are customer service and technical support?

The AWS team gives special focus to customer support. This is a very big benefit of going to the cloud. You get a reply from AWS in a short time frame.

Which solution did I use previously and why did I switch?

I worked on Teradata and IBM solutions. Redshift gives performance similar to these solutions and costs a fraction of the amount.

How was the initial setup?

Your Redshift cluster can be up and running with a few clicks and in less than 5 minutes. That's a big benefit when you shift to the cloud.

Which other solutions did I evaluate?

We analyzed Microsoft, Oracle, AWS RDS, and MongoDB for our requirements.

What other advice do I have?

Redshift is based on PostgreSQL and adds MPP/columnar features to make it a data warehouse product. It is very easy for developers to adopt this solution. Your existing team can easily work on Redshift with no extra cost of learning.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user572622
BI Architect & Developer (contract) at a retailer with 501-1,000 employees
Vendor
You can configure tables to live in the memory of all of the available cores.

What is most valuable?

Column store and distributed processing is optimized for read access. We grew to 3000+ users with no impact.

Column store is a data compression technique for relational data. I’m using it now in SQL Server 2016. We configured a 16-core VM for handling requests on the DB. The recommendation was to separate inbound data packets into related chunks, which were 1/16th of the size.

This way, the import process could make full use of parallelization, and it worked. We imported 20 million rows of sales facts in less than 15 seconds, and the content was query-able immediately. I’ve never seen that before. This was impressive. This meant that we could completely rebuild the data warehouse to “current” from "scratch" within minutes, assuming that the data was in S3 already.

Tables that would typically be 2GB in size are now about 250MB. This means more data in memory. You can also configure the tables to live in the memory of all of the available cores. This is good for small dimension tables. You can also fragment them across all cores, for the larger fact tables. This allows for distributed query processing. Once you set it up, it just worked. It was all specified in the PG-SQL table statements.
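
What is described here corresponds to Redshift distribution styles in the table DDL; a minimal sketch with hypothetical tables:

    -- Small dimension table replicated to every node.
    CREATE TABLE dim_store (
        store_id   INT,
        store_name VARCHAR(256)
    )
    DISTSTYLE ALL;

    -- Large fact table spread across slices on the join key.
    CREATE TABLE fact_sales (
        sale_id   BIGINT,
        store_id  INT,
        sold_at   TIMESTAMP,
        amount    DECIMAL(12,2)
    )
    DISTSTYLE KEY
    DISTKEY (store_id)
    SORTKEY (sold_at);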

There were two data centers in Sydney that were guaranteeing us a distributed solution. We really didn’t notice this. It was more of a check box situation. At one point, there was an outage at AWS, but it didn’t impact our operations directly.

How has it helped my organization?

This has given us the ability to provide metrics to a large number of company staff on their performance without impacting core systems.

What needs improvement?

I’d like to see these Redshift features arrive in other products, such as SQL Server's ColumnStore index.


For how long have I used the solution?

I have used this solution for three years.

What do I think about the stability of the solution?

There have been no stability issues.

How are customer service and technical support?

Technical support always met my expectations.

Which solution did I use previously and why did I switch?

I was on a team that was using AWS tools for Dick Smith Electronics (now liquidated). The tools ceased use in February of 2016.

Prior to that, we were using them fully for about 3 years. We loaded data to Redshift according to the best practices included in the online docs and through consultation with the AWS staff. The combination of S3 and Redshift for this purpose was very high in performance. Redshift was used to provide the data model to an instance of MicroStrategy for BI reporting.

We were using MicroStrategy, which generated all the SQL that our reporting services needed.

As such, I could only comment on the data engineering phase. Technically, this was so impressive that I don’t know what to add. I don’t recall feeling that it missed anything. If anything, I was not using all the available features. AWS documentation is great in this regard. You can tell they have put a lot of thought into it.

A lot of the future direction in database technology has to do with memory optimization and concurrency (VoltDB). This is more targeted towards transactional processing, and not data warehousing.

Memory-only data warehousing solves a lot of access issues without having to think too hard about the problem from the consumers' point of view. I am sure that you can already configure this.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user583371
BI Architect at a comms service provider with 5,001-10,000 employees
Vendor
Columnar storage technology is valuable.

What is most valuable?

Columnar storage technology is the most valuable feature of this solution.

How has it helped my organization?

We can meet the SLAs in our daily processes.

What needs improvement?

Some improvements can be brought about in:

Restore table:

I would like to use this option to move data across different clusters. Right now, the feature only permits bringing a table back into the same cluster, based on the snapshot taken. I would like a similar option to move data across different clusters; today I have to UNLOAD from cluster A and then COPY into cluster B. I would like to use the snapshots taken to bring the data into whichever cluster I need. Maybe the current design cannot support this, because it is based on nodes and data distribution.

Our real scenario is: if we lose data and need to recover it in another cluster, we have to:

1) Restore the table in the current cluster under a different name

2) Unload the data to S3

3) Copy the data to the new cluster. When we are talking about billions of records, this is complex to do.

Vacuum process:

The vacuum needs to be segmented. For example, after 24 hours of execution, I had to cancel the process and 0% was sorted (big table). For big tables (billions of records), if the table is 100% unsorted, the vacuum can take more than 24 hours. If we don't have that timeframe, we have to work around it by moving data out to additional tables and running the vacuum in batches on the main table. If I run the vacuum directly on the main table and stop it after 5 hours, 0 records will be sorted. I would like to be able to run the vacuum over the main table, stop it when I need to, and still have some records sorted, like an incremental process.
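
For reference, a hedged sketch of the UNLOAD/COPY workaround and the vacuum step described above; the bucket, IAM role, and table names are placeholders, not taken from this review:

    -- On cluster A: export the recovered table to S3 (default pipe-delimited text).
    UNLOAD ('SELECT * FROM sales_recovered')
    TO 's3://example-bucket/transfer/sales_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftTransferRole'
    ESCAPE GZIP;

    -- On cluster B: load the exported files.
    COPY sales
    FROM 's3://example-bucket/transfer/sales_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftTransferRole'
    ESCAPE GZIP;

    -- Re-sort a large table; SORT ONLY skips reclaiming deleted space.
    VACUUM SORT ONLY sales;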

For how long have I used the solution?

I have used this solution for around three years.

What do I think about the stability of the solution?

We did encounter stability issues. For example, if you are using more than 25 nodes (ds2.xlarge), the cluster is totally unstable.

What do I think about the scalability of the solution?

I have not experienced any scalability issues.

How are customer service and technical support?

I would rate the technical support a 9/10 for normal issues.

However, for advanced issues, I would give it a 5/10, since I had to go directly to the AWS engineers for support.

Which solution did I use previously and why did I switch?

Initially, we were using the Microsoft SQL solution. We decided to move over to this product due to the DWH volume and performance.

How was the initial setup?

In my opinion, the setup was normal.

What's my experience with pricing, setup cost, and licensing?

Based on the quality of the product and its price, it is one of the best options available on the market now.

Which other solutions did I evaluate?

We also looked at the Oracle solution.

What other advice do I have?

You need to make sure that the space used in the DWH stays at a maximum of 50% of the total space.

You must create processes to vacuum and analyze tables frequently. Also, before creating the tables, you should choose the right encoding, DISTKEY and sort keys.
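
As a small, hypothetical illustration of that advice (the table and encodings are examples, not the reviewer's actual schema):

    -- Choose encodings, the distribution key, and the sort key up front.
    CREATE TABLE events (
        event_id   BIGINT       ENCODE zstd,
        user_id    BIGINT       ENCODE zstd,
        event_type VARCHAR(64)  ENCODE lzo,
        created_at TIMESTAMP    ENCODE zstd
    )
    DISTKEY (user_id)
    SORTKEY (created_at);

    -- Routine maintenance, typically run from a scheduled job.
    VACUUM events;
    ANALYZE events;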

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user576441
Senior Software Engineer [Redshift Programmer] at a tech services company with 1,001-5,000 employees
Consultant
It supports SCD1 and SCD2, and the star schema. Improvement is needed in the scope of data types and complex RDBMS functionalities.

What is most valuable?

The most valuable features of this product are:

  • Processing huge data in petabytes
  • Massively Parallel Processing (MPP)
  • Concept of data compression
  • The way it stores data on drives, especially with the distribution key
  • Supports BI tools like MicroStrategy (MSTR) and Tableau
  • Supports all the data warehouse core features such as SCD1 and SCD2, and different schemas like the star schema

How has it helped my organization?

It has helped us to understand the response and interest of the customers and the user conversion rate in this competitive world. Thus, it has helped us in the decision-making process.

What needs improvement?

In most scenarios, the data source for Redshift will be a traditional RDBMS like MySQL, PostgreSQL, SQL Server, etc. After migrating to Redshift, we found a few disconnects with respect to data types, stored procedures, and other complex functionality. There is a need for improvement in these aspects, mainly in the scope of data types and some complex functionality that we can perform in an RDBMS.

For how long have I used the solution?

I have used this solution for more than a year.

What do I think about the stability of the solution?

I have not encountered any issues with stability. In terms of performance, Redshift is highly stable.

What do I think about the scalability of the solution?

I have not encountered any issues with scalability. We can easily scale the nodes in AWS with only a few clicks.

How are customer service and technical support?

I would give the technical support a 6 out of 10 rating.

Which solution did I use previously and why did I switch?

We have not used any other solution.

How was the initial setup?

The setup was straightforward for those who know AWS.

What's my experience with pricing, setup cost, and licensing?

The Redshift pricing policy is easy to understand.

Which other solutions did I evaluate?

We did not evaluate other options prior to selecting this solution.

What other advice do I have?

As of now, Redshift is far better than the other products in the market.

Lastly, I would like to mention that Redshift is more about scaling and stabilizing your data. One should also focus on data modeling from time to time.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Senior Engineer, Big-Data/Data-Warehousing at a manufacturing company with 501-1,000 employees
Vendor
We create different-sized clusters and orchestrate them using the SDK.

What is most valuable?

The most valuable features to us are: speed, DML, the fact that it is cloud-based, the management console, and Boto3.

Because we are dealing with a lot of data, speed is always important. Redshift is blistering fast when doing "deep" copies and inserts. Conceptually, my data-transformation pipelines are a series of proprietary "waves" that leverage Redshift's DML/"deep" copy/insert strengths. Doing all this in the cloud allows us to easily test alternatives. We create different sized Redshift clusters and orchestrate them by using the SDK (Python Boto3). We go beyond the traditional DWH to "infrastructure-as-software".
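
For illustration, a hedged sketch of the "deep copy" pattern mentioned here, with placeholder table names; the Boto3 orchestration around it is omitted:

    -- Rebuild a table by copying its rows wholesale, then swap the names.
    CREATE TABLE events_new (LIKE events);

    INSERT INTO events_new
    SELECT * FROM events;

    ALTER TABLE events RENAME TO events_old;
    ALTER TABLE events_new RENAME TO events;
    DROP TABLE events_old;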

How has it helped my organization?

Redshift has helped to transform Makerbot into a data-driven company.

What needs improvement?

Integrating database security/access rights with AWS IAM would be great. I would also like to see more DML features that might aid in processing unstructured or log-file data. This would allow us to avoid having to use EMR/Hadoop.

For how long have I used the solution?

We’ve used Amazon Redshift for 3 years.

What was my experience with deployment of the solution?

We did not encounter any deployment issues.

What do I think about the stability of the solution?

We did not encounter any issues with stability.

What do I think about the scalability of the solution?

We did not encounter any issues with scalability.

How are customer service and technical support?

Customer Service:

I think the customer service is adequate.

Technical Support:

The level of technical support is good.

Which solution did I use previously and why did I switch?

We tried prior solutions, but they had limited or no scalability/agility.

How was the initial setup?

The initial setup was straightforward.

What was our ROI?

It took less than a year for the product to pay for itself.

What's my experience with pricing, setup cost, and licensing?

Regarding pricing and licensing, I advise starting small and having your developers/DBA use table compression and partitioning from the start.

Which other solutions did I evaluate?

We have used different options over the last 20 years. We found AWS Redshift to be the leader in capability, and it comes with an ecosystem of related AWS services, many of which are free.

What other advice do I have?

My advice to others is to prototype, prototype, prototype! Everything depends on your data and what you need to do with it. No two projects are the same.

Disclosure: I am a real user, and this review is based on my own experience and opinions.