Padmanesh NC - PeerSpot reviewer
Big Data Solution Architect - Spatial Data Specialist at SCIERA, INC
Reseller
Top 5 Leaderboard
It processes petabytes of data and supports many file formats. Restoring huge snapshots takes too long.

What is most valuable?

Scalability: The ability to load a huge number of datasets (I have experience with petabytes of data) and process them. Storage is not limited; we can increase it as much as we want.

Performance: The distributed architecture of Redshift processes the workload across separate compute nodes and coordinates the results in the leader node, making the process much faster.

Flexibility: Users can increase the node size and change the configuration depending on their needs. There is no need to wait for hardware to be in place whenever the dataset grows. Redshift provides the option to increase the node or cluster size whenever required.

Multi-formatted accessibility: The Redshift COPY command can read the following formats: CSV, delimited text (DELIMITER), FIXEDWIDTH, AVRO, and JSON, optionally compressed with BZIP2, GZIP, or LZOP. The user can choose whichever is best for their requirements.

VPC configuration: VPC configuration secures the dataset that we keep inside the Redshift cluster. It does not allow any third-party inbound or outbound traffic through the firewall.

Python UDF calls: This lets users write their own user-defined functions in Python, import them into Redshift, and use them to process the dataset.
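As a rough illustration of the Python UDF workflow described above, the sketch below pairs a small Python function with a CREATE FUNCTION statement that would register the same logic as a scalar UDF. Redshift documents `plpythonu` as the language name for Python UDFs; the function name `f_weighted_score`, its arguments, and the scoring logic are hypothetical and not taken from the review.

```python
import textwrap


def weighted_score(score, weight):
    """Hypothetical scoring logic; the same body is embedded in the UDF below."""
    if score is None or weight is None:
        return None
    return score * weight


# DDL that would register the logic above as a scalar Python UDF.
# The function and argument names are assumptions made for this example.
CREATE_UDF_SQL = textwrap.dedent("""
    CREATE OR REPLACE FUNCTION f_weighted_score (score float, weight float)
    RETURNS float
    STABLE
    AS $$
        if score is None or weight is None:
            return None
        return score * weight
    $$ LANGUAGE plpythonu;
""").strip()

print(weighted_score(2.0, 0.5))
print(CREATE_UDF_SQL)
```

Once created, such a function would be called like any other SQL function, for example in a SELECT list.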

How has it helped my organization?

We were using MySQL & MongoDB for our regular operations, but when we grew, we were forced to handle a huge number of datasets. It could be petabytes of data in and out on a regular basis. We struggled a lot to complete the operations in a timely manner. With Amazon Redshift, we gained a lot in terms of timing, as well as project completion.

Some of our scoring mechanisms work really well on the distributed architecture of Amazon Redshift.

What needs improvement?

Of course, every product has pluses and minuses. From that perspective, Amazon Redshift has some issues with snapshot restoring when we handle huge datasets. When our snapshot size is really huge, like 20 TB+, we are forced to wait a long time to get it restored. This is reasonable, as they need to transfer the entire dataset to the cluster.

My thought on this issue is that Amazon has its own data centers and connects the storage in each region through Direct Connect, so network transfer in and out should not be the complex part. For example, over a 10 Gbps link, 1 TB takes roughly 13 minutes at line rate, but that’s not what we see: restoring 1 TB of data takes more than 30-40 minutes.
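A quick back-of-the-envelope check of the line-rate arithmetic, assuming decimal units (1 TB = 10^12 bytes) and an ideal link with no protocol or disk overhead. The function name is illustrative only.

```python
def transfer_minutes(size_tb, link_gbps):
    """Ideal transfer time in minutes, ignoring protocol overhead and disk limits."""
    bits = size_tb * 1e12 * 8            # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9)   # gigabits per second -> bits per second
    return seconds / 60


print(round(transfer_minutes(1, 10), 1))   # 13.3
print(round(transfer_minutes(20, 10), 1))  # 266.7
```

Even this ideal figure for 1 TB is well below the 30-40 minutes observed for a restore, which suggests the raw link speed is not the only factor.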

For how long have I used the solution?

I have used it for the last 3.5 years.

I am using Amazon Redshift for big data mapping and data aggregation.

We use most of their products. Specifically, we use their dedicated data-centre service (Direct Connect). We have been using Amazon products such as Amazon EC2, S3, SQS, EMR, ML, CloudWatch, Redshift, DynamoDB, etc., for more than 10-12 years.

Buyer's Guide
Amazon Redshift
April 2024
Learn what your peers think about Amazon Redshift. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
769,334 professionals have used our research since 2012.

What do I think about the stability of the solution?

I have encountered stability issues. A few weeks ago, I encountered an issue with hardware failure and database health status failure. When we face these kinds of issues, we can't do anything from our side until the Amazon technical team finds the issue and rectifies it. It takes time to get resolved. If we are in a rush to deliver something for a client and encounter these issues, we are really screwed.

What do I think about the scalability of the solution?

Of course. When the amount of data that we handle in the cluster grows, we need to increase the cluster or node size. A larger node or cluster increases the hold time for synchronizing the data (metadata) with the node manager, so the initial time when we start the cluster increases.

How are customer service and support?

Customer Service:

Customer service is good, but many times I couldn't reach them by direct call. I could reach them through their web UI rather than by calling directly.

Technical Support:

Technical support is really great, but it’s paid support. The Basic Support plan doesn't include technical support; it only provides billing support.

Which solution did I use previously and why did I switch?

I have experience working in Hadoop as well. When I compare the two (Redshift & Hadoop), Redshift is more user friendly in terms of configuration and maintenance.

How was the initial setup?

The initial setup of Amazon Redshift is so simple and straightforward. We do not need to read or understand any of the technical documentation. Simply said, it’s a plug-and-play service or platform.

What about the implementation team?

I implemented it in-house.

What was our ROI?

I can't directly put a number on ROI, because we are not using only Redshift. We use multiple products to increase our revenue and decrease time consumption, so it's difficult to calculate the ROI of Redshift usage alone.

What's my experience with pricing, setup cost, and licensing?

Pricing and licensing are very important. In terms of pricing, it's a bit high, as they are using standard hardware. My advice to users is to start the cluster only when it is required. At the end of the workday, we can just snapshot the clusters and shut them down, and then restore those snapshots when we need them back. That way, we are charged only for usage rather than spending money on wait time or sleep.
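The snapshot-and-shut-down routine described above could be scripted roughly as follows. This is a sketch, not the reviewer's tooling: it assumes a boto3 Redshift client is passed in (created with `boto3.client("redshift")`, with credentials already configured), and the cluster and snapshot identifiers are hypothetical.

```python
def shut_down_with_snapshot(client, cluster_id, snapshot_id):
    """Delete the cluster, asking Redshift to take a final snapshot first."""
    return client.delete_cluster(
        ClusterIdentifier=cluster_id,
        SkipFinalClusterSnapshot=False,
        FinalClusterSnapshotIdentifier=snapshot_id,
    )


def restore_from_snapshot(client, cluster_id, snapshot_id):
    """Recreate the cluster from the snapshot taken at shutdown."""
    return client.restore_from_cluster_snapshot(
        ClusterIdentifier=cluster_id,
        SnapshotIdentifier=snapshot_id,
    )


# Example wiring (not executed here; identifiers are placeholders):
#   import boto3
#   client = boto3.client("redshift")
#   shut_down_with_snapshot(client, "analytics", "analytics-end-of-day")
#   restore_from_snapshot(client, "analytics", "analytics-end-of-day")
```

Restore time grows with snapshot size, so this pattern suits clusters whose morning restore fits comfortably before work starts.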

Which other solutions did I evaluate?

I evaluated Hadoop and Spark, along with Redshift. I have no negative comments about those other products. Redshift is flexible in terms of configuration, maintenance and security, especially VPC configuration, which secures our data a lot.

What other advice do I have?

Use this product for huge data mapping or aggregation. Use Redshift through a VPC to keep your data very secure for a long time.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user576444 - PeerSpot reviewer
Rails Developer at a recruiting/HR firm with 51-200 employees
Vendor
It's based on PostgreSQL, is a managed solution, and has low price per terabyte per year.

What is most valuable?

  • It is based on PostgreSQL.
  • It’s managed. Meaning, AWS takes care of handling infrastructure, deployments, encryption, and uptime for you.
  • It’s cheap when you consider the price per terabyte per year.
  • It’s integrated into the AWS stack.

How has it helped my organization?

At my previous company that does mobile analytics as its core product, we moved all the analytics backend from MongoDB to Redshift. Where I currently work, we use it as our main data lake/data warehouse.

What needs improvement?

While it's probably the best product in its category (managed SQL-based data warehouse at scale), it has a few shortcomings, although very few.

The main issue people complain about, and I agree with the claim, is that it's hard to load your data into it. You first need to export your data to S3 as CSV, JSON or AVRO. Then you can load it into Redshift. And even then, you have to make sure your data is properly formatted. (You can use the COPY options TRUNCATECOLUMNS to load fields that are too big, and MAXERROR to allow for a given number of errors while loading.) In general, ETL and data cleaning are a hurdle in data engineering, and Redshift suffers from it.
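To make the loading step concrete, here is a minimal sketch that assembles a COPY statement using the two options mentioned above. The table name, S3 path, and IAM role ARN are placeholders, and the helper `build_copy_sql` is hypothetical. It uses plain string formatting, so it is only suitable for trusted values.

```python
def build_copy_sql(table, s3_path, iam_role, max_errors=10):
    """Build a COPY statement for CSV files in S3 that tolerates some bad rows."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV\n"
        "TRUNCATECOLUMNS\n"               # cut oversized strings to fit the column
        f"MAXERROR {int(max_errors)};"    # tolerate up to this many rejected rows
    )


print(build_copy_sql(
    "events",                                              # placeholder table
    "s3://example-bucket/exports/events/",                 # placeholder S3 prefix
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",  # placeholder role ARN
    max_errors=25,
))
```

Note that TRUNCATECOLUMNS silently shortens data, so it trades strictness for a load that completes.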

For how long have I used the solution?

I have used Redshift for three years.

What do I think about the stability of the solution?

I once had an issue because my data contained a Unicode NULL character in a VARCHAR field ("\u0000"). AWS support was very quick and helpful in responding. Other than that, I have had no issues whatsoever.

What do I think about the scalability of the solution?

No scalability issues whatsoever.

How are customer service and technical support?

Technical support is very good.

Which solution did I use previously and why did I switch?

At my previous company, we switched from MongoDB to Redshift. The main reasons were price and performance. At my current company, we started a data warehouse (greenfield project). The choice was between Google BigQuery and AWS Redshift. The main criterion was that Redshift is PostgreSQL-based and supports CTEs and window functions (PostgreSQL features).
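For readers unfamiliar with the two features named above, the sketch below combines a CTE (the WITH clause) and a window function (SUM ... OVER) in one query. So that it runs without a cluster, it executes against an in-memory SQLite database as a stand-in (window functions need SQLite 3.25 or later); the table and column names are hypothetical.

```python
import sqlite3

# A CTE aggregates per day, then a window function adds a running total.
QUERY = """
WITH daily AS (
    SELECT day, SUM(amount) AS total
    FROM events
    GROUP BY day
)
SELECT day, total, SUM(total) OVER (ORDER BY day) AS running_total
FROM daily
ORDER BY day
"""


def running_totals(rows):
    """Load (day, amount) rows into a scratch table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (day TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(running_totals([("2024-01-01", 5), ("2024-01-01", 3), ("2024-01-02", 4)]))
```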

How was the initial setup?

The big part when using Redshift is setting up the ETLs and doing the data cleaning. It was very hard when moving from MongoDB, because I had to re-discover our data schema (that had no spec). With that said, in both cases (moving from MongoDB and starting from scratch), I had a prototype up in about a day. By that I mean that I had the most important parts of my data loaded into Redshift and I could query it.

What's my experience with pricing, setup cost, and licensing?

The pricing page is explicit. Choose what suits your needs in terms of storage and performance.

Which other solutions did I evaluate?

For setting up a data warehouse, BigQuery was a serious contender. BigQuery is simpler to set up and scale. It's also more of a black box: you worry less about what's inside and how it scales, and you get charged for what you consume (which is both a pro and a con). With Redshift, you choose in advance the type of machine you want, like EC2 (resizing your cluster is easy).

What other advice do I have?

If you evaluate Redshift, chances are that you should evaluate BigQuery too. So take the time to weigh the pros and cons of each (plenty has been written online about that).

Take a look at the reserved instances pricing. It is very advantageous if you know you will stick with Redshift for some time.

Take the time to learn PostgreSQL (e.g., https://www.pgexercises.com/). Redshift, while based on PostgreSQL 8.0, supports a good number of advanced Postgres features.

Do not be afraid of joins. PostgreSQL performs very well in this regard.

If you need performance, have a look at the suggested optimizations in the official documentation (such as setting up the correct distkeys, sortkeys and compression schemes).

Understand that Redshift has no indexes.

Understand that Redshift is an analytical database with columnar storage, and that it does not enforce constraints.

Redshift plays very well with a PostgreSQL instance in RDS linked to it via DBLINK (see this guide: https://aws.amazon.com/blogs/big-data/join-amazon-redshift-and-amazon-rds-postgresql-with-dblink/). I've used this in production at my current company, and this is tremendously useful. You can have your raw data in Redshift and aggregate it directly into RDS. To do this, insert into RDS what you select from Redshift through the dblink.
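The "insert into RDS what you select from Redshift" pattern can be sketched as a statement run on the RDS PostgreSQL side. The helper below only assembles the SQL text. It assumes the dblink extension and a foreign server (here called `redshift_server`) were already set up as in the linked guide, and the table names, query, and column list are hypothetical.

```python
def build_dblink_insert(target_table, server_name, redshift_query, column_defs):
    """Build an INSERT ... SELECT that pulls rows from Redshift through dblink.

    column_defs is a list of (name, sql_type) pairs describing the result set,
    which dblink requires because it returns generic records.
    """
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in column_defs)
    return (
        f"INSERT INTO {target_table}\n"
        "SELECT *\n"
        f"FROM dblink('{server_name}', $REDSHIFT$\n"
        f"{redshift_query}\n"
        f"$REDSHIFT$) AS t ({cols});"
    )


print(build_dblink_insert(
    "daily_totals",                                   # placeholder RDS table
    "redshift_server",                                # assumed foreign server name
    "SELECT day, COUNT(*) FROM events GROUP BY day",  # runs on Redshift
    [("day", "date"), ("total", "bigint")],
))
```

Dollar quoting around the remote query avoids having to escape single quotes inside it.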

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user583371 - PeerSpot reviewer
BI Architect at a comms service provider with 5,001-10,000 employees
Vendor
Columnar storage technology is valuable.

What is most valuable?

Columnar storage technology is the most valuable feature of this solution.

How has it helped my organization?

We can meet the SLS/SLAs in our daily processes.

What needs improvement?

Some improvements can be brought about in:

Restore table:

I would like to use this option to move data across different clusters. Right now, the feature only permits bringing a table back into the same cluster, based on the snapshot taken, so I have to UNLOAD from cluster A and then COPY into cluster B. I would like to use the snapshots to bring the data into whichever cluster I need.

Maybe the current design cannot support this, because it is based on nodes and data distribution.

But our real scenario is this: if we lose the data and need to recover it in another cluster, we have to:

1) Restore the table in the current cluster under a different name

2) Unload the data to S3

3) Copy the data to the new cluster

When we are talking about billions of records, this is complex to do.
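Steps 2 and 3 of the workaround above can be sketched as a pair of statements, one run on the source cluster and one on the target. The helpers only build SQL text; the restored table name, S3 prefix, and IAM role ARN are placeholders. Both statements rely on the default pipe-delimited text format plus GZIP so that they match each other.

```python
def build_unload_sql(select_sql, s3_prefix, iam_role):
    """Statement for the source cluster: write query results to S3."""
    quoted = select_sql.replace("'", "''")  # double any quotes inside the query
    return (
        f"UNLOAD ('{quoted}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "GZIP;"
    )


def build_reload_sql(table, s3_prefix, iam_role):
    """Statement for the target cluster: load the unloaded files."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "GZIP;"
    )


ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"  # placeholder
PREFIX = "s3://example-bucket/recovery/orders_"              # placeholder

print(build_unload_sql("SELECT * FROM orders_restored", PREFIX, ROLE))
print(build_reload_sql("orders", PREFIX, ROLE))
```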

Vacuum process:

The vacuum needs to be segmented. For example, after 24 hours of execution, I had to cancel the process and 0% was sorted (big table).

For big tables (billions of records), if the table is 100% unsorted, the vacuum can take more than 24 hours. If we don't have that timeframe, we have to work around it by moving the data out to additional tables and running the vacuum in batches on the main table.

Why? Because if I run the vacuum directly on the main table and stop it after 5 hours, 0 records will be sorted. I would like to run the vacuum on the main table, stop it when I need to, and still have some records vacuumed, like an incremental process.

For how long have I used the solution?

I have used this solution for around three years.

What do I think about the stability of the solution?

We did encounter stability issues, i.e., if you are using more than 25 nodes (ds2.xlarge), the cluster is totally unstable.

What do I think about the scalability of the solution?

I have not experienced any scalability issues.

How are customer service and technical support?

I would rate the technical support a 9/10 for normal issues.

However, for advanced issues, I would give it a 5/10, since I had to go directly to AWS engineering support.

Which solution did I use previously and why did I switch?

Initially, we were using the Microsoft SQL solution. We decided to move over to this product due to the DWH volume and performance.

How was the initial setup?

In my opinion, the setup was normal.

What's my experience with pricing, setup cost, and licensing?

Based on the quality of the product and its price, it is one of the best options available in the market now.

Which other solutions did I evaluate?

We also looked at the Oracle solution.

What other advice do I have?

Make sure that the space used in the DWH is at most 50% of the total space.

You must create processes to vacuum and analyze tables frequently. Also, before creating the tables, you should choose the right encoding, DISTKEY and sort keys.
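A minimal sketch of both pieces of advice: table DDL that declares column encodings, a distribution key, and a sort key up front, plus a helper that emits the routine VACUUM and ANALYZE statements a scheduled job could run. The table, columns, and encoding choices are hypothetical examples, not recommendations for any particular workload.

```python
# Hypothetical table: rows are distributed by user_id and sorted by viewed_at.
CREATE_TABLE_SQL = """
CREATE TABLE page_views (
    view_id    BIGINT       ENCODE zstd,
    user_id    BIGINT       ENCODE zstd,
    viewed_at  TIMESTAMP    ENCODE raw,
    url        VARCHAR(512) ENCODE zstd
)
DISTKEY (user_id)
SORTKEY (viewed_at);
""".strip()


def maintenance_statements(tables):
    """Return VACUUM and ANALYZE statements for each table, in order."""
    statements = []
    for table in tables:
        statements.append(f"VACUUM {table};")   # re-sort rows and reclaim space
        statements.append(f"ANALYZE {table};")  # refresh planner statistics
    return statements


print(CREATE_TABLE_SQL)
for statement in maintenance_statements(["page_views"]):
    print(statement)
```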

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user576441 - PeerSpot reviewer
Senior Software Engineer [Redshift Programmer] at a tech services company with 1,001-5,000 employees
Consultant
It supports SCD1 and SCD2, and the star schema. Improvement is needed in the scope of data types and complex RDBMS functionalities.

What is most valuable?

The most valuable features of this product are:

  • Processing huge data in petabytes
  • Massively Parallel Processing (MPP)
  • Concept of data compression
  • The way it stores the data in drives especially with the distribution key
  • Supports BI tools like MicroStrategy (MSTR) and Tableau
  • Supports all the data warehouse core features such as SCD1 and SCD2, and different schemas like the star schema

How has it helped my organization?

It has helped us to understand the response and interest of the customers and the user conversion rate in this competitive world. Thus, it has helped us in the decision-making process.

What needs improvement?

In most scenarios, the data source for Redshift will be a traditional RDBMS like MySQL, PostgreSQL, SQL Server, etc. After migrating to Redshift, we find a few disconnects with respect to data types, stored procedures, and other complex functionality. There is a need for improvement in these aspects, mainly in the scope of data types and some complex functionality that we can perform in an RDBMS.

For how long have I used the solution?

I have used this solution for more than a year.

What do I think about the stability of the solution?

I have not encountered any issues with stability. In terms of performance, Redshift is highly stable.

What do I think about the scalability of the solution?

I have not encountered any issues with scalability. We can easily scale the nodes in AWS with only a few clicks.

How are customer service and technical support?

I would give the technical support a 6 out of 10 rating.

Which solution did I use previously and why did I switch?

We have not used any other solution.

How was the initial setup?

The setup was straightforward for those who know AWS.

What's my experience with pricing, setup cost, and licensing?

The Redshift pricing policy is easy to understand.

Which other solutions did I evaluate?

We did not evaluate other options prior to selecting this solution.

What other advice do I have?

As of now, Redshift is far better than the other products in the market.

Lastly, I would like to mention that Redshift is more about scaling and stabilizing your data. One should also focus on data modeling from time to time.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Consultant at a tech services company with 51-200 employees
Consultant
High performance, efficient, and helpful support
Pros and Cons
  • "The most valuable features of Amazon Redshift are that it's fast and efficient. We have lots of TBs of data and it's very fast."
  • "Amazon Redshift could improve the user interface support."

What is our primary use case?

We are using Amazon Redshift services to query the data and to perform certain data science operations on that data, such as applying a machine learning algorithm or doing an analysis.

What is most valuable?

The most valuable features of Amazon Redshift are that it's fast and efficient. We have lots of TBs of data and it's very fast.

What needs improvement?

Amazon Redshift could improve the user interface support.

For how long have I used the solution?

I have been using Amazon Redshift for approximately one year.

What do I think about the stability of the solution?

Amazon Redshift is a stable solution. However, there are many times when the environment configuration changes very quickly without any notice, and it creates a lot of problems for running our code.

What do I think about the scalability of the solution?

The scalability of Amazon Redshift is good. The solution is best suited for larger-scale businesses because the price is affordable for them and they need the complexity.

How are customer service and support?

The support from Amazon Redshift is very good.

How was the initial setup?

Amazon Redshift is somewhat complex to deploy. The process could improve.

What's my experience with pricing, setup cost, and licensing?

Amazon Redshift is an expensive solution. Larger organizations can afford this solution, but smaller businesses would struggle to afford it.

What other advice do I have?

I rate Amazon Redshift an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: partner
PeerSpot user
it_user576456 - PeerSpot reviewer
Manager BI Development at a comms service provider with 1,001-5,000 employees
Vendor
The fact that it stores data using a columnar approach allows us to use columns in join conditions.

What is most valuable?

Redshift gives extremely fast responses on queries involving large tables. This is the most important feature I look for in data warehouse solutions. Often you come across use cases where it is not possible to distribute data on a certain column, yet you need this column in join conditions. Redshift stores data using a columnar approach, which is useful for data aggregation.

All this at an extremely low price makes it possible for small to medium sized organizations to use Redshift’s power to get business insights.

How has it helped my organization?

One of my clients required large amounts of data but had a low budget. Amazon Redshift was the perfect choice for my client. We joined two tables containing billions of rows each and got results back in 27 seconds with a relatively small cluster of nodes.

What needs improvement?

Amazon should bring more SQL functions that are required in data warehouse implementations. It lacks SQL functions for complex data processing. A very small example is recursive queries. However, Amazon is developing the product at a fast pace and bringing new features with every release.

For how long have I used the solution?

I’ve been using Redshift for more than two years. I created one traditional data warehouse with 3-tier architecture and one big data solution.

What do I think about the stability of the solution?

We have not really had stability problems. The product is mature and can be utilized for production systems.

What do I think about the scalability of the solution?

Since Redshift is on the AWS cloud, scalability is not an issue. With a few clicks, cluster size can be increased or reduced. This is useful especially when you expect a large amount of data processing temporarily. For example, on Black Friday retail organizations expect large amounts of data flow/processing. Redshift can be scaled up for a few days to accommodate the surge of data and then scaled back to normal cluster size to save OPEX.

How are customer service and technical support?

The AWS team gives special focus to customer support. This is a very big benefit of going to the cloud. You get a reply from AWS in a short time frame.

Which solution did I use previously and why did I switch?

I worked on Teradata and IBM solutions. Redshift gives performance similar to these solutions and costs a fraction of the amount.

How was the initial setup?

Your Redshift cluster can be up and running with a few clicks and in less than 5 minutes. That is a big benefit when you shift to the cloud.

Which other solutions did I evaluate?

We analyzed Microsoft, Oracle, AWS RDS and MongoDB for our requirements.

What other advice do I have?

Redshift is based on PostgreSQL and adds MPP/columnar features to make it a data warehouse product. It is very easy for developers to adopt this solution. Your existing team can easily work on Redshift with no extra cost of learning.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user705738 - PeerSpot reviewer
Senior Solutions Engineer, West at a tech vendor with 5,001-10,000 employees
Vendor
It helped my customers migrate off on-premise platforms
Pros and Cons
  • "Redshift COPY command, because much of my work involved helping customers migrate large amounts of data into Redshift."
  • "Migrating data from other data sources can be challenging when you are working with multibyte character sets."

What is most valuable?

Redshift COPY command, because much of my work involved helping customers migrate large amounts of data into Redshift.

How has it helped my organization?

It helped my customers migrate off on-premise platforms such as Teradata to Redshift, at a fraction of the cost.

What needs improvement?

There are challenges with dealing with character set mismatches. Migrating data from other data sources can be challenging when you are working with multibyte character sets.

For how long have I used the solution?

Two years.

What do I think about the stability of the solution?

No.

What do I think about the scalability of the solution?

I personally haven’t hit scalability issues, but at dinner a year ago with a few of my existing customers (all Fortune 500 companies), I was told there are scalability issues once you get to 32 nodes.

One of my previous customers told me they were migrating off Redshift because they hit the ceiling and had scalability issues. They told me the responsiveness they were getting was inferior to alternative solutions once your Redshift gets to a specific size.

How are customer service and technical support?

I never utilized AWS technical support.

Which solution did I use previously and why did I switch?

I’ve helped customers migrate off Teradata, SQL Server, Oracle Exadata, Greenplum, and ParAccel Matrix to Redshift. Some due to cost savings, others because of the EOL of the product.

How was the initial setup?

Setup of Redshift infrastructure is pretty straightforward. I’ve been told that setting up partitions can be tricky in order to ensure good performance.

What's my experience with pricing, setup cost, and licensing?

I have nothing to add here as I wasn’t involved in this part of the process. However, one of my customers went with Google BigQuery over Redshift because it was significantly cheaper for their project.

Which other solutions did I evaluate?

I only provided advice to my customers, but some looked at Azure SQL DW, Greenplum, Netezza, and Google BigQuery as possible alternatives.

What other advice do I have?

Be careful with vendor lock-in! You cannot move your Redshift environment to a different cloud provider or to an on-premise solution.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
BI Manager at jfrog
Real User
You can copy JSON to the column and have it analyzed using simple functions
Pros and Cons
  • "You can copy JSON to the column and have it analyzed using simple functions."
  • "It lacks a few features which can be very useful, such as stored procedures"

What is most valuable?

The features I find valuable in Redshift are, first, JSON format support: you can copy JSON into a column and analyze it using simple functions. Second is the PARALLEL ON/OFF option, which lets you choose whether UNLOAD writes split files or a single file.
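Both features can be illustrated with SQL text assembled in Python. The query uses Redshift's JSON_EXTRACT_PATH_TEXT function on a column holding JSON strings, and the helper toggles the PARALLEL option of UNLOAD. The table, column, JSON keys, S3 prefix, and role ARN are hypothetical. With PARALLEL OFF, output is written serially rather than split across slices, subject to a per-file size limit.

```python
# Pull one nested field out of a JSON string column (names are placeholders).
JSON_QUERY = (
    "SELECT JSON_EXTRACT_PATH_TEXT(payload, 'user', 'country') AS country "
    "FROM raw_events"
)


def build_unload_sql(select_sql, s3_prefix, iam_role, parallel=True):
    """Build an UNLOAD statement, choosing split files or serial output."""
    quoted = select_sql.replace("'", "''")  # double quotes inside the query
    mode = "ON" if parallel else "OFF"
    return (
        f"UNLOAD ('{quoted}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"PARALLEL {mode};"
    )


print(build_unload_sql(
    JSON_QUERY,
    "s3://example-bucket/exports/countries_",              # placeholder prefix
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",  # placeholder role
    parallel=False,
))
```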

How has it helped my organization?

Since we have lots of data sources and high volumes, we needed a unified and organized DB that can handle these amounts and will be our single source of truth for the organization. Therefore, Redshift is the best solution.

What needs improvement?

It lacks a few features which can be very useful, such as stored procedures. Also, one needs to perform Vacuum in order to manage this DB. It would be nice not to worry about that and have it managed.

For how long have I used the solution?

Three years.

What do I think about the stability of the solution?

Yes. Sometimes, for some reason, Redshift is down (not due to maintenance).

What do I think about the scalability of the solution?

No, because we know how to use Redshift. We have both HDD and SSD clusters, and we keep a maximum amount of data in each so that it stays scalable.

How is customer service and technical support?

Great. They are available and very helpful.

How was the initial setup?

Initial setup is very straightforward and very easy. No outside help is needed.

What's my experience with pricing, setup cost, and licensing?

If you are willing to think about every query you run but want to know that your nodes are fully managed, then use BigQuery Data Analytics. If you want a fixed price and do not want to worry about every query, but are prepared to manage your nodes yourself, use Redshift.

Which other solutions did I evaluate?

I did not. We did consider using BigQuery Data Analytics, but eventually, we decided to use Redshift.

What other advice do I have?

My rating would be 8.5. This is a great product, but one still needs to know how to manage clusters and nodes in order to make the DB scalable and reliable.

It has the great benefit of being built on PostgreSQL, so any data specialist who has SQL experience can handle Redshift.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user