VMware Tanzu Greenplum Overview

VMware Tanzu Greenplum is the #10 ranked solution among top Data Warehouse tools. IT Central Station users give VMware Tanzu Greenplum an average rating of 8 out of 10. VMware Tanzu Greenplum is most commonly compared to Snowflake. The top industry researching this solution is computer software, with professionals from computer software companies accounting for 32% of all views.
What is VMware Tanzu Greenplum?

Parallel Postgres for enterprise analytics at scale
With improved transaction processing capability and support for streaming ingest, Greenplum can address workloads across a spectrum of analytic and operational contexts, from traditional business intelligence to deep learning.

VMware Tanzu Greenplum is also known as Greenplum, Pivotal Greenplum.

VMware Tanzu Greenplum Customers

General Electric, Conversant, China CITIC Bank, Aridhia, Purdue University

Archived VMware Tanzu Greenplum Reviews (more than two years old)

RA
Site Manager at a tech services company with 51-200 employees
Real User
Helpful for horizontal scaling, but the deployment process can be made easier

What is our primary use case?

The primary use of this solution is for horizontal scalability.

We have an on-premises deployment.

What is most valuable?

The most valuable feature for us is horizontal scaling.

What needs improvement?

The deployment process for this solution could be made easier.

I saw some limitations with respect to the column store, and removing these would be an improvement.

For how long have I used the solution?

We have been using this solution for about two months.

What do I think about the stability of the solution?

This solution is stable enough. 

What do I think about the scalability of the solution?

We are not yet sure about scalability, but we will see in the future.

How are customer service and technical support?

We have not been in touch with technical support for this solution.

Which solution did I use previously and why did I switch?

We were using MySQL prior to this solution.

How was the initial setup?

The initial setup of this solution was not simple, but it was not too complex, either. I would say that it was of medium complexity.

Our deployment took approximately two days, although this was only a partial deployment. We have yet to add another node.

What about the implementation team?

We performed the deployment in-house.

What's my experience with pricing, setup cost, and licensing?

We are using the open-source version of this solution.

Which other solutions did I evaluate?

Prior to choosing this solution, we evaluated MySQL and PostgreSQL.

What other advice do I have?

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
President of the Board at a tech services company with 51-200 employees
Real User
Top 5
An excellent open-source solution with great analytical performance

Pros and Cons

  • "Scalability is simple because it's an MPP database. If you need more processing power or you need more storage, you just add a few more nodes in the cluster. It works on common commodity hardware. You can use any type of server. You don't need to have proprietary hardware. It's fairly flexible."
  • "Some integration with other platforms like design tools, and ETL development tools, that will enable some advanced functionality, like fully down processing, etc."

What is most valuable?

It's a good core database. Scalability and performance are very good. I also like the fact the solution is open-source, so you can use it free of charge.

What needs improvement?

Integration with other platforms, such as design tools and ETL development tools, that enables some advanced functionality (like fully down processing, etc.) would be helpful in future releases. Also, if the solution could offer automated creation of DDL statements from power designers, for example, it would be very useful.

For how long have I used the solution?

I've been using the solution for five years.

What do I think about the stability of the solution?

It's software, and like any software, it has some bugs. However, you can add new features to improve it. Overall, our customers, who are big telcos, have been very satisfied with the platform and with its stability and performance.

What do I think about the scalability of the solution?

Scalability is simple because it's an MPP database. If you need more processing power or you need more storage, you just add a few more nodes in the cluster. It works on common commodity hardware. You can use any type of server. You don't need to have proprietary hardware. It's fairly flexible.

The solution requires a minimum amount of downtime when scaling. You can even add additional nodes without any downtime at all. I'm not 100% sure, but I think you can just reconfigure it and the background processes, and Greenplum will do the redistribution of the data.
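The redistribution described above can be pictured with a toy model. The following is only a sketch, assuming rows are placed on segments by hashing a distribution key modulo the segment count; the function names are hypothetical, and Greenplum's actual hashing and expansion mechanics differ in detail.

```python
import hashlib
from collections import Counter


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution-key value to a segment (toy model: hash modulo count)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def rows_per_segment(keys, num_segments):
    """Count how many rows land on each segment."""
    return Counter(segment_for(k, num_segments) for k in keys)


def rows_moved_on_expand(keys, old_segments, new_segments):
    """Count rows whose segment changes when the cluster grows."""
    return sum(
        1 for k in keys
        if segment_for(k, old_segments) != segment_for(k, new_segments)
    )


# Hypothetical data: 10,000 rows keyed by a made-up customer identifier.
keys = [f"customer-{i}" for i in range(10_000)]
before = rows_per_segment(keys, 4)
after = rows_per_segment(keys, 6)
moved = rows_moved_on_expand(keys, 4, 6)

print("rows per segment before:", sorted(before.items()))
print("rows per segment after: ", sorted(after.items()))
print("rows moved during redistribution:", moved)
```

In this model, adding nodes lowers the row count per segment, but a large share of the rows has to move, which is why redistribution is typically run as a background task.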

How are customer service and technical support?

Technical support is very good. The model they are using to fund the development of their open-source product is via revenue from support for enterprise customers, so they are very attentive when issues arise.

How was the initial setup?

The solution is very straightforward to set up and is also easy to administer and develop using other open-source tools.

What's my experience with pricing, setup cost, and licensing?

It's open-source, so it's free to use.

What other advice do I have?

I'm a partner that works mainly with enterprises. Mostly the partners are big telcos and we deal with tens of terabytes of data.

MPP and columnar databases are the future of the analytical landscape. The era of appliances is over, so implementation of an MPP database on-premises or on the cloud is the way to go. Greenplum is definitely one of the leaders in this area.

I would rate the solution eight out of ten. If they improved the integration with other platforms in the landscape I would rate it higher. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user1127370
Co-Founder, Chief of Operations with 10,001+ employees
Real User
A scalable and future-proof solution for data warehousing

Pros and Cons

  • "We chose Greenplum because of the architecture in terms of clustering databases and being able to have, or at least utilize the resources that are sitting on a database."
  • "The installation is difficult and should be made easier."

What is our primary use case?

We install this solution for our clients. At the moment we are in the middle of an installation for a data warehouse that will be used by a telecommunications company that is based in Lesotho. We have not gone into production yet, but we have used it in a test environment and it works very well.

We are a technology company, so we handle software development, software implementation, data warehousing, and business intelligence.

We are using the on-premise deployment model. In Africa, there isn't much adoption of cloud services, so most of our clients are expecting on-premise implementation.

What is most valuable?

We chose Greenplum because of the architecture in terms of clustering databases and being able to have, or at least utilize the resources that are sitting on a database.

What needs improvement?

The installation is difficult and should be made easier. Maybe if the process was simpler it would have a quicker adoption by other developers. This could also be accomplished by providing training aids, such as videos to help with installation or using certain features. There are resources currently available on their website, but you have to search through a lot of documentation.

For how long have I used the solution?

We are currently implementing this solution.

What do I think about the scalability of the solution?

Our expectation is that the scalability will be good, as it is one of the main reasons that we have invested in this solution.

How are customer service and technical support?

To this point, I have referenced the material on the website but have not really interacted with technical support.

How was the initial setup?

The initial setup of this solution is not very simple. You need to properly follow the steps in terms of getting the whole architecture put together. We have a team of five people who are working on different aspects of the implementation.

Currently, we are focusing on the data layer. Next will be the ETL layer.

What about the implementation team?

We are using our in-house team to implement this solution for our client.

Which other solutions did I evaluate?

We have used Oracle and Microsoft SQL, but we haven't had much success. We found that Oracle was not as scalable and we were having some performance bottlenecks. Also, from a licensing perspective, Greenplum was a better choice. For all of these reasons, we have chosen to invest heavily in Greenplum.

What other advice do I have?

I would recommend this solution specifically for the scalability. This solution has a more futuristic technology, as opposed to the old school kind of data warehousing. If people are interested in getting something that is more future-proof, then I would recommend this solution.

So far, we're comfortable with what we've seen. What we have configured is addressing our needs at the moment.

I would rate this solution an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Information Architecture Specialist (TOGAF Certified) at a comms service provider
Vendor
Handles complex queries and report production efficiently, integrates with Hadoop

Pros and Cons

  • "It's one of the fastest databases in the market. It's easy to use. From a maintenance perspective it's a good product. The segmentation, or architecture of the product is different than other databases such as Oracle. So even in 10 years, the data distribution for such segments will not affect other segments. The query performance of the product, for complex queries, is very good. It has good integration with Hadoop."
  • "Implementation takes a long time."
  • "One of the disadvantages, not a disadvantage with the product itself, but overall, is the expertise in the marketplace. It's not easy to find a Greenplum administrator in the market, compared to other products such as Oracle."
  • "They need to interact more with customers. They need to explain the features, especially when there are new releases of Greenplum. I know just from information I've found that it has other features, it can be used for analytics, for integration with Big Data, Hadoop. They need to focus on this part with the customer."
  • "They need to enhance integration with other Big Data products... to integrate with Big Data platforms, and to open a bi-directional connection between Greenplum and Big Data."

What is our primary use case?

We use it for data warehousing.

How has it helped my organization?

For complex queries, which would normally take a long time, and for reporting, it is very efficient. It doesn't take a long time for the execution of any report for the end-user.

What is most valuable?

  • It's one of the fastest databases in the market.
  • It's easy to use.
  • From a maintenance perspective it's a good product.
  • The segmentation, or architecture of the product is different than other databases such as Oracle. So even in 10 years, the data distribution for such segments will not affect other segments.
  • The query performance of the product, for complex queries, is very good.
  • It has good integration with Hadoop and Big Data.

What needs improvement?

The implementation of an upgrade takes a long time. But maybe it's different from one instance to another, I'm not sure.

Also, one of the disadvantages, not a disadvantage with the product itself, but overall, is the expertise in the marketplace. It's not easy to find a Greenplum administrator in the market, compared to other products such as Oracle. We used to work with such products, but for Greenplum, it's not easy to find resources with the knowledge of administration of the database.

For how long have I used the solution?

More than five years.

What do I think about the stability of the solution?

If we face any issues they're normal and we open tickets.

What do I think about the scalability of the solution?

It's scalable. I would rate scalability seven out of 10.

How are customer service and technical support?

We hired one DB admin for Greenplum. If he faces any issues he opens tickets with the vendor, but most of the issues, 90% of them, he is able to solve without support.

Which solution did I use previously and why did I switch?

We used other products before, but when we worked with Greenplum, as compared to other products on the market, we found it's a good product.

Before Greenplum, we used Oracle but it was mostly obsolete. So we had to upgrade our tools. We needed to have a database with an API tool.

How was the initial setup?

I'm not a professional in the setup but setup of the environment itself was managed by us. We managed between development, testing, and production servers. We are able to maintain it. I don't think it is complicated.

Most of the issues can be solved without referring back to support. A very small minority of issues required support from the vendor.

What's my experience with pricing, setup cost, and licensing?

Pricing is good compared to other products. It's fine.

Which other solutions did I evaluate?

We did a comparison among some databases, one of them Greenplum. We assessed features, did a comparison in terms of the price, then we chose Greenplum. And we've retained it. We've found it's a good product, to date.

Oracle Exadata was part of the comparison, as was IBM Netezza. In terms of quality and the price, compared to the other products, we chose Greenplum. Also, to be honest, at that time we got a good offer: use it for the first year at a minimal price. Then they opened a support contract with us later. That was one of the advantages.

What other advice do I have?

I give it an eight out of 10. To bring it up to a 10, they need to interact more with customers. They need to explain the features, especially when there are new releases of Greenplum. I know just from information I've found that it has other features, it can be used for analytics, for integration with Big Data, Hadoop. They need to focus on this part with the customer.

Also they need to enhance integration with other Big Data products. They need to adapt more, give more features, because customers are looking for these things in the market now. They have the product itself already, but they need to integrate with Big Data platforms and to open a bi-directional connection between Greenplum and Big Data. They need to focus on these features more.

But, from my perspective, for what I'm looking for, I can say it's a good product. Most of the features I'm looking for are available.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user776835
Senior Data Warehouse Developer at a comms service provider with 51-200 employees
Vendor
Provides polymorphic storage and very fast query processing

Pros and Cons

  • "Very fast for query processing."
  • "It will be very useful if we could communicate with other database types from Greenplum (using a database link)."

What is our primary use case?

It is a very good appliance for data warehouse (DWH) usage.

What is most valuable?

  • Very fast for query processing
  • Parallel load
  • Polymorphic storage

How has it helped my organization?

Before, we had Oracle Exadata, and some queries would take more than 20 hours to execute. With Greenplum, they take a few minutes.

What needs improvement?

It will be very useful if we could communicate with other database types from Greenplum (using a database link).

For how long have I used the solution?

Four years.

What do I think about the stability of the solution?

No issues.

What do I think about the scalability of the solution?

No issues.

How are customer service and technical support?

Good. I would give their technical support a seven out of 10.

Which solution did I use previously and why did I switch?

Yes, Oracle Exadata. Performance was the main criterion for switching to Greenplum.

How was the initial setup?

It was a simple setup.

What's my experience with pricing, setup cost, and licensing?

It is the best product, with the best fit for customers' price/performance objectives.

Which other solutions did I evaluate?

We evaluated Oracle technology that we used before.

What other advice do I have?

I encourage other customers to try Greenplum, specifically for DWH use. It is a very useful product.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Senior Enterprise Technical Architect at a computer software company with 10,001+ employees
Real User
We experience performance of approximately 1TB per hour loading data to Greenplum without the use of specialized hardware.

Pros and Cons

  • "Scalable (Massive) Parallel Processing (MPP) – The ability to bring to bear large amounts of compute against large data sets with Greenplum and the EMC DCA has proven itself to be very effective."
  • "We would like to see Greenplum maintain a closer relationship with and parity to features implemented in PostgreSQL."

What is most valuable?

Of particular value to our environment and applications are the following Greenplum capabilities:

  1. Scalable (Massive) Parallel Processing (MPP) – The ability to bring to bear large amounts of compute against large data sets with Greenplum and the EMC DCA has proven itself to be very effective.
  2. Fast load of data into Greenplum – We experience performance of approximately 1TB per hour loading data to Greenplum without the use of specialized hardware.
  3. MADlib (madlib.net) – There are a number of statistical and analytical functions available within MADlib upon which we rely. Among these are linear regression, logistic regression, apriori, k-means, principal component analysis, etc.
  4. User Defined Functions in Python (UDFs in PL/Python) – Where MADlib does not provide a direct solution to an application problem, the ability to quickly prototype and deploy user defined functions with Python has been effective.
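To put the load figure in item 2 into more familiar units, the arithmetic can be sketched as follows. This assumes decimal units (1 TB = 1,000,000 MB) and an even split across segments; the segment count used here is a hypothetical number, not one taken from the review.

```python
def load_rate_mb_per_s(terabytes_per_hour: float) -> float:
    """Convert a load rate in TB/hour to MB/s using decimal units (1 TB = 10**6 MB)."""
    return terabytes_per_hour * 1_000_000 / 3600


def per_segment_rate(total_mb_per_s: float, num_segments: int) -> float:
    """Split the aggregate rate evenly across segments that load in parallel."""
    return total_mb_per_s / num_segments


total = load_rate_mb_per_s(1.0)
print(f"aggregate: {total:.1f} MB/s")

# The segment count below is a hypothetical figure, not one from the review.
print(f"per segment (assuming 32 segments): {per_segment_rate(total, 32):.1f} MB/s")
```

The point of the split is that each segment only needs to sustain a modest rate, which is consistent with the reviewer's remark that no specialized hardware was required.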

What needs improvement?

We would like to see Greenplum maintain a closer relationship with and parity to features implemented in PostgreSQL. The current version of Greenplum is based on a fork of PostgreSQL v8.2.15. This edition of PostgreSQL was declared end-of-life (EOL) by the PostgreSQL project in December 2011. The current version of PostgreSQL is v9.5.

For how long have I used the solution?

We began production use in November, 2011. Alongside Greenplum, we're also using EMC Data Computing Appliance v2.3.3 (8/10), of which we have two and a half racks in production, and one and a quarter racks in dev/tests.

What was my experience with deployment of the solution?

We had no issues with the deployment.

What do I think about the stability of the solution?

The only issues with stability we’ve experienced have been the sporadic failover of primary to mirror segments. The environment continues to operate in this instance, with the failure of queries that were in flight at the time of the failover.
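Because only the queries in flight at the moment of a failover are lost, one common way to cope is for the client to retry them. The following is a minimal sketch of that idea, not something described by the reviewer: `TransientQueryError`, `run_with_retry`, and `flaky_query` are hypothetical names, and a real client would need to map its driver's actual error to the retry condition.

```python
import time


class TransientQueryError(Exception):
    """Hypothetical error standing in for a query cancelled by a segment failover."""


def run_with_retry(run_query, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call run_query(); retry on TransientQueryError up to `attempts` times in total."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except TransientQueryError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds * attempt)  # simple linear back-off
    raise last_error


# Usage with a stand-in query that fails once, as if a failover interrupted it.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientQueryError("query was in flight during failover")
    return [("ok",)]


result = run_with_retry(flaky_query, sleep=lambda seconds: None)
print(result, "after", calls["count"], "attempts")
```

A retry like this is only safe for read-only or otherwise idempotent statements; a write that may have partially completed needs more care.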

What do I think about the scalability of the solution?

We have had no issues with scalability whatsoever.

How are customer service and technical support?

The service and support we’ve received from both Pivotal and EMC has been exemplary. The exceptions to this would be:

  1. The EMC Request for Product Qualification (RPQ) process – EMC DCA support is contingent upon EMC approval of all third party software installed onto a DCA. There have been times that this approval has taken as long as 60 days to process.
  2. Root Cause Analysis of Greenplum Database Incidents – When Greenplum Database incidents have occurred (e.g. primary database segments failing over to their backup), and Pivotal has been called for support, the response has been near immediate (30 minutes or less). Additionally, the incident resolution provided has been equally expedient. Where this has caused some disappointment is the response to our request for a root cause of the incident. These requests tend to queue up and we don’t seem to get answers beyond the typical vendor response of “that’s been fixed in the next release”.

Which solution did I use previously and why did I switch?

The purchase of Greenplum was our first interaction with Pivotal. We have been a customer of EMC for a very long time.

What other advice do I have?

My primary reason for reducing points on this rating is that Greenplum is based on a fork of PostgreSQL v8.2.15 (declared EOL by the PostgreSQL project in December 2011). The current version of PostgreSQL is v9.5. There are a number of current PostgreSQL features of which we would like to take advantage (JSON support, materialized views, full text search, XML support, column-based permissions, row-based permissions, etc.).

Disclosure: I am a real user, and this review is based on my own experience and opinions.
MB
Statistician at a financial services firm with 1,001-5,000 employees
Real User
We were able to analyze and produce output on large volumes of data very quickly which saved us lots of time.

What is most valuable?

Greenplum is an MPP architecture database. Data can be distributed across multiple nodes, and a good distribution allows queries to execute on all segments at once, which is very powerful. As long as we have good SQL knowledge, we can start playing in the platform. Greenplum uses Postgres and ANSI Standard SQL. Also, it supports many other procedural languages, such as Python, C++, and Perl.
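The idea of a query executing on all segments at once can be illustrated with a toy scatter-gather aggregate. This is a sketch under stated assumptions, not Greenplum's implementation: rows are spread over hypothetical in-memory "segments", each computes a partial result, and a coordinator combines them.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, num_segments):
    """Assign each (key, value) row to a segment by hashing its key (toy model)."""
    segments = [[] for _ in range(num_segments)]
    for key, value in rows:
        segments[hash(key) % num_segments].append((key, value))
    return segments


def partial_sum(segment_rows):
    """Work done locally on one segment: partial sum and row count."""
    return sum(value for _, value in segment_rows), len(segment_rows)


def parallel_average(segments):
    """Run the partial aggregate on every segment at once, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(partial_sum, segments))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical data: integer order ids as the distribution key.
rows = [(order_id, float(order_id % 100)) for order_id in range(100_000)]
segments = distribute(rows, 8)

print("rows per segment:", [len(s) for s in segments])
print("average:", parallel_average(segments))
```

With an evenly spread key, every segment holds the same share of rows, so no single segment becomes the bottleneck. A skewed key would leave one segment doing most of the work.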

How has it helped my organization?

Greenplum is a high-powered, multi-node database that was chosen for its capacity to ingest and query data at extremely high speed, enabling in-database analytics and statistical output on granular levels of data that were inaccessible before its deployment. We were able to analyze and produce output on large volumes of data very quickly, which saved us lots of time (we used to wait for hours to get the same output). Management was able to get insights very quickly so that they could make informed decisions.

For how long have I used the solution?

I used Greenplum between Aug 2011 – Aug 2015. Almost all the members in the analytics team used Greenplum on a daily basis.

What do I think about the stability of the solution?

There were no issues, and it was doing what it was supposed to do.

What do I think about the scalability of the solution?

There were no issues, and it was doing what it was supposed to do.

How are customer service and technical support?

We had a pre-sales consultant who provided an end-to-end solution for the product. He also worked with our data and clearly demonstrated the advantages of Greenplum. After we purchased the product, we were provided with a full-time consultant who had extensive knowledge of the product. He was primarily responsible for providing hands-on experience on projects and also did an excellent job of teaching everyone and bringing everyone up to speed on the new platform. We also had a technical person offshore who was responsible for fixing things if something broke or any other issues arose.

Which solution did I use previously and why did I switch?

We did use other products in the company, but they weren’t MPP architecture databases. Our data was getting bigger, so we needed something with an MPP architecture to tackle big data challenges, and Greenplum was considered. It was a management decision to purchase this product (I am not sure whether other similar products were considered or not).

What about the implementation team?

All the setup, I believe, was done by the Greenplum team.

What was our ROI?

I didn't have any visibility into the pricing and licensing. But I can say that we needed a product like Greenplum to store, manage, and analyze huge volumes of data, which can be a daunting task.

What other advice do I have?

Greenplum is an MPP (massively parallel processing) database which is extremely fast. If people are dealing with a very high volume of data, it is definitely a product to consider seriously.

Disclosure: My company has a business relationship with this vendor other than being a customer: To my knowledge, we were one of the biggest customers in Canada, they were looking for our feedback to improve the product offerings.
ITCS user
Consultant at a financial services firm with 5,001-10,000 employees
Consultant
The MPP element is crucial, so far as it allows us to query millions of rows at a time, at speed.

What is most valuable?

The MPP element is crucial, so far as it allows us to query millions of rows at a time, at speed.

How has it helped my organization?

The previous data warehouse was built in Oracle. One of the things which has improved in Greenplum is that we can query millions of rows at speed, without creating lags. We’ve also built far more views; slowly changing dimensions can instantaneously update without creating the issue of having to rebuild tables to reflect new hierarchies, for example.

What needs improvement?

We found some issues with larger tables that have daily data appended, where after a while this seems to create lag in the query speed. This might just have to do with local knowledge rather than the product itself.

We have a table which currently contains 27.6m rows and has a daily delta added to it of roughly 16.5k rows per day. While this isn’t particularly large, we have noticed the table begins to perform poorly when queried, in spite of having set up a VACUUM process to be performed weekly. It may be that the VACUUM process needs to be performed more frequently (like daily), but we’ve not yet found the optimal way of maintaining this particular table.
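To size the change between maintenance runs, the figures above can be put through some simple arithmetic. This is only a sketch using the reviewer's numbers; it does not diagnose the slowdown. As general background for PostgreSQL-derived databases, VACUUM reclaims space from updated or deleted rows, while ANALYZE refreshes the planner's statistics, so which of the two matters for an append-heavy table is an open question here.

```python
def rows_added_between_runs(daily_delta_rows: int, days_between_runs: int) -> int:
    """Rows appended to the table between two maintenance runs."""
    return daily_delta_rows * days_between_runs


def growth_fraction(table_rows: int, added_rows: int) -> float:
    """Appended rows as a fraction of the current table size."""
    return added_rows / table_rows


TABLE_ROWS = 27_600_000   # 27.6m rows, from the review
DAILY_DELTA = 16_500      # roughly 16.5k rows per day, from the review

for label, days in (("weekly", 7), ("daily", 1)):
    added = rows_added_between_runs(DAILY_DELTA, days)
    share = growth_fraction(TABLE_ROWS, added)
    print(f"{label}: {added:,} rows added ({share:.3%} of the table)")
```

Even on a weekly schedule the appended rows are well under one percent of the table, which suggests the schedule alone may not explain the lag.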

It’s worth saying that this is one table out of over 400 well-performing tables and views in the same database. Hope that helps.

For how long have I used the solution?

I have used it for approximately 30 months.

What was my experience with deployment of the solution?

I have not encountered any deployment, stability or scalability issues.

How are customer service and technical support?

I have not raised any service issues/tech queries, so I can’t really say.

Which solution did I use previously and why did I switch?

We used Oracle previously. We based our choice on expertise in our US operation, where we have a Greenplum expert who provided some amazing use case examples to help us in our selection process.

What about the implementation team?

Implementation was done in-house.

What was our ROI?

Not within my area, I’m afraid, but I understand that this was a very good fit from an ROI point of view.

What other advice do I have?

Investigate whether this solution works for you. It is worth creating a rating matrix to compare other similar products, and it is very useful to look deeply at whether the third-generation MPP software might be a good fit.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Barath Ravichander
Data Engineer at Broadridge Financial Solutions
Consultant
A good warehouser, compressor, and an in house ETL.

What is most valuable?

I've found the database warehouse, data compression, and ETL to be the most valuable features for us.

How has it helped my organization?

Loading batch data has really improved the efficiency of our organization.

What needs improvement?

I'd like to see better scaling, better performance from in-memory databases, and a higher compression rate. We have been facing some performance issues when doing batch loading with the optimizer, although the scaling works fine. They are working on optimization techniques, which is why I listed this as room for improvement.

For how long have I used the solution?

I've used it for over two years. I have been working very closely with the EMC folks.

What was my experience with deployment of the solution?

Yes, at times, but it depends on your modeling and data retrieval.

What do I think about the stability of the solution?

It's been stable for us.

What do I think about the scalability of the solution?

Its scalability needs to be improved.

How are customer service and technical support?

I would rate technical support as good, although there is not much technical expertise at the start of the SR.

Which solution did I use previously and why did I switch?

We tried other MPPs.

How was the initial setup?

It was complex, but there was a change in the setup.

What about the implementation team?

We got support from the vendor at the start.

What other advice do I have?

If you want to implement this product, you would need to scale your product well before trying to implement.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Barath Ravichander
Data Engineer at Broadridge Financial Solutions
Consultant
Strong integration with Greenplum Servers.

What is most valuable?

Strong integration with Greenplum Servers.

How has it helped my organization?

Loading data to Greenplum server after batch processing.

For how long have I used the solution?

6 months

What was my experience with deployment of the solution?

Yes, issues with HA and issues in syncing with Greenplum.

What do I think about the scalability of the solution?

You cannot expect a split-second response.

How are customer service and technical support?

Customer Service:

On a scale of 10 I would rate it as 7

Technical Support:

On a scale of 10 I would rate it as 8

Which solution did I use previously and why did I switch?

This was used only during a POC period, not for regular use.

How was the initial setup?

The initial setup worked well. We were using it only with Greenplum.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Barath Ravichander
Data Engineer at Broadridge Financial Solutions
Consultant
Using Greenplum has given a good boost for bulk processing.

What is most valuable?

I've found that the data compression and ETL are the most valuable features for us.

In 4.3.8.1, Pivotal confirmed that restoring a schema-level backup from a DB-level backup is possible. We have tested restoring a schema from a DB-level backup, and it works fine.

ORCA, the Pivotal optimizer, produces good query plans but does not work with all business logic. It needs to be tested against your requirements.

How has it helped my organization?

Loading batch data has really improved the efficiency of our organization.

Extract run times have improved drastically. Because MPP is built for bulk operations, we were able to do 1.5 million calculations in 15 minutes.

What needs improvement?

Scaling of the solution needs to be improved.

An HD connection is available, whereas a connection to an arbitrary file system is not.

Connecting Greenplum with GemFire (in-memory) to load, sync, and reconcile data would be really valuable.

For how long have I used the solution?

I've used it for nearly three years.

What was my experience with deployment of the solution?

We had deployment issues after installing new patches. Every new patch has had some business impact, so the release notes need to be reviewed carefully.

What do I think about the stability of the solution?

It's been stable for us.

How are customer service and technical support?

Customer Service:

They have a quick turnaround, but digging into the actual information takes time, depending on the severity.

Technical Support:

First-level technical support is not that effective (based on my own observation).

Which solution did I use previously and why did I switch?

We were using Sybase, and with massive data volumes, bulk operations were not possible.

How was the initial setup?

It was simple.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Technology Architect at Broadridge Financial Solutions
Vendor
Valuable features for us: Append Only tables, data compression and bulk load and extraction using External Tables.

What is most valuable?

Append Only tables, data compression and bulk load and extraction using External Tables are very valuable features for us.

How has it helped my organization?

We have dramatically improved the turnaround of our quarterly statements and can sustain it as data volumes increase.

What needs improvement?

With the ORCA optimizer release, the earlier Append-Only feature was upgraded to Append-Optimized, so we can now update data in the former Append-Only tables just like any heap table. However, I found that this increased the time taken by the Vacuum Analyze (VA) operation on these tables, for example from 10 minutes to more than an hour on large tables. In our case, we don't need updates on our Append-Only tables, so this became a drawback. VA on Append-Optimized tables needs to be improved.

Backup and restore performance needs to be improved.

The ORCA optimizer, when turned on, does not behave consistently. Some workloads show improved performance and some become very slow. This needs to be improved for consistency.
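
One way to make this kind of inconsistency visible is to time each workload with the optimizer off and on, and then classify the result. The sketch below is only an illustration: the workload names and timings are made up, the 10% tolerance is an arbitrary assumption, and collecting the timings (for example by toggling Greenplum's `optimizer` session parameter before each run) is left out.

```python
def classify_workloads(timings, tolerance=0.10):
    """Label each workload by how its run time changes with ORCA turned on.

    timings maps a workload name to (seconds_with_orca_off, seconds_with_orca_on).
    A change within +/- tolerance (as a fraction of the baseline) counts as unchanged.
    """
    labels = {}
    for name, (off_s, on_s) in timings.items():
        change = (on_s - off_s) / off_s  # relative change versus the ORCA-off baseline
        if change < -tolerance:
            labels[name] = "faster"
        elif change > tolerance:
            labels[name] = "slower"
        else:
            labels[name] = "unchanged"
    return labels


# Hypothetical measurements, invented for illustration only.
sample = {
    "nightly_extract": (120.0, 80.0),
    "statement_batch": (60.0, 200.0),
    "lookup": (10.0, 10.5),
}
labels = classify_workloads(sample)
for name in sample:
    print(name, labels[name])
```

Workloads labeled "slower" are the ones to keep on the legacy planner until the regression is understood.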

For how long have I used the solution?

I have used it for about 4 years now.

What do I think about the stability of the solution?

The pre-ORCA version was stable. The ORCA release is not stable. Some workloads slowed down with the new release even when the new optimizer is not turned on.

How are customer service and technical support?

Tech support is average. They lack information about new features in the new releases and the possible impact of them.

Which solution did I use previously and why did I switch?

Earlier, we were using an OLTP-based RDBMS solution. We realized we needed an OLAP solution, and also something that can scale horizontally.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user371805
Senior Business Intelligence Developer at a tech services company with 501-1,000 employees
Consultant
With this solution, we've reduced load on the OLTP systems.

Valuable Features

The most valuable feature of Greenplum is the Massively Parallel Processing (MPP).

Improvements to My Organization

With this solution, we've reduced load on the OLTP systems.

Room for Improvement

Because Greenplum uses an older version of Postgres, developers coming from other products will find many features missing that they would assume are standard in PostgreSQL.

Greenplum is based on Postgres 8.2.15, which was released in 2009. While SQL syntax and functionality have continued to evolve on other platforms in the ensuing years, they appear to have stagnated in Greenplum.

Deployment Issues

We haven't had any issues with deployment.

Stability Issues

It's been stable for us.

Scalability Issues

It's scaled for our needs.

Customer Service and Technical Support

The community around Greenplum is very small, making it difficult to learn from others' experience via forums or blog posts.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Technical Lead at a tech services company with 1,001-5,000 employees
Consultant
Installation is very simple; make sure to set the configuration values based on your requirements.

What is most valuable?

We can integrate Hadoop with DCA V2. This will be a huge development in big data technologies.

How has it helped my organization?

It sped up read/write processing because of its MPP architecture.

What needs improvement?

EMC has already developed DCA V3, but if the hardware were a little more stable, I would prefer DCA V2.

For how long have I used the solution?

I am from a support background, and have used this on multiple accounts, for the last four years.

What was my experience with deployment of the solution?

There have been no issues with the deployment.

What do I think about the stability of the solution?

Hardware failure is a concern.

What do I think about the scalability of the solution?

We have had no issues scaling it for our needs.

How are customer service and technical support?

Technical support is excellent.

Which solution did I use previously and why did I switch?

I know many customers are migrating from Oracle to Greenplum due to its faster processing.

How was the initial setup?

It is a straightforward, open source system.

What about the implementation team?

It is better to choose EMC to perform the implementation. That said, it is not complex, and we can do it easily in our own environment with a little knowledge.

What's my experience with pricing, setup cost, and licensing?

Greenplum is an open source system, but they do charge for support.

What other advice do I have?

Installation is very simple; make sure to set the configuration values based on your requirements.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user374739
Data Warehouse Specialist at a comms service provider with 1,001-5,000 employees
Vendor
The most valuable features are MPP and that it's polymorphic. It doesn't work as efficiently as we'd like because it requires more segment node capacity than we currently have.

What is most valuable?

The most valuable features are MPP and that it's polymorphic.

How has it helped my organization?

We have a set of workflows that took 10 hours in Oracle Exadata; it now takes 4 hours with EMC Greenplum.

What needs improvement?

It doesn't work as efficiently as we'd like because it requires more segment node capacity (size, RAM, CPU) than we currently have.

For how long have I used the solution?

I've used it since 2013.

What was my experience with deployment of the solution?

At the moment, we don't have any issues with deployment.

What do I think about the stability of the solution?

At the moment, we don't have any issues with stability.

What do I think about the scalability of the solution?

At the moment, we don't have any issues with scalability.

How are customer service and technical support?

Customer Service:

8/10

Technical Support:

8/10

Which solution did I use previously and why did I switch?

Yes, we have Oracle Exadata.

How was the initial setup?

The initial setup is straightforward. It was simple for our personnel to install on the Unix system and network.

What about the implementation team?


What was our ROI?

It's a good product when comparing the price to its functionalities.

What other advice do I have?

You should consider the capacity of your data before you buy it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user374715
BI Data Engineer at a tech vendor with 51-200 employees
Vendor
It has HA built-in with mirroring and many other tuning features that make it highly configurable.

What is most valuable?

  • Parallel Processing and Data Distribution based architecture.
  • HA built-in with mirroring.
  • Highly configurable and lots of tuning features.

How has it helped my organization?

  • This has helped us bring down our end-to-end EDW load time to 1/3 the time.
  • It has enabled faster and efficient data analysis.
  • Scalable environment without adding too much cost.

What needs improvement?

  • It needs a much more robust and user friendly monitoring and management front-end tool.
  • More stability and auto-recovery with the segments.
  • Report generations on system health and recommendations.

For how long have I used the solution?

I've used it for two years.

What was my experience with deployment of the solution?

Up to now, we've had no issues with deployment.

What do I think about the stability of the solution?

Up to now, we've had no issues with stability.

What do I think about the scalability of the solution?

Up to now, we've had no issues with scalability.

How are customer service and technical support?

The response is fairly good, but we would like more support from R&D on more complex issues. Also, they need to ensure there are logs that can be collected for case analysis without causing any downtime to the system.

Which solution did I use previously and why did I switch?

It's a highly efficient and fast DB with a lot of features, at a much lower cost than the other MPP DBs we evaluated.

How was the initial setup?

It was complex, as we had to convert all our code into GP functions to make the best use of GP parallelism. The pushdown feature was not available via Informatica. The initial parameter setup took quite some time to test before we found the sweet spot for performance.

What about the implementation team?

In-house with vendor support.

What other advice do I have?

Make sure you have the designs, approaches and architecture in place before kicking off the implementation. It's best to have someone involved with prior migration experience.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user373128
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees
Vendor
Processing speed of queries used for ‘Reporting’ solutions is the most valuable feature.

Valuable Features:

Processing speed of queries used for ‘Reporting’ solutions is the most valuable feature.

Improvements to My Organization:

Not Applicable for the area I was responsible for, as we ended up migrating away from Greenplum.

Room for Improvement:

Stability and scalability for a large number of concurrent applications and their users. The results we got were very inconsistent, depending on the number of connections taken up by multiple applications and users.

When our application was first deployed using Greenplum, the number of users of the rack on which Greenplum was deployed was very limited. We got excellent query performance results at that time. But as more applications were deployed, we started getting very inconsistent performance results. Sometimes the queries would run in under a second, and sometimes the same queries would take 10 times longer. The reason, we found, was that Greenplum limits the number of active concurrent connections. Once all connections are in use, any new query gets queued, and thus response time suffers.
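
The queuing effect described above can be reproduced with a toy model. This is only a sketch under simplifying assumptions (a fixed service time per query, strict first-in-first-out queuing, and a hypothetical `max_active` cap); it does not model Greenplum's actual scheduler or its configuration.

```python
import heapq


def simulate(arrivals, service_time, max_active):
    """Return per-query response times when at most max_active queries run at once.

    arrivals: arrival times in seconds, sorted ascending.
    service_time: seconds each query needs once it holds a slot.
    Queries that find every slot busy wait, in arrival order, for the next free slot.
    """
    finish_times = []  # min-heap: when each occupied slot becomes free
    responses = []
    for t in arrivals:
        # Free the slots of queries that completed before this arrival.
        while finish_times and finish_times[0] <= t:
            heapq.heappop(finish_times)
        if len(finish_times) < max_active:
            start = t  # a slot is free, so the query runs immediately
        else:
            start = heapq.heappop(finish_times)  # wait for the earliest slot to free up
        finish = start + service_time
        heapq.heappush(finish_times, finish)
        responses.append(finish - t)
    return responses


burst = [0.0] * 10  # ten identical queries submitted at the same moment
uncapped = simulate(burst, service_time=0.5, max_active=10)
capped = simulate(burst, service_time=0.5, max_active=2)
print(max(uncapped), max(capped))
```

With enough slots, every query answers in its own service time; with the cap at two, the last query in the burst waits behind four earlier rounds, so the same query looks several times slower purely because of queuing.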

The impression we got was that the EMC sales team that sold Greenplum to the organization did a great job. But later on, the ball was dropped when it came to educating us on which types of applications are suitable for Greenplum and how to configure it for optimal performance. When Pivotal took over support of Greenplum, their consultant visited us to go over the issues we were having. He advised us that Greenplum is not the best environment for our application needs. We ended up migrating our application out of Greenplum, along with a few other applications.

Deployment Issues:

There was no issue with the deployment.

Stability Issues:

There were issues with the stability.

Scalability Issues:

There were issues with the scalability.

Other Advice:

Ensure that this is the right tool for your needs. For instance, Greenplum is not the best tool for cases where data has to be kept up to date in real time. Capacity planning is key to success, once you do decide it is the right tool for you.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user372360
Sr. Software Engineer at a financial services firm with 1,001-5,000 employees
Vendor
It provides file loaders to reduce dependency on Informatica.

Valuable Features

  • Distributed data for performance
  • File loaders to reduce dependency on Informatica

Improvements to My Organization

Batch processing times have dramatically decreased from over 12 hours to under three hours. Much of this is converting off of Informatica and using distributed processing. Reporting performance also improved greatly.

Room for Improvement

Since we are upgrading to a new version at this time, it’s hard to say. But we seem to be replacing a disk on the appliance every week.

Use of Solution

I've used it for six years.

Deployment Issues

There are no issues, although we are currently in the middle of an upgrade to v4.3.5.

Stability Issues

It seems like we are replacing a disk on the appliance every week. Not a noticeable issue for users and batch processing is not adversely impacted.

Customer Service and Technical Support

Very good. Response times are good for service calls.

Initial Setup

I believe the set-up was straightforward. I don’t remember any issues.

Implementation Team

We used a vendor team. My advice is that planning is critical. When converting processing jobs, convert them to PostgreSQL for better performance. Yes, they could work as previously written, but you benefit from the product's features immediately if you adjust the processing to match the appliance. Use the bulk processing to your advantage.

Other Solutions Considered

We evaluated other solutions. Not sure of the reasons why this product was chosen.

Other Advice

It’s a good product. We moved from Oracle and I don’t want to go back.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user372336
Senior Technical Engineer at a tech company with 1,001-5,000 employees
MSP
The loaders – their methods, insert, update and merge – were most valuable.

What is most valuable?

The Greenplum loaders – their methods, insert, update and merge – were most valuable. The external table feature within Greenplum was also very valuable in achieving performance and scalability.

How has it helped my organization?

A lot of customers use Greenplum and for their ETL use cases they use Informatica PWX for Greenplum, which helps them achieve transfer of data from a number of data sources into Greenplum. We can help customers to load data effortlessly and quickly.

What needs improvement?

It would be best if Greenplum supported array writes through ODBC drivers. Currently, through the DataDirect Greenplum drivers, we can only do single-row inserts into Greenplum. It would help if array loading were supported.

For how long have I used the solution?

I have been using Greenplum loaders to load data into Greenplum for three years.

What was my experience with deployment of the solution?

There were no issues in deploying it.

What do I think about the stability of the solution?

The external table feature helps to make this a stable solution.

What do I think about the scalability of the solution?

It scales to our needs and our customers' needs as required.

How are customer service and technical support?

The ongoing response from their support on certain technical issues has been slow. It would help if we can have a faster turn-around here.

Which solution did I use previously and why did I switch?

There was no other solution used previously.

How was the initial setup?

The installation and configuration is straightforward.

What about the implementation team?

We implemented it through an in-house team. The overall solution is useful and is not complex.

What other advice do I have?

If the use case requires huge data transfers and big data analytics, this is a good product to use.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user371898
Sr ETL Developer at a financial services firm with 1,001-5,000 employees
Vendor
It only takes minutes to process millions of records. The bug fixes come as many patches, like at a startup, instead of as scheduled releases with proper improvements.

What is most valuable?

  • Parallel processing
  • Takes minutes to process millions of records

How has it helped my organization?

This has improved our daily load process, reducing the run time by at least three to four hours, which has led other departments within the organization to look to the Enterprise Data Warehouse for data.

What needs improvement?

It is now a mainstream product, no longer a novice. There are a lot of areas that can be improved. The bug fixes come as many patches, like at a startup, instead of as scheduled releases with proper improvements.

For how long have I used the solution?

I've used it for five years.

What was my experience with deployment of the solution?

It took a while to take full advantage of this, as we had to come up with lots of proprietary solutions when linking to other products.

What do I think about the stability of the solution?

There is no issue with its stability.

What do I think about the scalability of the solution?

There is no issue scaling it.

How are customer service and technical support?

8/10. We had a technician come out over the New Year in 2012 to fix a hardware failure.

Which solution did I use previously and why did I switch?

We were one of the pioneers in adopting this solution, so we got the best deal when compared to the competitors in several areas.

How was the initial setup?

The initial move was a bulk transfer from the old system to new Greenplum based solutions. It was all done by Greenplum contractors. But to get it working with other products was challenging.

What about the implementation team?

Greenplum employees developed and supported the initial move. Later they became remote consultants with support through phone and in-person as needed.

What was our ROI?

They gave us a really good deal, and we have been with them for five years even though the product was bought by a couple of different companies.

What other advice do I have?

You need strong DBAs and architects to support the initial transfer.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user371766
Sr. Data Operation Engineer at a marketing services firm with 501-1,000 employees
Vendor
MPP architecture is important to process data in such volume. Better integration with the big data tech stack is needed.

What is most valuable?

MPP (massively parallel processing): processing large amounts of data.

How has it helped my organization?

We process billions of rows of data every hour; MPP architecture is important to process data in such volume.

What needs improvement?

  • Better integration with big data tech stack
  • Scalability: for example, system schema (pg_catalog) is one bottleneck for scalability

For how long have I used the solution?

I've used it for four to five years.

What do I think about the scalability of the solution?

The system has some bottlenecks, such as the system schema; some commands (batch loading) are bottlenecked by the master DB architecture.

How are customer service and technical support?

Customer Service:

It's very poor.

Technical Support:

It's very poor.

Which solution did I use previously and why did I switch?

We used Greenplum originally. We have evaluated other products; the final decision is based on ROI.

How was the initial setup?

The product was installed by the vendor; all ETL scripts and schemas were designed and implemented in-house.

What about the implementation team?

The product was installed by the vendor; all ETL scripts and schemas were designed and implemented in-house.

What's my experience with pricing, setup cost, and licensing?

Greenplum has an open source version now.

What other advice do I have?

Scalability is one major concern: once the data reaches a certain level, performance drops and many more issues are triggered, such as disk errors and out-of-memory errors. Be sure that you properly scope your needs (including growth) before deciding on the system and its size.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user371724
DB Manager at a marketing services firm with 501-1,000 employees
Vendor
The reports are running very fast, a matter of minutes instead of hours as it was previously.

What is most valuable?

MPP, for fast aggregation, and the fact that I can still reuse my PL/SQL code.

How has it helped my organization?

The reports are running very fast, a matter of minutes instead of hours as it was previously.

What needs improvement?

It should support more of the features that exist in Postgres, such as the JSON data type.

For how long have I used the solution?

I've used it for three years.

What was my experience with deployment of the solution?

Not more than any new DB – not something particular.

What do I think about the stability of the solution?

Not more than any new DB – not something particular.

What do I think about the scalability of the solution?

Not more than any new DB – not something particular.

How are customer service and technical support?

Customer Service:

It's good.

Technical Support:

It's good.

Which solution did I use previously and why did I switch?

I used Oracle Enterprise Edition and moved to Greenplum because of MPP (it's like MapReduce). I prefer a SQL solution rather than Hadoop or whatever, especially if you're moving from an RDBMS. Now I'm also using Amazon Redshift for the same reason. We also don't need to manage any servers; it's only a service, and this is our best option, especially because we're already using Amazon.

How was the initial setup?

It's straightforward.

What about the implementation team?

In-house.

What other advice do I have?

Read the documentation and always perform tests.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user371457
IT Consultant at a retailer with 10,001+ employees
Vendor
There are many valuable features, such as parallel loading and the solution's scalability.

What is most valuable?

There are many valuable features, such as parallel loading and the solution's scalability.

How has it helped my organization?

It's allowed us to do a lot of data analytics with it that we weren't able to do before.

What needs improvement?

The performance needs to be improved.

For how long have I used the solution?

As a whole, two years on multiple versions.

What was my experience with deployment of the solution?

We've had no issues with deployment.

What do I think about the stability of the solution?

It's stable, but slowness is an issue.

What do I think about the scalability of the solution?

It's scaled fine for us.

How are customer service and technical support?

Customer Service:

Customer service is OK.

Technical Support:

Technical support is OK.

Which solution did I use previously and why did I switch?

No solution was used previously.

What about the implementation team?

Implementation was done by a vendor team.

What's my experience with pricing, setup cost, and licensing?

Pricing is pretty much OK compared to others.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user369321
Senior Director & Global Lead, Big Data Center of Excellence at a pharma/biotech company with 10,001+ employees
Vendor
The loading and transformation of large data sets is valuable.

Valuable Features:

Processing speed – especially loading and transformation of large data sets.

Improvements to My Organization:

Before we implemented Greenplum, our weekly data loads (for third party marketing data sets) were taking over three days. (We also had some monthly data that could take up to 3 days to load and transform via Informatica.) After we implemented Greenplum, the loads were reduced to less than nine hours. Previously, we were receiving data early Wed a.m. and not getting out to the salesforce (if we were lucky) until noon on the following Monday. Now we get the data to the field early Friday mornings before they wake up.

Room for Improvement:

The Greenplum appliance itself has had some reliability issues, so it would be great if that could be improved in the next version. More critical, though, is that the latest devices are not backward compatible, i.e., we have to replace our entire environment to upgrade. That’s quite an expense. I would hope they could improve the upgrade roadmap in the future.

Implementation Team:

We have used EMC Consulting for some projects, and we have lots of EMC storage.

Other Advice:

If you can, do a benchmark with other MPP options including cloud alternatives. Although our Greenplum implementation was very successful (going on 4 years ago), I wish we had benchmarked against Teradata and Netezza (now IBM) at least. Today, I would consider not even buying hardware… just doing it all in the cloud.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user259983
Lead Consultant with 501-1,000 employees
Vendor
Session management needs improvement, but performs well with larger data volumes because it uses massively parallel processing.

What is most valuable?

The performance is the most valuable.

How has it helped my organization?

There is only a fraction of performance tuning that you need to be careful about, compared to Oracle DB.

What needs improvement?

Session management for client tools needs work.

For how long have I used the solution?

I've used it for nine months.

How are customer service and technical support?

Customer Service:

Nothing to complain about.

Technical Support:

Nothing to complain about.

Which solution did I use previously and why did I switch?

Previously, we had an Oracle database. The EMC Greenplum database was adopted for better performance on larger data volumes, because it uses massively parallel processing. This improved performance several times over while reducing the time needed to optimize database queries.

What about the implementation team?

We had a team from EMC, who had a very high level of expertise.

What other advice do I have?

It's a very good product for reducing the time and manpower needed for database optimization for data warehousing purposes.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ITCS user
Technical Lead with 1,001-5,000 employees
Vendor
Needs more stability in terms of query results but we have improved our query response time.

What is most valuable?

External tables to load the data from flat files.

How has it helped my organization?

It helped us load millions of records quickly and improved query response.

What needs improvement?

More stability in terms of query result.

For how long have I used the solution?

Five years - DCA V2.0 EMC & Greenplum Database

What was my experience with deployment of the solution?

No.

What do I think about the stability of the solution?

Not usually, but sometimes we do.

What do I think about the scalability of the solution?

No issues encountered.

How are customer service and technical support?

Customer Service:

Excellent.

Technical Support:

Excellent.

Which solution did I use previously and why did I switch?

This was the first time we used an MPP appliance; earlier we were running on SMPs.

How was the initial setup?

It was straightforward.

What about the implementation team?

Vendor; they had good expertise.

What was our ROI?

Good.

Which other solutions did I evaluate?

We also looked at Teradata.

What other advice do I have?

Very good and cost effective with very good customer support.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user76890
Engineer at a marketing services firm with 51-200 employees
Vendor
We use Greenplum for production but not for development and testing because it consumes too many resources
We use the Scrum methodology to manage our engineering work. While this gives us the flexibility to respond quickly to changing business needs, it also means that our databases’ schemas are not set in stone. To accommodate frequent changes to the way we structure our data, we’ve adapted Ruby on Rails database migrations for use beyond our Rails apps.

Our first challenge was to find and extract the migration’s functionality from Rails so that we could use it without needing to bring in the other features of Rails we didn’t need. Thankfully, this turned out to be relatively simple because almost all of the functionality can already be accessed by Rake tasks, so it was just a matter of building a Rake file that contains the tasks we wanted from Rails. The only thing we had to add to get started was a Rake task for creating new migrations, since Rails creates them either automatically when creating new model objects or through the `rails generate migration` command.
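As a rough illustration of the kind of helper such a Rake task might wrap, here is a minimal sketch in plain Ruby. The method names (`migration_path`, `migration_class_name`, `create_migration`) and the `db/migrate` default are assumptions made for this example, not the author's actual code; the filename follows the timestamp-prefixed convention that Rails uses for migrations.

```ruby
require "fileutils"

# Hypothetical helper: builds a timestamp-prefixed path such as
# db/migrate/20240131120000_add_users.rb from a snake_case name.
def migration_path(name, dir: "db/migrate", now: Time.now.utc)
  version = now.strftime("%Y%m%d%H%M%S")
  File.join(dir, "#{version}_#{name}.rb")
end

# Hypothetical helper: "add_users" becomes "AddUsers".
def migration_class_name(name)
  name.split("_").map(&:capitalize).join
end

# Writes an empty migration skeleton and returns its path.
# Note: newer Rails versions expect a versioned superclass,
# e.g. ActiveRecord::Migration[7.0]; adjust for your version.
def create_migration(name, dir: "db/migrate", now: Time.now.utc)
  path = migration_path(name, dir: dir, now: now)
  FileUtils.mkdir_p(dir)
  File.write(path, <<~RUBY)
    class #{migration_class_name(name)} < ActiveRecord::Migration
      def up
      end

      def down
      end
    end
  RUBY
  path
end
```

A Rake task could then call something like `create_migration(ENV.fetch("NAME"))`, so that a new migration is created by passing a `NAME` variable on the command line.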

We quickly ran into problems, though, because we use Greenplum, a DBMS built on Postgres that adds features to help accommodate big data. Some of those features, such as table partitions, are not found in most DBMSes, whereas Rails is designed to be DBMS-agnostic so that you can switch between, say, Postgres, MySQL, and SQLite with no more trouble than changing a single configuration file. So we decided to cut around Rails’s database abstractions and instead write SQL directly in our migrations and dump SQL structure files rather than Rails schema files.

Only this didn’t work because, to complicate matters further, although we use Greenplum in production and staging environments, for local testing and development we use Postgres because Greenplum has minimum requirements that ended up consuming too much of our workstations’ resources. So we ultimately ended up developing a hybrid solution that allows us to support Greenplum and Postgres in the same migrations.

The two main aspects of the solution are selective application of options when making changes to the database and keeping two separate dumps of the database, one for Greenplum with Greenplum-only syntax and one for Postgres with Greenplum-only syntax removed. For example, we rewrote the Rails `create_table` migration function to add support for Greenplum options like partitions, data distribution, and append-only tables, but then have the function ignore those options when running the migrations against our development Postgres databases. This allows us to use a single set of migrations on all of our databases, drastically simplifying what could have otherwise been a gnarly challenge.
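To make the idea of selectively applied options concrete, here is a minimal sketch that builds `CREATE TABLE` SQL and appends the Greenplum-only clauses only when the target is Greenplum. It is not the rewritten `create_table` described above: `greenplum?` and `create_table_sql` are hypothetical names, and the sketch generates a SQL string rather than hooking into Rails. It assumes the server can be identified from the string returned by `SELECT version()`, which on Greenplum includes the text "Greenplum Database".

```ruby
# Hypothetical check: pass in the result of `SELECT version()`.
def greenplum?(version_string)
  version_string.include?("Greenplum")
end

# Hypothetical helper: builds CREATE TABLE SQL from a hash of
# column name => type. The Greenplum-only options are silently
# ignored when greenplum is false, so one call site serves both DBMSes.
def create_table_sql(name, columns, greenplum:, append_only: false,
                     distributed_by: nil, partition_by: nil)
  column_sql = columns.map { |col, type| "#{col} #{type}" }.join(", ")
  sql = "CREATE TABLE #{name} (#{column_sql})"
  if greenplum
    sql += " WITH (appendonly=true)" if append_only
    sql += " DISTRIBUTED BY (#{distributed_by})" if distributed_by
    # partition_by is passed through as a raw clause, e.g.
    # "RANGE (created_at) (START (...) END (...) EVERY (...))"
    sql += " PARTITION BY #{partition_by}" if partition_by
  end
  sql + ";"
end

columns = { "id" => "bigint", "created_at" => "date" }
opts = { append_only: true, distributed_by: "id" }

puts create_table_sql("events", columns, greenplum: true, **opts)
puts create_table_sql("events", columns, greenplum: false, **opts)
```

The first call emits the `WITH (appendonly=true)` and `DISTRIBUTED BY (id)` clauses; the second emits a plain Postgres-compatible statement from the same arguments.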

By no means is our solution complete or perfect. It only supports those features of Greenplum that we use, and new features only get added as we need them. There is also some risk associated with keeping two separate schema dumps, since they can get out of sync. Still, compared to maintaining a database with frequently changing schemas without the aid of migrations, it’s light-years better.

Disclosure: I am a real user, and this review is based on my own experience and opinions.