What is most valuable?
Of particular value to our environment and applications are the following Greenplum capabilities:
- Scalable Massively Parallel Processing (MPP) – Greenplum on the EMC DCA has proven very effective at bringing large amounts of compute to bear against large data sets.
- Fast load of data into Greenplum – We load data into Greenplum at approximately 1 TB per hour without the use of specialized hardware.
- MADlib (madlib.net) – We rely on a number of the statistical and analytical functions available within MADlib, among them linear regression, logistic regression, Apriori, k-means, and principal component analysis.
- User Defined Functions in Python (UDFs in PL/Python) – Where MADlib does not provide a direct solution to an application problem, the ability to quickly prototype and deploy user defined functions with Python has been effective.
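As an illustration of the prototyping workflow described in the last bullet, here is a minimal sketch: the function body is developed and tested in plain Python, and the same body is then wrapped in a PL/Python `CREATE FUNCTION` statement. The function name `zscore`, its arguments, and the helpers `prototype` and `as_plpython_ddl` are hypothetical and not taken from our environment. The language name `plpythonu` is an assumption for a PostgreSQL 8.2-based Greenplum.

```python
import textwrap

# Hypothetical function body, written the way PL/Python expects it:
# the named SQL arguments (x, mu, sigma) appear as local variables.
ZSCORE_BODY = """\
if sigma is None or sigma == 0:
    return None
return (x - mu) / sigma
"""

def prototype(body, arg_names):
    """Compile a PL/Python-style body into a local Python function for testing."""
    src = "def _udf(%s):\n%s" % (", ".join(arg_names), textwrap.indent(body, "    "))
    namespace = {}
    exec(src, namespace)
    return namespace["_udf"]

def as_plpython_ddl(name, args, returns, body):
    """Wrap the same body in a CREATE FUNCTION statement (assumes plpythonu)."""
    arg_list = ", ".join("%s %s" % (n, t) for n, t in args)
    return ("CREATE OR REPLACE FUNCTION %s(%s) RETURNS %s AS $$\n%s$$ LANGUAGE plpythonu;"
            % (name, arg_list, returns, body))

ARGS = [("x", "float8"), ("mu", "float8"), ("sigma", "float8")]

# Step 1: test the logic locally, with no database involved.
zscore = prototype(ZSCORE_BODY, [n for n, _ in ARGS])

# Step 2: generate the DDL to deploy the identical body to the database.
ddl = as_plpython_ddl("zscore", ARGS, "float8", ZSCORE_BODY)
```

Keeping the body in one place means the code tested locally is the code that gets deployed. The server-side Python version may differ from the local one, so the body should stick to constructs that behave the same in both.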
What needs improvement?
We would like to see Greenplum stay closer to PostgreSQL and keep parity with the features implemented there. The current version of Greenplum is based on a fork of PostgreSQL v8.2.15, a release series that the PostgreSQL project declared end of life (EOL) in December 2011. The current version of PostgreSQL is v9.5.
For how long have I used the solution?
We began production use in November 2011. Alongside Greenplum, we also use the EMC Data Computing Appliance v2.3.3 (8/10), with two and a half racks in production and one and a quarter racks in dev/test.
What was my experience with deployment of the solution?
We had no issues with the deployment.
What do I think about the stability of the solution?
The only stability issues we've experienced have been sporadic failovers from primary to mirror segments. When this happens the environment continues to operate, but queries that were in flight at the time of the failover fail.
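Because only the in-flight queries are lost, a client can in principle resubmit them once the mirror has taken over. The following is a minimal sketch of that idea, not a description of what we run in production: `QueryFailed` is a hypothetical stand-in for whatever error the database driver raises, `run_query` is any callable that executes the query, and the approach assumes the query is safe to run a second time.

```python
import time

class QueryFailed(Exception):
    """Hypothetical stand-in for the error a database driver would raise."""

def run_with_retry(run_query, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call run_query(); on failure, wait and try again, up to `attempts` calls."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except QueryFailed as err:
            last_error = err
            # Pause before the next attempt, but not after the final one.
            if attempt < attempts:
                sleep(delay_seconds)
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests. Statements that modify data should be retried only if repeating them cannot apply the change twice.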
What do I think about the scalability of the solution?
We have had no issues with scalability whatsoever.
How are customer service and technical support?
The service and support we’ve received from both Pivotal and EMC have been exemplary, with two exceptions:
- The EMC Request for Product Qualification (RPQ) process – EMC DCA support is contingent upon EMC approval of all third-party software installed on a DCA. At times this approval has taken as long as 60 days.
- Root Cause Analysis of Greenplum Database Incidents – When Greenplum Database incidents have occurred (e.g. primary database segments failing over to their mirrors) and we have called Pivotal for support, the response has been nearly immediate (30 minutes or less), and the incident resolution has been equally prompt. The disappointment has been the response to our requests for a root cause of the incident. These requests tend to queue up, and we don’t seem to get answers beyond the typical vendor response of “that’s been fixed in the next release”.
Which solution did I use previously and why did I switch?
The purchase of Greenplum was our first interaction with Pivotal. We have been a customer of EMC for a very long time.
What other advice do I have?
My primary reason for reducing points on this rating is that Greenplum is based on a fork of PostgreSQL v8.2.15, a release series that the PostgreSQL project declared end of life (EOL) in December 2011. The current version of PostgreSQL is v9.5. There are a number of current PostgreSQL features of which we would like to take advantage (JSON support, materialized views, full text search, XML support, column-based permissions, row-based permissions, etc.).