Pivotal Greenplum Review
We use Greenplum for production but not for development and testing because it consumes too much resources


We use the scrum methodology to manage our engineering work. And while this does give us the flexibility to quickly respond to changing business needs, it also means that our databases’ schemas are not set in stone. To accomodate frequent changes to the way we structure our data, we’ve adapted Ruby on Rails database migrations for use beyond our Rails apps.

Our first challenge was to find and extract the migration’s functionality from Rails so that we could use it without needing to bring in the other features of Rails we didn’t need. Thankfully, this turned out to be relatively simple because almost all of the functionality can already be accessed by Rake tasks, so it was just a matter of building a Rake file that contains the tasks we wanted from Rails. The only thing we had to add to get started was a Rake task for creating new migrations, since Rails creates them either automatically when creating new model objects or through the `rails generate migration` command.

We quickly ran into problems, though, because we use Greenplum, a DBMS built on Postgres that adds support for features that help accomodate big data. Unlike standard Postgres, Greenplum adds features, such as table partitions, not found in most DBMSes, but Rails is designed to be DBMS agnostic so you can easily switch between, say, Postgres and MySQL and SQLite without any more trouble than changing a single configuration file. So we decided to cut around Rails’s database abstractions and instead directly write SQL in our migrations and dump SQL structure files rather than Rails schema files.

Only this didn’t work because, to complicate matters further, although we use Greenplum in production and staging environments, for local testing and development we use Postgres because Greenplum has minimum requirements that ended up consuming too much of our workstations’ resources. So we ultimately ended up developing a hybrid solution that allows us to support Greenplum and Postgres in the same migrations.

The two main aspects of the solution are selective application of options when making changes to the database and keeping two separate dumps of the database, one for Greenplum with Greemplum-only syntax and one for Postgres with Greenplum-only syntax removed. For example, we rewrote the Rails `create_table` migration function to add support for Greenplum options like partitions, data distribution, and append-only tables, but then have the function ignore those options when running the migrations against our development Postgres databases. This allows us to use a single set of migrations on all of our databases, drastically simplifying what could have otherwise been a gnarly challenge.

By no means is our solution complete or perfect. It only supports those features of Greenplum that we use, and new features only get added in as we need them, plus there is some risk associated with keeping two separate schema dumps and the potential for them to get out-of-sync. Still, compared to maintaining a database with frequently changing schemas without the aid migrations, it’s lightyears better.

Disclosure: I am a real user, and this review is based on my own experience and opinions.

Add a Comment

Guest
Why do you like it?

Sign Up with Email