The Migrationless Migration

I’ve been reading the excellent blog Revision-Zero lately, and just a whole bunch of other relational algebra stuff. If I may impose upon you for a minute, relational algebra is one of those things that, like linear algebra, is a shining intellectual gem. It is at once pretty in its own right but also tremendously practically useful. I strongly suggest that any working programmer should acquaint themselves with it. You should also learn about linear algebra, but that’s another article!

Modularity

In programming we spend a lot of time concerning ourselves with modularity. This quest for modularity is why we divide our code up into lines, blocks, modules, libraries of modules, and even whole programs. It is a solid engineering practice to encapsulate and then “black-boxify” that encapsulated module. Also, you should note that I’m talking aboutmodularity and module in the most general sense. I’m not referring to any specific language-supplied code organization features (e.g. Java packages).

The article above talks about something that I had never considered which is logical data independence. That is, in the same way that changing the internals of one class shouldn’t affect a different class, changing the internals of the schema shouldn’t affect the application. What this means in Rails terms is, could you migrate your Rails databasewithout having to change any application code?

An example

Let’s say that I have a rails schema something like this:

ActiveRecord::Schema.define(version: 20140115195300) do
  create_table "suppliers", force: true do |t|
    t.string    "sname"
    t.string    "city"
    t.integer   "status"
    t.datetime  "created_at"
    t.datetime  "updated_at"
  end
end

At this point let’s say that it is decided that “location” is a better term than “city.” And I add a migration like the following:

class RenameCityToLocation < ActiveRecord::Migration
  def change
    rename_column :suppliers, :city, :location
  end
end

Change made! We’ll have to go and find any uses of city and update them to use the new argot of the application,location.

Modularity at the data layer

And that’s just how I thought it must be. But what if, like a private method, we could contain our changes to the database within the database? This is what the idea of logical data independence is getting at. Changes can be made to the schema independently of the application’s code. We ought to be able to run some kind of migration and yet present an identical API to the application.

Let’s check this out:

class RenameSuppliersTableToSuppliersV1 < ActiveRecord::Migration
  def up
    rename_table :suppliers, :suppliers_v1
    rename_column :suppliers_v1, :city, :location
    execute <<-SQL
      CREATE VIEW suppliers AS
      SELECT id, sname, status, created_at, updated_at, location AS city
      FROM suppliers_v1
    SQL
  end

  def down
    execute 'DROP VIEW suppliers'
    rename_table :suppliers_v1, :suppliers
    rename_column :suppliers, :location, :city
  end
end

There are a couple of moving parts, but the gist is simple. First, we rename the suppliers table to a whole new table! The new table, suppliers_v1 has a new column, location, rather than city. Our next trick is that we create aVIEW that looks just like the old suppliers table. The particularly fun part (for me) is the location AS citywhich renames the offending column.

But “hey”, you may be saying to yourself, “that’s fine to query the database, but it’ll blow up once we run a .save or something like that.” Not necessarily. In the above example, I’m using PostgreSQL, which supports a limited form of what’s known as an updatable view. There are a few conditions that need to be met for PostgreSQL to be able to, in effect reverse the column rename, but our example meets them all.

Think about what we’ve done here. The application will be entirely unaware that any underlying database change has occurred!

The example that I gave above is a bit contrived and there are some valid concerns about muddying the schema. After these changes, you’ll be left with a suppliers_v1 table and a suppliers view. This is not a faithful representation of the application’s data. However, I think it still illustrates the point. What if the database were decoupled from the beginning? What if what the application always saw were just views? In that case, we could present a nice API to the app while keeping the DB just how we like it. We’d need a really good RDMS to keep the updatable view abstraction from being leaky, but otherwise it seems intriguing.


Category: Development
Tags: Databases