The Migrationless Migration
I’ve been reading the excellent blog Revision-Zero lately, and just a whole bunch of other relational algebra stuff. If I may impose upon you for a minute, relational algebra is one of those things that, like linear algebra, is a shining intellectual gem. It is at once pretty in its own right but also tremendously practically useful. I strongly suggest that any working programmer should acquaint themselves with it. You should also learn about linear algebra, but that’s another article!
In programming we spend a lot of time concerning ourselves with modularity. This quest for modularity is why we divide our code up into lines, blocks, modules, libraries of modules, and even whole programs. It is a solid engineering practice to encapsulate and then “black-boxify” that encapsulated module. Also, you should note that I’m talking aboutmodularity and module in the most general sense. I’m not referring to any specific language-supplied code organization features (e.g. Java packages).
The article above talks about something that I had never considered which is logical data independence. That is, in the same way that changing the internals of one class shouldn’t affect a different class, changing the internals of the schema shouldn’t affect the application. What this means in Rails terms is, could you migrate your Rails databasewithout having to change any application code?
Let’s say that I have a rails schema something like this:
ActiveRecord::Schema.define(version: 20140115195300) do create_table "suppliers", force: true do |t| t.string "sname" t.string "city" t.integer "status" t.datetime "created_at" t.datetime "updated_at" end end
At this point let’s say that it is decided that “location” is a better term than “city.” And I add a migration like the following:
class RenameCityToLocation < ActiveRecord::Migration def change rename_column :suppliers, :city, :location end end
Change made! We’ll have to go and find any uses of
city and update them to use the new argot of the application,
Modularity at the data layer
And that’s just how I thought it must be. But what if, like a private method, we could contain our changes to the database within the database? This is what the idea of logical data independence is getting at. Changes can be made to the schema independently of the application’s code. We ought to be able to run some kind of migration and yet present an identical API to the application.
Let’s check this out:
class RenameSuppliersTableToSuppliersV1 < ActiveRecord::Migration def up rename_table :suppliers, :suppliers_v1 rename_column :suppliers_v1, :city, :location execute <<-SQL CREATE VIEW suppliers AS SELECT id, sname, status, created_at, updated_at, location AS city FROM suppliers_v1 SQL end def down execute 'DROP VIEW suppliers' rename_table :suppliers_v1, :suppliers rename_column :suppliers, :location, :city end end
There are a couple of moving parts, but the gist is simple. First, we rename the
suppliers table to a whole new table! The new table,
suppliers_v1 has a new column,
location, rather than
city. Our next trick is that we create a
VIEW that looks just like the old
suppliers table. The particularly fun part (for me) is the
location AS citywhich renames the offending column.
But “hey”, you may be saying to yourself, “that’s fine to query the database, but it’ll blow up once we run a
.save or something like that.” Not necessarily. In the above example, I’m using PostgreSQL, which supports a limited form of what’s known as an updatable view. There are a few conditions that need to be met for PostgreSQL to be able to, in effect reverse the column rename, but our example meets them all.
Think about what we’ve done here. The application will be entirely unaware that any underlying database change has occurred!
The example that I gave above is a bit contrived and there are some valid concerns about muddying the schema. After these changes, you’ll be left with a
suppliers_v1 table and a
suppliers view. This is not a faithful representation of the application’s data. However, I think it still illustrates the point. What if the database were decoupled from the beginning? What if what the application always saw were just views? In that case, we could present a nice API to the app while keeping the DB just how we like it. We’d need a really good RDMS to keep the updatable view abstraction from being leaky, but otherwise it seems intriguing.