Tuesday, September 29, 2009

operations sanity for your databases

Permit me to put on an operations hat for a moment and offer budding developers some thoughts on how to stay out of trouble in your database environment. Here are some rules that I've found to be useful:

1. track database changes (versions etc.) in the database.

2. make all database changes backward compatible.

3. avoid triggers.

4. use referential integrity sparingly.

Now I'm sure some eyebrows will be raised by the last two recommendations. After all, how can these possibly be related to the operation of your production database? Let's look at the rules one at a time.

1. track database changes. I have found that trying to figure out what version, release, etc. is in my database is usually hard. You normally can't just look at a table and say, "oh, yeah, that's release 2.01". An easy way to fix this is to incorporate a small table in each database (or schema) that tracks the changes. Here's the table (in MySQL syntax) I've used with some success.

create table db_history (
    id int not null auto_increment,
    version varchar(20) not null,   -- e.g., '2.01'
    backed_out datetime null,       -- set if the release is later backed out
    created datetime not null,      -- when the release was installed
    comment varchar(50) null,       -- optional note about the change
    primary key (id)
);

When I install a new release, I automatically do an insert into this table to note the installation. In the event I back the release out, I update the backed_out column. Quick and easy, and any time I want to see what's been installed I simply do a select * from db_history. I can even use this table to document non-release changes made in the database, simply by adding a comment.
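Here's a minimal sketch of that workflow (the version string and comment are hypothetical):

-- on installing release 2.01
insert into db_history (version, created, comment)
values ('2.01', now(), 'adds order archiving');

-- if release 2.01 has to be backed out
update db_history set backed_out = now() where version = '2.01';

-- review what's been installed
select * from db_history;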

For those of you using Ruby on Rails, the framework automatically maintains a version table, but the data contained there is pretty minimal. Of course, you can backtrack from the version information to the SQL, but even in that environment I've found my table to be more helpful.

2. make all database changes backward compatible. If I deploy a new version of my favorite web application and later decide it wasn't quite up to snuff, backing it out is usually as easy as moving a symlink or downloading a new tarball. Unfortunately, this may be impossible or impractical for a database change. For example, changing a column type or dropping a column can easily make your application fail. And if the change takes an hour to apply to your large table, it can easily take that long or longer to back out. The rules to avoid this issue are pretty simple:

a. new columns and new tables should never break your code.

b. changed columns should be added under the new name and referenced by it, but retain the old column for backward compatibility and remove it in the next release (see the sketch after this list).

c. dropped columns should only be removed in the second release that no longer uses them (i.e., stop referencing the column in one release, then drop it in the next).

d. if a procedure or function needs new parameters, in or out, create a new procedure or function (e.g., foo_v2) and keep the old one around.
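As a sketch of rule (b) in MySQL, here's a backward-compatible column rename; the customer table and the email/email_address columns are hypothetical:

-- release N: add the new column alongside the old one and backfill it;
-- new code reads and writes email_address, old code still sees email
alter table customer add column email_address varchar(255) null;
update customer set email_address = email;

-- release N+1: once nothing references the old name, drop it
alter table customer drop column email;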

3. avoid triggers. Usually the functionality of a trigger can be duplicated in a procedure, and then when there is a problem, the code is right in front of the DBA trying to debug the issue. This is a religious discussion and many people will disagree, but I have found that at 2:00 AM, when things are broken, I can easily overlook a trigger.
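As a sketch of the alternative (the customer and customer_audit tables are hypothetical): rather than an after-insert trigger quietly maintaining the audit table, put the audit insert in the procedure that does the work, so everything is visible in one place:

delimiter //
create procedure add_customer (in p_name varchar(50))
begin
    insert into customer (name) values (p_name);
    -- the audit write lives right here where the DBA can see it,
    -- not hidden in an after-insert trigger
    insert into customer_audit (customer_name, created)
    values (p_name, now());
end //
delimiter ;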

4. use referential integrity sparingly. Again, this is a religious argument and some will disagree. But in my experience I have found many cases where an unanticipated table scan caused by a referential integrity check goes unnoticed during development, where tables are small, but haunts you in production. Sometimes RI cannot be avoided, but by using it sparingly you can save yourself lots of late-night head-scratching, wondering why you are suddenly scanning your favorite 1M-row table. Usually a check for the existence of a key before an insert is all that's needed.
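A minimal sketch of that existence check in MySQL (order_item and product are hypothetical tables):

-- instead of declaring a foreign key on order_item.product_id,
-- only insert the child row if the parent key actually exists
insert into order_item (product_id, qty)
select 42, 1 from dual
where exists (select 1 from product where id = 42);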

Have fun.