Tuesday, September 29, 2009

operations sanity for your databases

Permit me to put on an operations hat for a moment and offer budding developers some thoughts on how to stay out of trouble in your database environment. Here are some rules I've found useful:

1. track database changes (versions etc.) in the database.

2. make all database changes backward compatible.

3. avoid triggers.

4. use referential integrity sparingly.

Now I'm sure some eyebrows will be raised by the last two recommendations. After all, how can these possibly be related to the operation of your production database? Let's look at them one at a time.

1. track database changes. I have found that figuring out what version or release is in my database is usually hard. You normally can't just look at a table and say, "oh, yeah, that's release 2.01". An easy way to fix this is to incorporate a small table in each database (or schema) that tracks the changes. Here's the table (in MySQL format) I've used with some success:

create table db_history (
    id int not null auto_increment,
    version varchar(20) not null,
    backed_out datetime null,
    created datetime not null,
    comment varchar(50) null,
    primary key (id)
);

When I install a new release I automatically insert a row into this table to note the installation. If I later back it out, I update the backed_out column. Quick and easy, and any time I want to see what's been installed I simply do a select * from db_history. I can even use this table to document non-release changes made in the database simply by adding a comment.
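The install/back-out workflow can be sketched like so, using SQLite in place of MySQL (the version string and comment are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table db_history (
        id integer primary key autoincrement,
        version varchar(20) not null,
        backed_out datetime null,
        created datetime not null,
        comment varchar(50) null)
""")

# on install: record the release
conn.execute(
    "insert into db_history (version, created, comment) "
    "values (?, datetime('now'), ?)",
    ("2.01", "release install"))

# on rollback: mark that release as backed out
conn.execute(
    "update db_history set backed_out = datetime('now') where version = ?",
    ("2.01",))

rows = conn.execute("select version, backed_out from db_history").fetchall()
```

A plain select * from db_history then reads as a complete install/back-out history of the schema.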

For those of you using Ruby on Rails, it automatically maintains a schema versions table, but the data contained there is pretty minimal. Of course you can backtrack from the version information to the SQL, but even in this environment I've found my table to be of more help.

2. make all database changes backward compatible. If I deploy a new version of my favorite web application and later decide it wasn't quite up to snuff, backing it out is usually as easy as moving a symlink or downloading a new tarball. Unfortunately, this may be impossible or impractical for a database change. For example, changing a column type or dropping a column can easily make your application fail, and if the change took an hour to apply to your large table, it can easily take that long or longer to back out. The rules to avoid this issue are pretty simple:

a. new columns and new tables should never break your code.

b. changed columns should be added under the new name and referenced by it, but retain the old column for backward compatibility and remove it in the next release.

c. dropped columns should be removed only after the second release that no longer uses them.

d. if a procedure or function takes new parameters, in or out, create a new procedure or function (e.g., foo_v2) and keep the old one around.
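Rule (b) can be sketched as follows, with SQLite standing in for MySQL and a hypothetical users table: the new column is added alongside the old one and backfilled, so either release of the application keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (id integer primary key, fullname text)")
conn.execute("insert into users (fullname) values ('Ada Lovelace')")

# release N: add the new column and backfill it; old code keeps
# reading fullname while new code reads display_name
conn.execute("alter table users add column display_name text")
conn.execute("update users set display_name = fullname")

# only in a later release, once nothing references fullname,
# would the old column actually be dropped
row = conn.execute("select fullname, display_name from users").fetchone()
```

Backing out release N is now a no-op as far as the schema is concerned: the old column is still there and still populated.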

3. avoid triggers. Usually the functionality of a trigger can be duplicated in a procedure, and then when there is a problem the code is right in front of the dba trying to debug the issue. This is a religious discussion and many people will disagree, but I have found that at 2:00 AM, when things are broken, I can easily overlook a trigger.

4. use referential integrity sparingly. Again, this is a religious argument and some will disagree. But in my experience I have found many cases where an unanticipated table scan caused by some RI goes unnoticed during development but haunts you in production. Sometimes RI cannot be avoided, but by using it sparingly you can save yourself lots of late-night head-scratching, wondering why you are suddenly scanning your favorite 1M-row table. Usually a check for the existence of a key before an insert is all that's needed.
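The existence check can be as simple as the sketch below (hypothetical parent/child tables, SQLite syntax): the application verifies the key itself instead of leaning on a foreign-key constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table parent (id integer primary key)")
conn.execute("create table child (id integer primary key, parent_id integer)")
conn.execute("insert into parent (id) values (1)")

def insert_child(conn, parent_id):
    # check for the key first instead of relying on an RI constraint
    exists = conn.execute(
        "select 1 from parent where id = ?", (parent_id,)).fetchone()
    if exists is None:
        return False
    conn.execute("insert into child (parent_id) values (?)", (parent_id,))
    return True

ok = insert_child(conn, 1)    # parent row exists, so the insert proceeds
bad = insert_child(conn, 42)  # no such parent, so the insert is skipped
```

This is a single indexed lookup the optimizer can't turn into a surprise table scan.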

Have fun.

Sunday, July 19, 2009

how to use replication to forward transactions

Normal replication goes from host A to host B, but with the MySQL my.cnf directive log_slave_updates you can store and forward transactions through an intermediary to another host. So in this example, B is a slave of A; B specifies log_slave_updates, and then C, as a slave of B, gets all transactions from both A and B. Not quite multi-master support.
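A minimal my.cnf sketch for the intermediary B (the server-id and log file name are placeholders; adjust for your environment):

```ini
# on host B: slave of A, and master to C
[mysqld]
server-id         = 2
log_bin           = mysql-bin
log_slave_updates = 1   # write statements replicated from A into B's own binlog
```

Without log_slave_updates, B applies A's transactions but never writes them to its binlog, so C would only ever see changes made directly on B.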

I have used this when cutting over to a new system where existing hosts X & Y are replicating and I want to bring up a new pair, A & B, fed from them.

Hope this is useful.

Friday, July 10, 2009

corrupt frm files for innodb tables

I've encountered a new MySQL error affecting many (all) tables when InnoDB was starved for memory. The error:

Incorrect information in file: '...frm'
Any attempt to repair the table failed.

Much consternation!

But this occurred not because the table was corrupt, but because I had tried to give too much memory to the InnoDB buffer pool.

The real culprit in the error log was:

InnoDB: Error: cannot allocate 2516598784 bytes of
InnoDB: memory with malloc!
InnoDB: by InnoDB 228832092 bytes. Operating system errno: 12

Reducing innodb_buffer_pool_size to a more sane level for the box eliminated the apparent table corruption.
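For reference, the setting in question lives in my.cnf; the 2G figure below is a placeholder, not a recommendation. The point is simply to leave the OS and per-connection buffers enough headroom that malloc can't fail:

```ini
[mysqld]
# size the pool to fit the box: it must fit in physical RAM with room
# left over for per-connection buffers and the operating system
innodb_buffer_pool_size = 2G
```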



Thursday, July 9, 2009

database tables as log files

I have noticed over the years a tendency among some developers to put logging information into database tables. Since the database is supposed to be the repository of corporate information, not a dumping ground, this is arguably not a good idea.

A better approach is to use flat files appropriate to your environment for logging. If you need them to be searchable either in or out of the application, then use a search engine technology (Lucene comes to mind) to index the log information and make it available to your consumers.
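A minimal sketch of the flat-file approach, using Python's standard logging module (the logger name, file name, and message are all made up):

```python
import logging
import os
import tempfile

# a flat file appropriate to the environment, instead of an "events" table
fd, log_path = tempfile.mkstemp(suffix=".log")
os.close(fd)

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path, mode="w")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("order %s created by user %s", 1234, "alice")
handler.close()

with open(log_path) as f:
    contents = f.read()
```

Files like this are exactly what an indexer such as Lucene is built to consume, and they rotate and archive far more cheaply than table rows.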


Friday, June 19, 2009

Quick and easy sql server to flat file

There are many times you need a flat file (CSV, for instance) to move data from one db type to another. SQL Server's bcp is just the ticket, and combined with select into it lets you create just what you want for those pesky data-format translations.

The trick is to create a table using select into, then use bcp to output it. Any data transformations you need to make (e.g., changing date formats) should be done when you build the temp table. In the simplest form the steps are:

  1. select * into tempdb..mynewtable from your source table (This both creates mynewtable and puts the data into it.)
  2. from a DOS command line: bcp tempdb..mynewtable out mynewtable.csv -c -S server -U user -P password -t',' The -c flag tells bcp to output the data in character format, and -t',' makes the fields comma-separated.
  3. do what you will with the csv file. Import it into a spreadsheet, load it into MySQL using load data, or load it into Oracle using sqlldr.
So one issue you may run into, especially with dates, is that one system's format isn't necessarily another's. The easiest way to handle this is to make your target tempdb column match what you need. For SQL Server, just use the convert function with one of its many date style codes. To do this you may have to create your tempdb table manually and then insert the data into it.
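For example, a date reformat done while building the temp table might look like this sketch (the database, table, and column names are made up; style 120 is SQL Server's yyyy-mm-dd hh:mi:ss format):

```sql
-- build the temp table with the date already converted for the target system
select id,
       convert(varchar(19), created_at, 120) as created_at
into tempdb..mynewtable
from mydb..orders;
```

bcp then exports the already-converted text, so the importing side never has to parse the source system's native date format.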

Thursday, June 18, 2009

couldn't resist this picture


In case of network connectivity issues... check your cable.

Wednesday, June 17, 2009

Why I love this job.

I'm just getting started using Oracle. So, naturally I need a nice visual client to work with. Even though I run a Mac, it's easier to run this in my Parallels Windows XP session due to VPN issues (don't ask). So, off I go to install a client.

Get Oracle Instant Client 11.01.
Get sqlplus too while I'm at it.
Find and change environment settings in Windows.
Get tnsnames.ora file from a unix box that I know works.
Test sqlplus. Yippee it works!
Get Toad.
Toad crashes; it won't support 11.x. Gotta downgrade to 10.x.
Get another Oracle Instant Client, 10.2.
Change environment settings in Windows again.
Start Toad.
Uh oh, missing DLL.
Find DLL on web, download, install, reboot.
Yea! Toad works and connects.

Life should not be this hard.

Tuesday, June 16, 2009

database diagrams

At one time or another we have all been handed a new system and its related database(s). How do you wrap your arms around the entities and grasp the main concepts? For me, nothing beats an ERD, and although Erwin and Microsoft Visio do a good job of presenting these, the best product I've worked with for reverse-engineering a database is SchemaSpy. And it's free! As in beer. Check it out the next time you need a diagram. (I have no relationship with the author of SchemaSpy; I just think the product is really cool and wish I could write Java like this.)


Monday, June 15, 2009

do you really need that left join?

I've been working with MySQL and Ruby on Rails developers for the last several years and found that the generated code seems to like left joins. Something like:

select * from foo
left join bar on foo.id=bar.foo_id

In most cases I've found they rarely need the left join; a simple equi-join will work as well and significantly reduce the noise in the MySQL slow query log. Check those slow queries, and if you see lots of left joins and left outer joins, have a conversation with your friendly developer.
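The difference is easy to demonstrate with a couple of toy tables (hypothetical foo/bar data, SQLite syntax): the left join only earns its keep when foo rows with no matching bar row must appear in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table foo (id integer primary key);
    create table bar (id integer primary key, foo_id integer);
    insert into foo (id) values (1), (2);
    insert into bar (id, foo_id) values (10, 1);
""")

# left join: keeps foo row 2 even though it has no bar match
left = conn.execute(
    "select foo.id, bar.id from foo "
    "left join bar on foo.id = bar.foo_id order by foo.id").fetchall()

# equi-join: only rows with a match on both sides
inner = conn.execute(
    "select foo.id, bar.id from foo "
    "join bar on foo.id = bar.foo_id order by foo.id").fetchall()
```

If the application never actually uses the unmatched rows, the inner join returns everything it needs with less work for the optimizer.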

Sunday, June 14, 2009

Random thoughts for the dba on various platforms.

SQL Server (and others): an insert returns a duplicate-key error and you are sure it's not a duplicate. Check your collation; case-insensitive collations make Foo, foo, fOo, and FoO all the same, so an attempt to insert any of these values into a uniquely indexed column will give you the misleading "duplicate key" error.
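SQLite's NOCASE collation reproduces the behavior in a few lines (illustrative only; SQL Server collations are configured differently, but the effect on a unique index is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (name text collate nocase unique)")
conn.execute("insert into t (name) values ('Foo')")

# 'fOo' compares equal to 'Foo' under a case-insensitive collation,
# so the unique index rejects it as a duplicate
try:
    conn.execute("insert into t (name) values ('fOo')")
    duplicate = False
except sqlite3.IntegrityError:
    duplicate = True
```

The values look different to the human eye, which is exactly why the "duplicate key" message is so confusing until you check the collation.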