Wednesday, July 18, 2012

Product Suggestion: WITH AUTO NAME ON MISSING

Automated schema upgrades are often a requirement for embedded databases, requiring miles and miles of custom code... either that, or you don't allow any schema changes after Version 1.

Here's what makes embedded databases different from central databases:

  • Hundreds or thousands of copies of the database exist, rather than just a few (or one).

  • Embedded databases don't have database administrators... they have doctors, lawyers, truckers, accountants, salesfolk, but no DBAs.

  • Embedded databases often don't even have an IT department... not when the database is buried in a trench or hurtling down the highway.
So, when the time comes to
  • add a column,

  • or a table,

  • or an index,

  • or split a table into two,

  • or combine two tables into one,

  • or . . . re . . . design . . . every . . . single . . . thing,
what's a developer to do?

Say no?


Well, that sometimes works, and if synchronization is involved that's always the first choice.

Seriously...


Let's not talk about synchronization, just a regular embedded database: maybe one user, maybe more than one, but no MobiLink.

There are two fundamentally different approaches for an automated schema upgrade:
  1. Upgrade the old database in-place using a variety of custom ALTER statements and other code to deal with the differences, versus

  2. Deliver a new database and copy the data with custom code to deal with the differences.
Both approaches are difficult to implement and very difficult to maintain as the variety of old versions in the field multiplies and grows... with success (more sales, more databases, more upgrades, more old versions) making the situation worse.

Frankly, I don't know how Intuit does it, and I bow down before their vast superiority... I am not worthy!
It doesn't matter which approach you take, or even if you use a combination of both approaches, eventually you're going to want to use INSERT SELECT to copy data from one location to another.

...and the INSERT WITH AUTO NAME SELECT feature makes it easy(ier) to deal with schema differences between the INSERT thing and the SELECT thing.

Let's call it a "Featurette"


Not having to code long lists of column names makes INSERT SELECT easier to write and maintain: SELECT * helps, and so does INSERT ( without, the, long, list, of, column, names ).

Right now, DBAs in the audience are cringing in horror at the mere suggestion of SELECT *... that's OK, the world needs SELECT * more than it needs DBAs :)

In other words, INSERT new_table SELECT * FROM old_table is a wonderful thing.

Here'e the problem, though: if the new table has even one single additional column you can no longer use INSERT new_table SELECT * FROM old_table.
CREATE TABLE old_table (
   pkey  INTEGER NOT NULL PRIMARY KEY,
   a     INTEGER NOT NULL DEFAULT 0,
   b     INTEGER NOT NULL DEFAULT 0 );

INSERT old_table VALUES ( 1, 2, 3 );
COMMIT;

CREATE TABLE new_table (
   pkey  INTEGER NOT NULL PRIMARY KEY,
   a     INTEGER NOT NULL DEFAULT 0,
   x     INTEGER NOT NULL DEFAULT 0,
   b     INTEGER NOT NULL DEFAULT 0 );

INSERT new_table SELECT * FROM old_table;

Could not execute statement.
Wrong number of values for INSERT
SQLCODE=-207, ODBC 3 State="21S01"
Line 15, column 1
What you have to do, even if you're OK with the DEFAULT value being assigned to the new column, is list all the column names being copied from the old table:
INSERT new_table ( pkey, a, b ) SELECT * FROM old_table;
Sure, that's easy when there are only three columns, but when there are 50 columns increasing to 51, and it's happening with dozens of tables, and code has to deal with several different old versions each having its own set of changes, it's not just hard, it's boring, and that means it's error prone.

Let's face it: database upgrades, file conversions, whatever they've been called over the years, have never been assigned much importance by management, never been given much of the budget or much time in the schedule, or... dare it be said... never been assigned to the Alpha Engineers.

Alphas can juggle dozens of differences in their head at one time...

...the rest of us need help


The WITH AUTO NAME clause lets you code INSERT new_table SELECT * FROM old_table without the column name list:
CREATE TABLE old_table (
   pkey  INTEGER NOT NULL PRIMARY KEY,
   a     INTEGER NOT NULL DEFAULT 0,
   b     INTEGER NOT NULL DEFAULT 0 );

INSERT old_table VALUES ( 1, 2, 3 );
COMMIT;

CREATE TABLE new_table (
   pkey  INTEGER NOT NULL PRIMARY KEY,
   a     INTEGER NOT NULL DEFAULT 0,
   x     INTEGER NOT NULL DEFAULT 0,
   b     INTEGER NOT NULL DEFAULT 0 );

INSERT new_table WITH AUTO NAME SELECT * FROM old_table;

SELECT * FROM new_table ORDER BY pkey;

pkey,a,x,b
1,2,0,3
So far, so good, but what if you are dropping a column? The WITH AUTO NAME clause doesn't help:
CREATE TABLE old_table (
   pkey  INTEGER NOT NULL PRIMARY KEY,
   a     INTEGER NOT NULL DEFAULT 0,
   b     INTEGER NOT NULL DEFAULT 0 );

INSERT old_table VALUES ( 1, 2, 3 );
COMMIT;

CREATE TABLE new_table (
   pkey  INTEGER NOT NULL PRIMARY KEY,
   b     INTEGER NOT NULL DEFAULT 0 );

INSERT new_table WITH AUTO NAME SELECT * FROM old_table;

Could not execute statement.
Column 'a' not found
SQLCODE=-143, ODBC 3 State="42S22"
Line 13, column 1
OK, you could decide to never drop a column, but what if it's a big fat blob that's moving to another table (and separate code deals with that process)?

What if you really hate having unused columns hanging around? WITH AUTO NAME won't let you do a rename:
CREATE TABLE old_table (
   pkey  INTEGER NOT NULL PRIMARY KEY,
   a     INTEGER NOT NULL DEFAULT 0,
   b     INTEGER NOT NULL DEFAULT 0 );

INSERT old_table VALUES ( 1, 2, 3 );
COMMIT;

CREATE TABLE new_table (
   pkey    INTEGER NOT NULL PRIMARY KEY,
   unused  INTEGER NOT NULL DEFAULT 0,
   b       INTEGER NOT NULL DEFAULT 0 );

INSERT new_table WITH AUTO NAME SELECT * FROM old_table;

Could not execute statement.
Column 'a' not found
SQLCODE=-143, ODBC 3 State="42S22"
Line 14, column 1

Here's the Featurette Suggestion...


Allow the ON MISSING IGNORE clause on WITH AUTO NAME so that any column names in the SELECT * list that don't match column names in the INSERT table are ignored:
INSERT new_table 
   WITH AUTO NAME ON MISSING IGNORE 
   SELECT * FROM old_table;
Optional, of course, and the default would be ON MISSING ERROR just like it is now.

Give that task to one of the Alphas and it'll take less time to implement than this request took to write :)


No comments: