Friday, March 4, 2011

Assertion Relief

The way SQL Anywhere handles assertions is in direct conflict with the goal of providing a self-managing database (see Automatic Restart).

Further evidence of this conflict may be found in the facts that this feature request has received 8 positive votes plus one positive comment in the past year, but no replies:


Recover from single-row assertion errors

Sometimes the server raises an assertion error for a problem with a single row. Here is a really obscure example, but there are other more common ones:
   *** ERROR *** Assertion failed: 200610 (11.0.1.2276)

Attempting to normalize a non-continued row (0x12345:0x0)

When this happens it is very difficult if not impossible to rescue the database. In this case any attempt to SELECT that row causes the server to stop, and because there is no indication what row is involved, and because there are millions of rows in that table, the database had to be restored from a backup.

Here's an analogy: A airliner is halfway across the Pacific when the light in one of the lavatories burns out (the assertion); this in turn causes all the engines to shut down and the plane to crash in the middle of the ocean.

In the real world, one of the flight attendants puts an "out of order" sign on the lavatory and makes a note for the aircraft maintenance crew to "replace bulb".

In the SQL Anywhere world, the server should throw an exception to the affected client, log the problem for later attention... AND CARRY ON!

...no need to spread baggage and body parts over miles of ocean!
Comment...

I think this one deserves a stir-up. The whole business of assertions needs some examination. If we are going down the self-managing database route and (let's say) an index gets corrupted, isn't there an argument that the engine should just get on and fix it. Certainly flash some red lights and let people know, but do the fix anyway. After all, what's the dba going to do - drop and recreate the index, and then keep an eye on things. Obviously harder / not possible if it's real data corruption.


Analogies Prove Nothing

A good analogy can be used to explain an important point or position, but even the best analogy breaks down when closely examined. The airliner analogy is no exception; here's a problem report that indicates the trouble with assertions is worse than the analogy indicates: a single assertion can bring a whole fleet of aircraft crashing down:

"We have an SQLAnywhere infrastructure setup where we have one database service that runs multiple database instances. This has been working well for quite some time..."

"However, yesterday we experienced an "Assertion Failed" error that took down the whole database service, but we could not find which database had caused this problem. Our troubleshooting process eventually led us to having to split each database into its own separate service which then got our customers up and running in a stable way and when the crash happened again we were able to pinpoint the single database with the problem."
This response indicates that some relief is on the way... if crashing only one aircraft instead of the whole fleet can be called relief:

"...we plan to make changes to allow corrupted databases to be stopped without bringing down the server. Printing database information to related assertion failures should be part of that."

Good news

News that some action, any action on assertions is being taken is good news in itself. It sure beats the old excuse that assertions are necessary to protect data integrity.

More, however, is necessary... much more.

2 comments:

Anonymous said...

I suggest this rant:)

Regards
Volker

Bill said...

I am glad to say assertion failures seldom ever happen to us!! But when they do, it would certainly be nice to have some better options than "restore backup and see if it happens again"!!