Wednesday, December 21, 2011

Let's play "Gotcha!" - the ACID Edition

Welcome to the ACID Edition of "Gotcha!", the self-testing quiz game for SQL Anywhere developers.

Let's get started right away...

Are your transactions atomic?


Are they consistent? isolated? durable?

Are they reentrant?

Are you sure?


Let's say you have created a SQL Anywhere web service to update your database, and this web service is launched when the user clicks on an INPUT TYPE="BUTTON" on a web page.

Let's also say you've designed the code carefully so all the database updates are performed within a single database transaction, with a single COMMIT after the updates are complete.

Furthermore, let's say you have carefully inspected the code, and you are sure of the following facts:
  • Each time your web service is executed, it either runs on a brand-new database connection or a pool connection that has been freshly sanitized (because that's how SQL Anywhere web services work),

  • if the web service finishes normally, by definition the COMMIT will have been executed, which means all the updates performed during the database transaction will have been successfully written to the database, and

  • if the web service terminates abnormally before reaching the COMMIT, SQL Anywhere will roll back all the updates performed during the database transaction (because that's what SQL Anywhere does when a connection terminates without doing a COMMIT).
Here's a simple demonstration that's just one step up from "Hello World":



Here's what the service does when it is invoked from the browser:
  • It deletes all 100 rows from a table called t, then

  • it re-inserts all 100 rows and finally

  • it displays them as ordered pairs inside parentheses.
Here's the code: One table, two procedures and one service; remember, this is a demonstration, so don't expect anything useful:

CREATE TABLE t (
   pkey   INTEGER NOT NULL PRIMARY KEY,
   data   INTEGER NOT NULL );

INSERT t SELECT row_num, row_num FROM sa_rowgenerator ( 1, 100 );
COMMIT;

CREATE PROCEDURE p()
BEGIN

DECLARE @sqlcode   INTEGER;
DECLARE @sqlstate  VARCHAR ( 5 );
DECLARE @errormsg  VARCHAR ( 32767 );

DELETE t;

WAITFOR DELAY '00:00:00.001';

INSERT t SELECT row_num, row_num FROM sa_rowgenerator ( 1, 100 );

WAITFOR DELAY '00:00:00.001';

COMMIT;

EXCEPTION
   WHEN OTHERS THEN
      SELECT SQLCODE, SQLSTATE, ERRORMSG() 
        INTO @sqlcode, @sqlstate, @errormsg;
      MESSAGE STRING ( 
         'DIAG ', @@SPID, ' ', CURRENT TIMESTAMP, ' EXCEPTION in p: SQLCODE = ', @sqlcode,  
         ', SQLSTATE = ', @sqlstate,  
         ', ERRORMSG() = ', @errormsg ) 
         TO CONSOLE;
      RESIGNAL;

END;

CREATE SERVICE s 
   TYPE 'RAW' AUTHORIZATION OFF USER DBA
   AS CALL s();

CREATE PROCEDURE s()
RESULT ( html_string LONG VARCHAR )
BEGIN
DECLARE @sqlcode   INTEGER;
DECLARE @sqlstate  VARCHAR ( 5 );
DECLARE @errormsg  VARCHAR ( 32767 );

MESSAGE STRING ( 'DIAG ', @@SPID, ' ', CURRENT TIMESTAMP, ' s started...' ) TO CONSOLE;

CALL p();

CALL dbo.sa_set_http_header( 'Content-Type', 'text/html' );

WAITFOR DELAY '00:00:00.001';

SELECT STRING (
   ' ',
   ' ',
   'Service s ',
   ' ',
   ' ',
   '
', LIST ( STRING ( '( ', t.pkey, ', ', t.data, ' )' ), ', ' ORDER BY t.pkey ), '

', ' ', '

', ' ', ' ' ) FROM t; EXCEPTION WHEN OTHERS THEN SELECT SQLCODE, SQLSTATE, ERRORMSG() INTO @sqlcode, @sqlstate, @errormsg; MESSAGE STRING ( 'DIAG ', @@SPID, ' ', CURRENT TIMESTAMP, ' EXCEPTION in s: SQLCODE = ', @sqlcode, ', SQLSTATE = ', @sqlstate, ', ERRORMSG() = ', @errormsg ) TO CONSOLE; ROLLBACK; RESIGNAL; END;

As you can see, the code also explicitly captures any EXCEPTION raised in the code:
  • The outer EXCEPTION handler in procedure s displays the connection number for the service

  • together with the current connection number (@@SPID) and the SQLCODE, and then

  • it does a ROLLBACK and RESIGNAL before the service terminates.
The inner EXCEPTION handler in procedure p does the same thing except no ROLLBACK.

So far, so good


Every time you click on the "Save" button, it does the same thing and displays the same output:



What happens if you start pounding on the Save button, and the service gets cancelled and restarted before it can finish?

That's not a problem, right? The first execution of the service gets rolled back if it hasn't done the COMMIT yet, and the second service runs... unless it gets cancelled too.

But that's not a problem either, right?



Here's the skill-testing question:


What do you see in the browser window if your really start pounding on the Save button as fast as you can?

Don't rush, take your time...



Your code's been reviewed, it's ACID, reentrant, all those good things?

Right?

Shouldn't you see this no matter what?



Are you sure? Take your time...



Gotcha!


If you keep pounding on the Save button, eventually you will see this in the browser:



Here are all the diagnostic messages from the SQL Anywhere database server console:

DIAG 365 2011-12-19 10:05:00.924 s started...
DIAG 365 2011-12-19 10:05:01.093 EXCEPTION in p: SQLCODE = -299, SQLSTATE = 57014, ERRORMSG() = Statement interrupted by user
DIAG 366 2011-12-19 10:05:01.102 s started...
DIAG 365 2011-12-19 10:05:01.102 EXCEPTION in s: SQLCODE = -299, SQLSTATE = 57014, ERRORMSG() = Statement interrupted by user
DIAG 366 2011-12-19 10:05:01.208 EXCEPTION in p: SQLCODE = -193, SQLSTATE = 23W01, ERRORMSG() = Primary key for table 't' is not unique : Primary key value ('1')
DIAG 367 2011-12-19 10:05:01.253 s started...
DIAG 366 2011-12-19 10:05:01.256 EXCEPTION in s: SQLCODE = -193, SQLSTATE = 23W01, ERRORMSG() = Primary key for table 't' is not unique : Primary key value ('1')
DIAG 367 2011-12-19 10:05:01.419 EXCEPTION in p: SQLCODE = -299, SQLSTATE = 57014, ERRORMSG() = Statement interrupted by user
DIAG 368 2011-12-19 10:05:01.425 s started...
DIAG 367 2011-12-19 10:05:01.429 EXCEPTION in s: SQLCODE = -299, SQLSTATE = 57014, ERRORMSG() = Statement interrupted by user
DIAG 368 2011-12-19 10:05:01.539 EXCEPTION in p: SQLCODE = -193, SQLSTATE = 23W01, ERRORMSG() = Primary key for table 't' is not unique : Primary key value ('1')
DIAG 368 2011-12-19 10:05:01.547 EXCEPTION in s: SQLCODE = -193, SQLSTATE = 23W01, ERRORMSG() = Primary key for table 't' is not unique : Primary key value ('1')
DIAG 369 2011-12-19 10:05:01.589 s started...
DIAG 369 2011-12-19 10:05:01.740 EXCEPTION in p: SQLCODE = -299, SQLSTATE = 57014, ERRORMSG() = Statement interrupted by user
DIAG 370 2011-12-19 10:05:01.746 s started...
DIAG 369 2011-12-19 10:05:01.750 EXCEPTION in s: SQLCODE = -299, SQLSTATE = 57014, ERRORMSG() = Statement interrupted by user
DIAG 370 2011-12-19 10:05:01.864 EXCEPTION in p: SQLCODE = -193, SQLSTATE = 23W01, ERRORMSG() = Primary key for table 't' is not unique : Primary key value ('1')
DIAG 370 2011-12-19 10:05:01.872 EXCEPTION in s: SQLCODE = -193, SQLSTATE = 23W01, ERRORMSG() = Primary key for table 't' is not unique : Primary key value ('1')

Some of the exceptions are expected: SQLCODE = -299 happens when the service is cancelled by the browser. The second column (365, 366, etc) shows the connection number so you can see how the executions overlap.

The other exceptions are not expected: the SQLCODE = -193 primary key violations mean the DELETE / INSERT logic doesn't work properly when the Save button is given a vigorous workout.

Here's another interesting observation: the "SQL error" doesn't reach the browser until after the primary key exception has been raised several times. How do we know that? Because once the browser displays "SQL error" there's no Save button any more, and that clearly wasn't the case: the repeated "s started..." messages prove there was a Save button on screen long after the first primary key exception.

Locking, blocking, the atomic nature of transactions and the reentrant nature of SQL Anywhere executable code... none of that helps this application avoid a showstopper exception.

For extra points, explain why it happens and how to fix it.


4 comments:

Thomas Dümesnil said...

Did you try
CREATE PROCEDURE p()
BEGIN ATOMIC

?

Anonymous said...

That basically sounds like the question raised in the SQL Anywhere Forum:

http://sqlanywhere-forum.sybase.com/questions/7971/how-to-prevent-another-transaction-from-inserting-the-row-im-about-to-insert.

I think you will have to use isolation level 3 to prevent another transaction from trying to insert rows with the same PK (thereby preventing phantom rows), i.e. something like a

SELECT * FROM t WITH (HOLDLOCK)

just before the DELETE.
(That would block the other transactions.)

Since the same rows will be re-entered, even isolation level 2 might be sufficient.

Just my guesses

Volker

P.S. I'm assuming you're not using AUTO COMMIT mode:)

MC said...

The browser is firing off requests as fast as it can cancel the previous request... but the browser doesn't wait for the previous request to finish cancelling (and it doesn't know when the database is finished cancelling the request anyway), so you effectively get many web service calls being executed simultaneously. Net result is that two connections will (a) execute "delete t", and then (b) execute "insert t select ..." and therefore they get the primary key error. I.e. Don't assume that web service requests from a browser are serialized; they are not!

Ivan T. Bowman said...

I must confess I did not read the entire repro, but I looked at the procedure P():
s1: DELETE t;
s2: WAITFOR...
s3: INSERT ...

If we have two transactions T1 and T2 running concurrently, we could have a schedule like this:
T1: s1: DELETE t
T2: s1: DELETE t ; -- no rows in T now, no blocking
...
T1: INSERT...
T2: INSERT...

Since the read_past_deleted option is On by default, T2 will not block when it tries to delete from table T, even though T1 has locked all those rows for delete and has not committed nor rolled back.

You could change that option, or design the procedure so that it modifies a shared row (not one that is deleted), or you might use isolation 3.