Monday, October 17, 2011

Set UPDATE Versus WHERE CURRENT OF Cursor

Question: What is the upper limit on UPDATE?

How many rows can the technique described in Using LockCount To Display UPDATE Progress be used for?

Answer: It depends!

At some point, surely, a set-oriented UPDATE statement affecting millions of rows will fall over in a heap because too many row locks are being held.

Epiphany Time: What about LOCK TABLE? (that ... did ... not ... occur ... to ... me ... until ... this ... moment).
But never mind, let's just pretend that LOCK TABLE can't be used in this particular case (even ... though ... it ... can ... sigh).
Yeah, sure, that's it... let's talk about updating millions of rows but not the whole table, and the whole table can't be locked because other people are using it! (that's my story now, ok?)
Then there's the rollback log... surely a set-oriented UPDATE can't go on forever, can it, bloating up the database file?
Yup, that's where the "rollback log" goes, at the end of the database file. Don't get the rollback log confused with the transaction log which is really a "redo log", the exact opposite of a "rollback log".

OK, there's a limit

Let's accept there's a limit.

Maybe we don't see the dreaded "User 'another user' has the row in 'xxx' locked" message much any more, but let's just accept that we can't let a set-oriented UPDATE run on forever without doing a COMMIT.

What about UPDATE WHERE CURRENT OF Cursor?

A cursor FOR loop does require a bit of effort to split the original set-oriented UPDATE into separate SELECT and UPDATE WHERE CURRENT OF cursor statements, but it has several advantages:
  • A FOR loop will let you issue a COMMIT every once in a while, say, after a million rows, or five million, whatever,

  • which in turn allows the FOR loop to update an unlimited number of rows, plus

  • it's a lot easier to display progress messages than the funky technique described in Using LockCount To Display UPDATE Progress.

Before...

Here's a real world "before" example, a set-oriented UPDATE with a single COMMIT before it was changed into a FOR loop:
UPDATE rroad_group_2_property_pivot
          INNER JOIN rroad_sample_set
             ON rroad_sample_set.sample_set_number = rroad_group_2_property_pivot.sample_set_number
          LEFT OUTER JOIN ( SELECT rroad_group_2_property_pivot.sample_set_number AS sample_set_number,
                                   rroad_group_2_property_pivot.BlockedOn         AS blocking_connection_number,
                                   COUNT(*)                                       AS blocked_connection_count
                              FROM rroad_group_2_property_pivot
                             WHERE rroad_group_2_property_pivot.BlockedOn > 0
                             GROUP BY rroad_group_2_property_pivot.sample_set_number,
                                      rroad_group_2_property_pivot.BlockedOn          ) AS blocking_connection
             ON blocking_connection.sample_set_number          = rroad_group_2_property_pivot.sample_set_number
            AND blocking_connection.blocking_connection_number = rroad_group_2_property_pivot.connection_number

   SET rroad_group_2_property_pivot.blocked_connection_count
             = COALESCE ( blocking_connection.blocked_connection_count, 0 ),

       rroad_group_2_property_pivot.current_req_status
             = IF TRIM ( COALESCE ( rroad_group_2_property_pivot.ReqStatus, '' ) ) = ''
                  THEN '-'
                  ELSE CASE TRIM ( rroad_group_2_property_pivot.ReqStatus )
                          WHEN 'Idle'              THEN 'Idle'
                          WHEN 'Unscheduled'       THEN 'Waiting for thread'
                          WHEN 'BlockedIO'         THEN 'Waiting for I/O'
                          WHEN 'BlockedContention' THEN 'Waiting for shared resource'
                          WHEN 'BlockedLock'       THEN 'Blocked by lock'
                          WHEN 'Executing'         THEN 'Executing'
                          ELSE TRIM ( rroad_group_2_property_pivot.ReqStatus )
                       END
               ENDIF,

       rroad_group_2_property_pivot.cache_satisfaction
             = LESSER ( GREATER (
                  COALESCE (
                     CAST ( ROUND ( CAST ( rroad_group_2_property_pivot.CacheHits AS DECIMAL ( 30, 6 ) )
                                       / CAST ( GREATER ( 1, rroad_group_2_property_pivot.CacheRead ) AS DECIMAL ( 30, 6 ) )
                                       * 100.0,
                                    0 )
                            AS DECIMAL ( 30, 0 ) ),
                     0.0 ),
                  0.0 ), 100.0 ), -- limit the cache_satisfaction to between 0 and 100

       rroad_group_2_property_pivot.time_connected
             = IF TRIM ( COALESCE ( rroad_group_2_property_pivot.LoginTime, '' ) ) = ''
                  THEN 0
                  ELSE DATEDIFF ( MILLISECOND,
                          rroad_f_dateadd_msec ( 3,
                                                 rroad_group_2_property_pivot.LoginTime ),
                          rroad_sample_set.sample_finished_at ) -- also called "sample_recorded_at"
               ENDIF,

       rroad_group_2_property_pivot.total_waits
             = COALESCE ( rroad_group_2_property_pivot.ReqCountBlockContention, 0 )
                  + COALESCE ( rroad_group_2_property_pivot.ReqCountBlockIO, 0 )
                  + COALESCE ( rroad_group_2_property_pivot.ReqCountBlockLock, 0 )
                  + COALESCE ( rroad_group_2_property_pivot.ReqCountUnscheduled, 0 ),

       rroad_group_2_property_pivot.waiting_time
             = COALESCE ( rroad_group_2_property_pivot.ReqTimeBlockContention, 0.0 )
                  + COALESCE ( rroad_group_2_property_pivot.ReqTimeBlockIO, 0.0 )
                  + COALESCE ( rroad_group_2_property_pivot.ReqTimeBlockLock, 0.0 )
                  + COALESCE ( rroad_group_2_property_pivot.ReqTimeUnscheduled, 0.0 ),

       rroad_group_2_property_pivot.transaction_running_time
             = IF TRIM ( COALESCE ( rroad_group_2_property_pivot.TransactionStartTime, '' ) ) = ''
                  THEN 0
                  ELSE DATEDIFF ( MILLISECOND,
                          rroad_f_dateadd_msec ( rroad_sample_set.datediff_msec_between_target_and_local,
                                                 rroad_group_2_property_pivot.TransactionStartTime ),
                          rroad_sample_set.sample_finished_at ) -- also called "sample_recorded_at"
               ENDIF,

       rroad_group_2_property_pivot.time_since_last_request
             = IF TRIM ( COALESCE ( rroad_group_2_property_pivot.LastReqTime, '' ) ) = ''
                  THEN 0
                  ELSE DATEDIFF ( MILLISECOND,
                          rroad_f_dateadd_msec ( rroad_sample_set.datediff_msec_between_target_and_local,
                                                 rroad_group_2_property_pivot.LastReqTime ),
                          rroad_sample_set.sample_finished_at ) -- also called "sample_recorded_at"
               ENDIF,

       rroad_group_2_property_pivot.index_satisfaction
             = LESSER ( GREATER (
                  COALESCE (
                     CAST ( ROUND ( CAST ( rroad_group_2_property_pivot.IndLookup
                                              - LESSER ( rroad_group_2_property_pivot.IndLookup, rroad_group_2_property_pivot.FullCompare )
                                           AS DECIMAL ( 30, 6 ) )
                                       / CAST ( GREATER ( 1, rroad_group_2_property_pivot.IndLookup )
                                                AS DECIMAL ( 30, 6 ) )
                                       * 100.0,
                                    0 )
                            AS DECIMAL ( 30, 0 ) ),
                     0.0 ),
                  0.0 ), 100.0 );  -- limit the index_satisfaction to between 0 and 100

COMMIT;

...and After

Here's the same update after it was changed into a FOR loop as follows:
  • The CHECKPOINT and SELECT SYSTAB.count statements on lines 4 through 9 are extras, added to facilitate the "Row x of y updated" MESSAGE statement on line 141.

  • The SELECT on lines 14 through 32 was formed by taking every table_name.column_name referenced in the original UPDATE SET clause and turning it into a local FOR loop variable using the AS @column_name alias specification.

  • The FROM clause on lines 33 through 44 was copied directly from the original set-oriented UPDATE statement.

  • The UPDATE SET clause on lines 52 through 131 was edited to replace every table_name.column_name with the corresponding @column_name defined in the SELECT.

  • The IF statement on lines 135 through 148 issues a COMMIT for every one million rows updated, as well as producing a "Row x of y updated" MESSAGE. The code for the function rroad_f_number_with_commas() isn't shown here, but all it does is insert commas in big numbers like 9,999,999.

DECLARE @rroad_group_2_property_pivot_row_count      BIGINT;
DECLARE @fetch_counter                               BIGINT;

CHECKPOINT;

SELECT SYSTAB.count
  INTO @rroad_group_2_property_pivot_row_count
  FROM SYSTAB
 WHERE SYSTAB.table_name = 'rroad_group_2_property_pivot';

SET @fetch_counter = 0;

FOR f_fetch AS c_fetch NO SCROLL CURSOR FOR
SELECT blocking_connection.blocked_connection_count              AS @blocked_connection_count,
       rroad_group_2_property_pivot.ReqStatus                    AS @ReqStatus,
       rroad_group_2_property_pivot.CacheHits                    AS @CacheHits,
       rroad_group_2_property_pivot.CacheRead                    AS @CacheRead,
       rroad_group_2_property_pivot.LoginTime                    AS @LoginTime,
       rroad_sample_set.sample_finished_at                       AS @sample_finished_at,
       rroad_group_2_property_pivot.ReqCountBlockContention      AS @ReqCountBlockContention,
       rroad_group_2_property_pivot.ReqCountBlockIO              AS @ReqCountBlockIO,
       rroad_group_2_property_pivot.ReqCountBlockLock            AS @ReqCountBlockLock,
       rroad_group_2_property_pivot.ReqCountUnscheduled          AS @ReqCountUnscheduled,
       rroad_group_2_property_pivot.ReqTimeBlockContention       AS @ReqTimeBlockContention,
       rroad_group_2_property_pivot.ReqTimeBlockIO               AS @ReqTimeBlockIO,
       rroad_group_2_property_pivot.ReqTimeBlockLock             AS @ReqTimeBlockLock,
       rroad_group_2_property_pivot.ReqTimeUnscheduled           AS @ReqTimeUnscheduled,
       rroad_group_2_property_pivot.TransactionStartTime         AS @TransactionStartTime,
       rroad_sample_set.datediff_msec_between_target_and_local   AS @datediff_msec_between_target_and_local,
       rroad_group_2_property_pivot.LastReqTime                  AS @LastReqTime,
       rroad_group_2_property_pivot.IndLookup                    AS @IndLookup,
       rroad_group_2_property_pivot.FullCompare                  AS @FullCompare
  FROM rroad_group_2_property_pivot
             INNER JOIN rroad_sample_set
                ON rroad_sample_set.sample_set_number = rroad_group_2_property_pivot.sample_set_number
             LEFT OUTER JOIN ( SELECT rroad_group_2_property_pivot.sample_set_number AS sample_set_number,
                                      rroad_group_2_property_pivot.BlockedOn         AS blocking_connection_number,
                                      COUNT(*)                                       AS blocked_connection_count
                                 FROM rroad_group_2_property_pivot
                                WHERE rroad_group_2_property_pivot.BlockedOn > 0
                                GROUP BY rroad_group_2_property_pivot.sample_set_number,
                                         rroad_group_2_property_pivot.BlockedOn          ) AS blocking_connection
                ON blocking_connection.sample_set_number          = rroad_group_2_property_pivot.sample_set_number
               AND blocking_connection.blocking_connection_number = rroad_group_2_property_pivot.connection_number
FOR UPDATE
DO

   SET @fetch_counter = @fetch_counter + 1;

   UPDATE rroad_group_2_property_pivot

      SET rroad_group_2_property_pivot.blocked_connection_count
                = COALESCE ( @blocked_connection_count, 0 ),

          rroad_group_2_property_pivot.current_req_status
                = IF TRIM ( COALESCE ( @ReqStatus, '' ) ) = ''
                     THEN '-'
                     ELSE CASE TRIM ( @ReqStatus )
                             WHEN 'Idle'              THEN 'Idle'
                             WHEN 'Unscheduled'       THEN 'Waiting for thread'
                             WHEN 'BlockedIO'         THEN 'Waiting for I/O'
                             WHEN 'BlockedContention' THEN 'Waiting for shared resource'
                             WHEN 'BlockedLock'       THEN 'Blocked by lock'
                             WHEN 'Executing'         THEN 'Executing'
                             ELSE TRIM ( @ReqStatus )
                          END
                  ENDIF,

          rroad_group_2_property_pivot.cache_satisfaction
                = LESSER ( GREATER (
                     COALESCE (
                        CAST ( ROUND ( CAST ( @CacheHits AS DECIMAL ( 30, 6 ) )
                                          / CAST ( GREATER ( 1, @CacheRead ) AS DECIMAL ( 30, 6 ) )
                                          * 100.0,
                                       0 )
                               AS DECIMAL ( 30, 0 ) ),
                        0.0 ),
                     0.0 ), 100.0 ), -- limit the cache_satisfaction to between 0 and 100

          rroad_group_2_property_pivot.time_connected
                = IF TRIM ( COALESCE ( @LoginTime, '' ) ) = ''
                     THEN 0
                     ELSE DATEDIFF ( MILLISECOND,
                             rroad_f_dateadd_msec ( 3,
                                                    @LoginTime ),
                             @sample_finished_at ) -- also called "sample_recorded_at"
                  ENDIF,

          rroad_group_2_property_pivot.total_waits
                = COALESCE ( @ReqCountBlockContention, 0 )
                     + COALESCE ( @ReqCountBlockIO, 0 )
                     + COALESCE ( @ReqCountBlockLock, 0 )
                     + COALESCE ( @ReqCountUnscheduled, 0 ),

          rroad_group_2_property_pivot.waiting_time
                = COALESCE ( @ReqTimeBlockContention, 0.0 )
                     + COALESCE ( @ReqTimeBlockIO, 0.0 )
                     + COALESCE ( @ReqTimeBlockLock, 0.0 )
                     + COALESCE ( @ReqTimeUnscheduled, 0.0 ),

          rroad_group_2_property_pivot.transaction_running_time
                = IF TRIM ( COALESCE ( @TransactionStartTime, '' ) ) = ''
                     THEN 0
                     ELSE DATEDIFF ( MILLISECOND,
                             rroad_f_dateadd_msec ( @datediff_msec_between_target_and_local,
                                                    @TransactionStartTime ),
                             @sample_finished_at ) -- also called "sample_recorded_at"
                  ENDIF,

          rroad_group_2_property_pivot.time_since_last_request
                = IF TRIM ( COALESCE ( @LastReqTime, '' ) ) = ''
                     THEN 0
                     ELSE DATEDIFF ( MILLISECOND,
                             rroad_f_dateadd_msec ( @datediff_msec_between_target_and_local,
                                                    @LastReqTime ),
                             @sample_finished_at ) -- also called "sample_recorded_at"
                  ENDIF,

          rroad_group_2_property_pivot.index_satisfaction
                = LESSER ( GREATER (
                     COALESCE (
                        CAST ( ROUND ( CAST ( @IndLookup
                                                 - LESSER ( @IndLookup, @FullCompare )
                                              AS DECIMAL ( 30, 6 ) )
                                          / CAST ( GREATER ( 1, @IndLookup )
                                                   AS DECIMAL ( 30, 6 ) )
                                          * 100.0,
                                       0 )
                               AS DECIMAL ( 30, 0 ) ),
                        0.0 ),
                     0.0 ), 100.0 )  -- limit the index_satisfaction to between 0 and 100

    WHERE CURRENT OF c_fetch;

   IF @fetch_counter = 1
   OR MOD ( @fetch_counter, 1000000 ) = 0
   OR @fetch_counter = @rroad_group_2_property_pivot_row_count THEN

      COMMIT;

      MESSAGE STRING (
         'Row ',
         rroad_f_number_with_commas ( @fetch_counter ),
         ' of ',
         rroad_f_number_with_commas ( @rroad_group_2_property_pivot_row_count ),
         ' updated.' ) TO CONSOLE;

   END IF;

END FOR;

COMMIT;

Here's what the FOR loop displayed for a test that began at 16:01:48:
I. 10/11 16:04:08. Row 1 of 6,133,762 updated.
I. 10/11 16:13:20. Row 1,000,000 of 6,133,762 updated.
I. 10/11 16:22:42. Row 2,000,000 of 6,133,762 updated.
I. 10/11 16:31:00. Row 3,000,000 of 6,133,762 updated.
I. 10/11 16:48:38. Row 4,000,000 of 6,133,762 updated.
I. 10/11 17:01:57. Row 5,000,000 of 6,133,762 updated.
I. 10/11 17:15:44. Row 6,000,000 of 6,133,762 updated.
I. 10/11 17:16:51. Row 6,133,762 of 6,133,762 updated.

The time between 16:01:48 and 16:04:08 was taken up by the FOR SELECT before the first row was fetched, and the rest of the time was spent doing the updates.

Total time for the UPDATE WHERE CURRENT OF cursor test...

75 minutes


And for the set-oriented UPDATE?

15 minutes!


Safe to say, cursor loops suck when it comes to performance... and this cursor loop was inside a stored procedure!

What would a client-server fetch loop have been like, with every row being schlepped across the Great ODBC Divide? or JDBC? Or, heaven forfend, LINQ?

Set-oriented UPDATE rules!


But what about the limit? What about the commits? More on this later...


3 comments:

Anonymous said...

What about the performance of an UPDATE TOP n START AT m ... ORDER BY ... style update, i.e. using a WHILE loop to split up the update in bunches of say, million rows each (or preferably much smaller bunches) and printing messages and commiting inbetween?

Unless such an order is difficult to find, this should give you the "best of both worlds", and I would expect the performance to be as good (or possibly better because of lesser locks and a smaller rollback log) than with one huge set-oriented UPDATE.

Just my 2,000,000 cents

Volker

Breck Carter said...

@Volker: Funny you should ask :)

http://sqlanywhere.blogspot.com/2011/10/set-update-with-top-start-at-and-commit.html

Alas, AFAIK your optimism about performance is misplaced... adding an ORDER BY to a query is a rather unlikely way to speed things up :)

Anonymous said...

@Breck: Well, I should've known something big was coming the way:)

My optimism w.r.t. to performance was focussed on the "more frequent commits -> less contention and smaller rollback log" issue. That might still hold. As the original UPDATE wasn't documented here, I have not thought of the implications by an ORDER BY but that's a point, I agree:)

Do you have test results with smaller update sizes (100K or the like)?

----

FWIW: I do use this technique sometimes in a different way:

"UPDATE TOP n" of those rows that are not already "handled" - which means the UPDATE will set some condition on the rows that prevent them from being selected for the UPDATE the next time - so there's no need to use an START AT clause, and the refetching might become faster as the regarded set becomes smaller... And one can simply leave the loop when @@rowcount reaches 0...

I'm not sure whether this kind of check is apropriate in your situation...


Volker