Question: What is the upper limit on UPDATE?
How many rows can the technique described in "Using LockCount To Display UPDATE Progress" be used for?
Answer: It depends!
At some point, surely, a set-oriented UPDATE statement affecting millions of rows will fall over in a heap because too many row locks are being held.
Epiphany Time: What about LOCK TABLE? (that ... did ... not ... occur ... to ... me ... until ... this ... moment)
Then there's the rollback log... surely a set-oriented UPDATE can't go on forever, can it, bloating up the database file?
But never mind, let's just pretend that LOCK TABLE can't be used in this particular case (even ... though ... it ... can ... sigh).
Yeah, sure, that's it... let's talk about updating millions of rows but not the whole table, and the whole table can't be locked because other people are using it! (that's my story now, ok?)
Yup, that's where the "rollback log" goes, at the end of the database file. Don't get the rollback log confused with the transaction log which is really a "redo log", the exact opposite of a "rollback log".
OK, there's a limit
Maybe we don't see the dreaded "User 'another user' has the row in 'xxx' locked" message much any more, but let's just accept that we can't let a set-oriented UPDATE run on forever without doing a COMMIT.
What about UPDATE WHERE CURRENT OF Cursor?
A cursor FOR loop does require a bit of effort to split the original set-oriented UPDATE into separate SELECT and UPDATE WHERE CURRENT OF cursor statements, but it has several advantages:
- A FOR loop will let you issue a COMMIT every once in a while, say, after a million rows, or five million, whatever,
- which in turn allows the FOR loop to update an unlimited number of rows, plus
- it's a lot easier to display progress messages than the funky technique described in "Using LockCount To Display UPDATE Progress".
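In skeleton form, the pattern looks something like this. This is a minimal sketch, not code from this article: the table t and its columns pk, raw_value and derived_value are hypothetical stand-ins.

```sql
BEGIN
DECLARE @row_counter BIGINT;
SET @row_counter = 0;

FOR f_loop AS c_loop NO SCROLL CURSOR FOR
SELECT t.raw_value AS @raw_value   -- one @variable per column used in the SET clause
  FROM t
   FOR UPDATE
DO
   SET @row_counter = @row_counter + 1;

   UPDATE t
      SET t.derived_value = COALESCE ( @raw_value, 0 )  -- whatever per-row work is needed
    WHERE CURRENT OF c_loop;

   IF MOD ( @row_counter, 1000000 ) = 0 THEN
      COMMIT;  -- release row locks and empty the rollback log every million rows
      MESSAGE STRING ( 'Row ', @row_counter, ' updated.' ) TO CONSOLE;
   END IF;

END FOR;

COMMIT;  -- catch the final partial batch
END;
```

The real-world version below follows exactly this shape, just with a much bigger SELECT list and SET clause.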
Before...
Here's a real world "before" example, a set-oriented UPDATE with a single COMMIT, before it was changed into a FOR loop:

UPDATE rroad_group_2_property_pivot
       INNER JOIN rroad_sample_set
          ON rroad_sample_set.sample_set_number = rroad_group_2_property_pivot.sample_set_number
       LEFT OUTER JOIN
          ( SELECT rroad_group_2_property_pivot.sample_set_number AS sample_set_number,
                   rroad_group_2_property_pivot.BlockedOn         AS blocking_connection_number,
                   COUNT(*)                                       AS blocked_connection_count
              FROM rroad_group_2_property_pivot
             WHERE rroad_group_2_property_pivot.BlockedOn > 0
             GROUP BY rroad_group_2_property_pivot.sample_set_number,
                      rroad_group_2_property_pivot.BlockedOn ) AS blocking_connection
          ON blocking_connection.sample_set_number = rroad_group_2_property_pivot.sample_set_number
         AND blocking_connection.blocking_connection_number = rroad_group_2_property_pivot.connection_number
   SET rroad_group_2_property_pivot.blocked_connection_count = COALESCE ( blocking_connection.blocked_connection_count, 0 ),
       rroad_group_2_property_pivot.current_req_status =
          IF TRIM ( COALESCE ( rroad_group_2_property_pivot.ReqStatus, '' ) ) = ''
             THEN '-'
             ELSE CASE TRIM ( rroad_group_2_property_pivot.ReqStatus )
                     WHEN 'Idle'              THEN 'Idle'
                     WHEN 'Unscheduled'       THEN 'Waiting for thread'
                     WHEN 'BlockedIO'         THEN 'Waiting for I/O'
                     WHEN 'BlockedContention' THEN 'Waiting for shared resource'
                     WHEN 'BlockedLock'       THEN 'Blocked by lock'
                     WHEN 'Executing'         THEN 'Executing'
                     ELSE TRIM ( rroad_group_2_property_pivot.ReqStatus )
                  END
          ENDIF,
       rroad_group_2_property_pivot.cache_satisfaction =
          LESSER ( GREATER ( COALESCE ( CAST ( ROUND (
                CAST ( rroad_group_2_property_pivot.CacheHits AS DECIMAL ( 30, 6 ) )
              / CAST ( GREATER ( 1, rroad_group_2_property_pivot.CacheRead ) AS DECIMAL ( 30, 6 ) )
              * 100.0, 0 ) AS DECIMAL ( 30, 0 ) ),
             0.0 ), 0.0 ), 100.0 ),  -- limit the cache_satisfaction to between 0 and 100
       rroad_group_2_property_pivot.time_connected =
          IF TRIM ( COALESCE ( rroad_group_2_property_pivot.LoginTime, '' ) ) = ''
             THEN 0
             ELSE DATEDIFF ( MILLISECOND,
                             rroad_f_dateadd_msec ( 3, rroad_group_2_property_pivot.LoginTime ),
                             rroad_sample_set.sample_finished_at )  -- also called "sample_recorded_at"
          ENDIF,
       rroad_group_2_property_pivot.total_waits =
            COALESCE ( rroad_group_2_property_pivot.ReqCountBlockContention, 0 )
          + COALESCE ( rroad_group_2_property_pivot.ReqCountBlockIO, 0 )
          + COALESCE ( rroad_group_2_property_pivot.ReqCountBlockLock, 0 )
          + COALESCE ( rroad_group_2_property_pivot.ReqCountUnscheduled, 0 ),
       rroad_group_2_property_pivot.waiting_time =
            COALESCE ( rroad_group_2_property_pivot.ReqTimeBlockContention, 0.0 )
          + COALESCE ( rroad_group_2_property_pivot.ReqTimeBlockIO, 0.0 )
          + COALESCE ( rroad_group_2_property_pivot.ReqTimeBlockLock, 0.0 )
          + COALESCE ( rroad_group_2_property_pivot.ReqTimeUnscheduled, 0.0 ),
       rroad_group_2_property_pivot.transaction_running_time =
          IF TRIM ( COALESCE ( rroad_group_2_property_pivot.TransactionStartTime, '' ) ) = ''
             THEN 0
             ELSE DATEDIFF ( MILLISECOND,
                             rroad_f_dateadd_msec ( rroad_sample_set.datediff_msec_between_target_and_local,
                                                    rroad_group_2_property_pivot.TransactionStartTime ),
                             rroad_sample_set.sample_finished_at )  -- also called "sample_recorded_at"
          ENDIF,
       rroad_group_2_property_pivot.time_since_last_request =
          IF TRIM ( COALESCE ( rroad_group_2_property_pivot.LastReqTime, '' ) ) = ''
             THEN 0
             ELSE DATEDIFF ( MILLISECOND,
                             rroad_f_dateadd_msec ( rroad_sample_set.datediff_msec_between_target_and_local,
                                                    rroad_group_2_property_pivot.LastReqTime ),
                             rroad_sample_set.sample_finished_at )  -- also called "sample_recorded_at"
          ENDIF,
       rroad_group_2_property_pivot.index_satisfaction =
          LESSER ( GREATER ( COALESCE ( CAST ( ROUND (
                CAST ( rroad_group_2_property_pivot.IndLookup
                       - LESSER ( rroad_group_2_property_pivot.IndLookup,
                                  rroad_group_2_property_pivot.FullCompare ) AS DECIMAL ( 30, 6 ) )
              / CAST ( GREATER ( 1, rroad_group_2_property_pivot.IndLookup ) AS DECIMAL ( 30, 6 ) )
              * 100.0, 0 ) AS DECIMAL ( 30, 0 ) ),
             0.0 ), 0.0 ), 100.0 );  -- limit the index_satisfaction to between 0 and 100

COMMIT;
...and After
Here's the same update after it was changed into a FOR loop:
- The CHECKPOINT and SELECT SYSTAB.count statements at the top of the block are extras, added to facilitate the "Row x of y updated" MESSAGE statement near the end.
- The SELECT in the cursor declaration was formed by taking every table_name.column_name referenced in the original UPDATE SET clause and turning it into a local FOR loop variable using the AS @column_name alias specification.
- The FROM clause was copied directly from the original set-oriented UPDATE statement.
- The UPDATE SET clause was edited to replace every table_name.column_name with the corresponding @column_name defined in the SELECT.
- The IF statement near the end issues a COMMIT after the first row, after every one million rows, and after the final row, as well as producing a "Row x of y updated" MESSAGE. The code for the function rroad_f_number_with_commas() isn't shown here, but all it does is insert commas in big numbers like 9,999,999.
DECLARE @rroad_group_2_property_pivot_row_count BIGINT;
DECLARE @fetch_counter BIGINT;

CHECKPOINT;

SELECT SYSTAB.count
  INTO @rroad_group_2_property_pivot_row_count
  FROM SYSTAB
 WHERE SYSTAB.table_name = 'rroad_group_2_property_pivot';

SET @fetch_counter = 0;

FOR f_fetch AS c_fetch NO SCROLL CURSOR FOR
SELECT blocking_connection.blocked_connection_count            AS @blocked_connection_count,
       rroad_group_2_property_pivot.ReqStatus                  AS @ReqStatus,
       rroad_group_2_property_pivot.CacheHits                  AS @CacheHits,
       rroad_group_2_property_pivot.CacheRead                  AS @CacheRead,
       rroad_group_2_property_pivot.LoginTime                  AS @LoginTime,
       rroad_sample_set.sample_finished_at                     AS @sample_finished_at,
       rroad_group_2_property_pivot.ReqCountBlockContention    AS @ReqCountBlockContention,
       rroad_group_2_property_pivot.ReqCountBlockIO            AS @ReqCountBlockIO,
       rroad_group_2_property_pivot.ReqCountBlockLock          AS @ReqCountBlockLock,
       rroad_group_2_property_pivot.ReqCountUnscheduled        AS @ReqCountUnscheduled,
       rroad_group_2_property_pivot.ReqTimeBlockContention     AS @ReqTimeBlockContention,
       rroad_group_2_property_pivot.ReqTimeBlockIO             AS @ReqTimeBlockIO,
       rroad_group_2_property_pivot.ReqTimeBlockLock           AS @ReqTimeBlockLock,
       rroad_group_2_property_pivot.ReqTimeUnscheduled         AS @ReqTimeUnscheduled,
       rroad_group_2_property_pivot.TransactionStartTime       AS @TransactionStartTime,
       rroad_sample_set.datediff_msec_between_target_and_local AS @datediff_msec_between_target_and_local,
       rroad_group_2_property_pivot.LastReqTime                AS @LastReqTime,
       rroad_group_2_property_pivot.IndLookup                  AS @IndLookup,
       rroad_group_2_property_pivot.FullCompare                AS @FullCompare
  FROM rroad_group_2_property_pivot
       INNER JOIN rroad_sample_set
          ON rroad_sample_set.sample_set_number = rroad_group_2_property_pivot.sample_set_number
       LEFT OUTER JOIN
          ( SELECT rroad_group_2_property_pivot.sample_set_number AS sample_set_number,
                   rroad_group_2_property_pivot.BlockedOn         AS blocking_connection_number,
                   COUNT(*)                                       AS blocked_connection_count
              FROM rroad_group_2_property_pivot
             WHERE rroad_group_2_property_pivot.BlockedOn > 0
             GROUP BY rroad_group_2_property_pivot.sample_set_number,
                      rroad_group_2_property_pivot.BlockedOn ) AS blocking_connection
          ON blocking_connection.sample_set_number = rroad_group_2_property_pivot.sample_set_number
         AND blocking_connection.blocking_connection_number = rroad_group_2_property_pivot.connection_number
   FOR UPDATE
DO
   SET @fetch_counter = @fetch_counter + 1;

   UPDATE rroad_group_2_property_pivot
      SET rroad_group_2_property_pivot.blocked_connection_count = COALESCE ( @blocked_connection_count, 0 ),
          rroad_group_2_property_pivot.current_req_status =
             IF TRIM ( COALESCE ( @ReqStatus, '' ) ) = ''
                THEN '-'
                ELSE CASE TRIM ( @ReqStatus )
                        WHEN 'Idle'              THEN 'Idle'
                        WHEN 'Unscheduled'       THEN 'Waiting for thread'
                        WHEN 'BlockedIO'         THEN 'Waiting for I/O'
                        WHEN 'BlockedContention' THEN 'Waiting for shared resource'
                        WHEN 'BlockedLock'       THEN 'Blocked by lock'
                        WHEN 'Executing'         THEN 'Executing'
                        ELSE TRIM ( @ReqStatus )
                     END
             ENDIF,
          rroad_group_2_property_pivot.cache_satisfaction =
             LESSER ( GREATER ( COALESCE ( CAST ( ROUND (
                   CAST ( @CacheHits AS DECIMAL ( 30, 6 ) )
                 / CAST ( GREATER ( 1, @CacheRead ) AS DECIMAL ( 30, 6 ) )
                 * 100.0, 0 ) AS DECIMAL ( 30, 0 ) ),
                0.0 ), 0.0 ), 100.0 ),  -- limit the cache_satisfaction to between 0 and 100
          rroad_group_2_property_pivot.time_connected =
             IF TRIM ( COALESCE ( @LoginTime, '' ) ) = ''
                THEN 0
                ELSE DATEDIFF ( MILLISECOND,
                                rroad_f_dateadd_msec ( 3, @LoginTime ),
                                @sample_finished_at )  -- also called "sample_recorded_at"
             ENDIF,
          rroad_group_2_property_pivot.total_waits =
               COALESCE ( @ReqCountBlockContention, 0 )
             + COALESCE ( @ReqCountBlockIO, 0 )
             + COALESCE ( @ReqCountBlockLock, 0 )
             + COALESCE ( @ReqCountUnscheduled, 0 ),
          rroad_group_2_property_pivot.waiting_time =
               COALESCE ( @ReqTimeBlockContention, 0.0 )
             + COALESCE ( @ReqTimeBlockIO, 0.0 )
             + COALESCE ( @ReqTimeBlockLock, 0.0 )
             + COALESCE ( @ReqTimeUnscheduled, 0.0 ),
          rroad_group_2_property_pivot.transaction_running_time =
             IF TRIM ( COALESCE ( @TransactionStartTime, '' ) ) = ''
                THEN 0
                ELSE DATEDIFF ( MILLISECOND,
                                rroad_f_dateadd_msec ( @datediff_msec_between_target_and_local,
                                                       @TransactionStartTime ),
                                @sample_finished_at )  -- also called "sample_recorded_at"
             ENDIF,
          rroad_group_2_property_pivot.time_since_last_request =
             IF TRIM ( COALESCE ( @LastReqTime, '' ) ) = ''
                THEN 0
                ELSE DATEDIFF ( MILLISECOND,
                                rroad_f_dateadd_msec ( @datediff_msec_between_target_and_local,
                                                       @LastReqTime ),
                                @sample_finished_at )  -- also called "sample_recorded_at"
             ENDIF,
          rroad_group_2_property_pivot.index_satisfaction =
             LESSER ( GREATER ( COALESCE ( CAST ( ROUND (
                   CAST ( @IndLookup - LESSER ( @IndLookup, @FullCompare ) AS DECIMAL ( 30, 6 ) )
                 / CAST ( GREATER ( 1, @IndLookup ) AS DECIMAL ( 30, 6 ) )
                 * 100.0, 0 ) AS DECIMAL ( 30, 0 ) ),
                0.0 ), 0.0 ), 100.0 )  -- limit the index_satisfaction to between 0 and 100
    WHERE CURRENT OF c_fetch;

   IF @fetch_counter = 1
   OR MOD ( @fetch_counter, 1000000 ) = 0
   OR @fetch_counter = @rroad_group_2_property_pivot_row_count THEN
      COMMIT;
      MESSAGE STRING ( 'Row ', rroad_f_number_with_commas ( @fetch_counter ),
                       ' of ', rroad_f_number_with_commas ( @rroad_group_2_property_pivot_row_count ),
                       ' updated.' ) TO CONSOLE;
   END IF;

END FOR;

COMMIT;
Here's what the FOR loop displayed for a test that began at 16:01:48:
I. 10/11 16:04:08. Row 1 of 6,133,762 updated.
I. 10/11 16:13:20. Row 1,000,000 of 6,133,762 updated.
I. 10/11 16:22:42. Row 2,000,000 of 6,133,762 updated.
I. 10/11 16:31:00. Row 3,000,000 of 6,133,762 updated.
I. 10/11 16:48:38. Row 4,000,000 of 6,133,762 updated.
I. 10/11 17:01:57. Row 5,000,000 of 6,133,762 updated.
I. 10/11 17:15:44. Row 6,000,000 of 6,133,762 updated.
I. 10/11 17:16:51. Row 6,133,762 of 6,133,762 updated.
The time between 16:01:48 and 16:04:08 was taken up by the FOR SELECT before the first row was fetched, and the rest of the time was spent doing the updates.
Total time for the UPDATE WHERE CURRENT OF cursor test...
75 minutes
And for the set-oriented UPDATE?
15 minutes!
Safe to say, cursor loops suck when it comes to performance... and this cursor loop was inside a stored procedure!
What would a client-server fetch loop have been like, with every row being schlepped across the Great ODBC Divide? Or JDBC? Or, heaven forfend, LINQ?
Set-oriented UPDATE rules!
But what about the limit? What about the commits? More on this later...
3 comments:
What about the performance of an UPDATE TOP n START AT m ... ORDER BY ... style update, i.e. using a WHILE loop to split up the update into bunches of, say, a million rows each (or preferably much smaller bunches) and printing messages and committing in between?
Unless such an order is difficult to find, this should give you the "best of both worlds", and I would expect the performance to be as good as (or possibly better than, because of fewer locks and a smaller rollback log) one huge set-oriented UPDATE.
Just my 2,000,000 cents
Volker
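[Volker's suggestion could be sketched roughly like this. The table t, its primary key pk, the columns and the batch size are all hypothetical, and it assumes a SQL Anywhere version whose UPDATE statement accepts TOP / START AT with ORDER BY and a variable in the START AT clause:

   BEGIN
   DECLARE @start_at      BIGINT;
   DECLARE @updated_count BIGINT;
   SET @start_at = 1;
   SET @updated_count = 1000000;

   WHILE @updated_count = 1000000 LOOP
      UPDATE TOP 1000000 START AT @start_at t
         SET t.derived_value = COALESCE ( t.raw_value, 0 )
       ORDER BY t.pk;             -- a stable order is what makes START AT meaningful
      SET @updated_count = @@ROWCOUNT;
      COMMIT;                     -- commit each bunch: fewer locks, smaller rollback log
      MESSAGE STRING ( 'Rows updated so far: ', @start_at + @updated_count - 1 ) TO CONSOLE;
      SET @start_at = @start_at + 1000000;
   END LOOP;
   END;

The loop exits after the first bunch that comes up short of the full million rows.]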
@Volker: Funny you should ask :)
http://sqlanywhere.blogspot.com/2011/10/set-update-with-top-start-at-and-commit.html
Alas, AFAIK your optimism about performance is misplaced... adding an ORDER BY to a query is a rather unlikely way to speed things up :)
@Breck: Well, I should've known something big was coming my way :)
My optimism w.r.t. performance was focused on the "more frequent commits -> less contention and smaller rollback log" issue. That might still hold. As the original UPDATE wasn't documented here, I hadn't thought about the implications of an ORDER BY, but that's a point, I agree :)
Do you have test results with smaller update sizes (100K or the like)?
----
FWIW: I do use this technique sometimes in a different way:
"UPDATE TOP n" of those rows that are not already "handled" - which means the UPDATE will set some condition on the rows that prevent them from being selected for the UPDATE the next time - so there's no need to use an START AT clause, and the refetching might become faster as the regarded set becomes smaller... And one can simply leave the loop when @@rowcount reaches 0...
I'm not sure whether this kind of check is apropriate in your situation...
Volker
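[This "not already handled" variant could be sketched roughly like this. Again everything here is hypothetical: table t with primary key pk and a handled flag column; marking the row in the same UPDATE is what keeps each batch from being re-selected:

   BEGIN
   DECLARE @updated_count BIGINT;
   SET @updated_count = 1;

   WHILE @updated_count > 0 LOOP
      UPDATE TOP 100000 t
         SET t.derived_value = COALESCE ( t.raw_value, 0 ),
             t.handled = 'Y'        -- marks the row so it won't be selected next time
       WHERE t.handled = 'N'
       ORDER BY t.pk;
      SET @updated_count = @@ROWCOUNT;  -- reaches 0 when every row has been handled
      COMMIT;
   END LOOP;
   END;

No START AT is needed because the WHERE clause itself shrinks the candidate set on every pass.]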