Friday, September 25, 2009

Quassnoi's Explain Extended

I've added a new blog to the "WIDER VIEWS..." list on the right: Quassnoi's Explain Extended.

Lots of solid SQL content there, in particular this excellent discussion about "thinking in sets": Double-thinking in SQL.

Wednesday, September 23, 2009

Turning Off System Restore Points

I have discovered a new way to slow Microsoft SQL Server's BULK INSERT down so much that it mimics continental drift.

It's been a couple of years since I discovered the first way to slow down BULK INSERT: leave the PRIMARY KEY and other indexes in place while inserting data. The solution to that problem is to drop all the indexes, run the BULK INSERT and then recreate the indexes... it can be orders of magnitude faster that way.
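The drop-and-recreate pattern looks something like this in Transact-SQL; the table, index, constraint and file names here are all hypothetical, and the details (terminators, hints) will vary with your data:

```sql
-- Hypothetical example: bulk load table t from a tab-delimited file.
-- Step 1: drop the secondary index and the PRIMARY KEY constraint
-- so the load writes to a bare heap.
DROP INDEX t.ix_t_data;
ALTER TABLE t DROP CONSTRAINT pk_t;

-- Step 2: run the bulk load.
BULK INSERT t
   FROM 'C:\temp\t.txt'
   WITH ( FIELDTERMINATOR = '\t',
          ROWTERMINATOR   = '\n',
          TABLOCK );

-- Step 3: rebuild the primary key and index, each in a single pass.
ALTER TABLE t ADD CONSTRAINT pk_t PRIMARY KEY ( pkey );
CREATE INDEX ix_t_data ON t ( data );
```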

Why am I talking about BULK INSERT? Because SQL Server is often used together with SQL Anywhere, and it's often necessary to ship data back and forth between the two databases, and BULK INSERT is a very efficient technique... sometimes. The online docs do talk about Optimizing Bulk Import Performance, but the subject of indexes is mentioned only in passing: "How to best increase the performance of a particular bulk import operation is influenced by the following factors: ... Whether the table has indexes."
What's the new way to make BULK INSERT slow? Leave the Windows System Restore feature enabled on the disk drive holding the SQL Server database. That increases disk I/O enormously, and the I/O is split between two different files in two different locations on the same drive... thus making the head move back and forth, slowing things down much more than if only one file were affected.

I've never noticed this kind of disk activity before, perhaps because I hadn't seen a display like the following until after installing Windows Vista with its wonderful Resource Monitor:



In the picture above you can see the heaviest write I/O involves a file in the C:\System Volume Information folder which is where Windows automatic restore points are recorded. The SQL Server database and log files (main.mdf and main.ldf) come second and third in terms of write I/O, and this ranking persists for the entire time the BULK INSERT is running.

And then there's all that database file read I/O... I'm guessing all of it is needed for the System Restore feature, as you'll see later when it all disappears.

The solution is to follow the instructions here and turn Windows System Restore off for the disk drive affected, as follows: Control Panel - System - System Protection - Automatic restore points - clear the checkbox next to the drive letter:



Turning off restore points made a dramatic difference in the BULK INSERT disk I/O, which in turn resulted in the elapsed time dropping from 1,612 seconds to 512 seconds for a 1.9 million row data set; that's a factor of 3 improvement. It did this by completely eliminating all I/O to the System Volume Information folder, which in turn allowed the database file I/O to run much faster. It also completely eliminated the database file read I/O as shown in this picture:



Does the Windows System Restore feature affect SQL Anywhere databases? I'm not sure, but I think the answer is "no". I tried loading the same data into a SQL Anywhere 11 database via LOAD TABLE and there was no activity in the System Volume Information folder at all.
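For reference, that LOAD TABLE test looked something like this sketch (table, column and file names are hypothetical); note that the file path is interpreted on the database server, not the client:

```sql
-- Hypothetical SQL Anywhere 11 example: server-side bulk load.
-- The input file must be visible to the database server machine.
LOAD TABLE t ( pkey, data )
   FROM 'c:\\temp\\t.txt'
   DELIMITED BY '\x09';  -- tab-delimited input
```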

Oh, and I would love to tell you how much faster SQL Anywhere's LOAD TABLE was than BULK INSERT, but I can't... that would be a "benchmark test", a violation of the license agreements, and it might even exhibit some kind of "measurement bias"... who, me, biased? <g>

Wednesday, September 16, 2009

Compare Tables With UNLOAD INTO VARIABLE

Q: Let's say you have two tables where you think the data is identical; how do you compare them?

A: UNLOAD the tables and compare the resulting files.
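For local tables that looks something like the following (table and file names hypothetical); the two output files can then be compared with any file-diff utility:

```sql
-- Unload both tables to text files on the server, then diff the files.
UNLOAD TABLE t1 TO 'c:\\temp\\t1.txt';
UNLOAD TABLE t2 TO 'c:\\temp\\t2.txt';
```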

Q: What if one of the tables is a proxy table? UNLOAD doesn't work on proxy tables.

A: Use SELECT * INTO #temp_table, then UNLOAD the temporary table.

Q: What if you want to do everything inside a SQL script, without running a separate "compare the files" step?

A: Use UNLOAD INTO VARIABLE and compare the variables with a SQL statement.

Here's an example showing both techniques, SELECT * INTO #temp_table and UNLOAD INTO VARIABLE, on a pair of proxy tables pointing to Microsoft SQL Server and SQL Anywhere databases:


BEGIN
DECLARE @sa_t LONG VARCHAR;
DECLARE @mss_t LONG VARCHAR;

-- Copy each proxy table into a local temporary table,
-- because UNLOAD doesn't work directly on proxy tables.
SELECT *
INTO #mss_t
FROM proxy_mss_t;

SELECT *
INTO #sa_t
FROM proxy_sa_t;

-- Unload each temporary table into a string variable.
UNLOAD TABLE #mss_t
INTO VARIABLE @mss_t;

UNLOAD TABLE #sa_t
INTO VARIABLE @sa_t;

-- Compare the two strings in a single SQL statement.
SELECT IF @mss_t = @sa_t
THEN 'EQUAL'
ELSE 'NOT EQUAL'
ENDIF;
END;

Here's the output from dbisql:

if @mss_t = @sa_t then'EQUAL' else 'NOT EQUAL' endif
'EQUAL'

If you're working with local tables, you don't need the separate SELECT * INTO #temp_table statements... and maybe someday, when the proxy table feature is enhanced, you won't ever need SELECT * INTO #temp_table.

Other possibilities come to mind:
  • Using UNLOAD SELECT to reformat row layouts, deal with differing numbers of columns, compare subsets, etc.

  • Using a cursor FOR loop to UNLOAD individual rows for comparison.
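For example, the first of those might look something like this sketch, comparing just a subset of columns and forcing both unloads into the same row order (table and column names are hypothetical):

```sql
BEGIN
DECLARE @t1 LONG VARCHAR;
DECLARE @t2 LONG VARCHAR;

-- Compare only pkey and data, sorted so row order can't cause
-- a spurious 'NOT EQUAL'.
UNLOAD SELECT pkey, data FROM t1 ORDER BY pkey
INTO VARIABLE @t1;

UNLOAD SELECT pkey, data FROM t2 ORDER BY pkey
INTO VARIABLE @t2;

SELECT IF @t1 = @t2
THEN 'EQUAL'
ELSE 'NOT EQUAL'
ENDIF;
END;
```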