Monday, May 16, 2011

UNLOAD SELECT BULK INSERT

Question: How do I implement high-speed automated data transfer from SQL Anywhere to Microsoft SQL Server?

Answer: Nothing beats SQL Anywhere's UNLOAD statement for speed, and on the SQL Server side it's BULK INSERT. Throw in SQL Anywhere's FORWARD TO statement, and some special characters to keep SQL Server from misinterpreting other special characters in the data as field and row delimiters, and you're on your way to a solution.

Here is a sample of the three statements used to transfer all the rows from the DBA.t1 table to test.user1.t1; the first character in the DELIMITED BY, FIELDTERMINATOR and ROWTERMINATOR clauses may not be visible because it's the hexadecimal 1F character:

UNLOAD 
SELECT *, 
'^&*(' 
FROM "DBA"."t1" 
TO 'D:\\data\\MSSTEST\\t1.TXT' 
DELIMITED BY ' '  
ESCAPES OFF  
QUOTES OFF;

FORWARD TO MSS 
'TRUNCATE TABLE test.user1.t1';

FORWARD TO MSS 
'BULK INSERT test.user1.t1  
FROM ''D:\\data\\MSSTEST\\t1.TXT''  
WITH ( BATCHSIZE = 10000, 
TABLOCK, 
FIELDTERMINATOR = '' '', 
ROWTERMINATOR = '' ^&*(\\n'' )';

The syntax of BULK INSERT is rather funky, especially when dealing with string columns containing line breaks and other special characters. I've had good luck with hex 1F as the MSS BULK INSERT field terminator, and hex 1F followed by ^&*( followed by \x0d\x0a as the row terminator.

The hex 1F character was chosen as something you're *really* unlikely to find inside text, and the ^&*( was chosen to for the same reason. Note that SQL Anywhere's DELIMITED BY is a separator, not a terminator, but the extra column in SELECT *, '^&*(' effectively makes it a terminator. Also, UNLOAD SELECT appends \x0d\x0a to each output line... in MSS-speak that is coded as \n.

Here's an end-to-end demonstration, starting with the download location for SQL Server:
Microsoft SQL Server 2008 Express Edition Service Pack 1
Here's a command line for starting SQL Server's version of Interactive SQL, the osql.exe utility, using the -S server name sssss and -P system administrator password ppppp you specify when you install SQL Server:
"C:\Program Files\Microsoft SQL Server\100\Tools\Binn\osql.exe" -S sssss -U sa -P ppppp

Here are the SQL Server statements to create a test database and user id:
USE master
GO

CREATE DATABASE test
ON 
( NAME = test_dat,
FILENAME = 'D:\data\MSSTEST\MSSTEST.mdf',
SIZE = 200MB )
LOG ON
( NAME = 'test_log',
FILENAME = 'D:\data\MSSTEST\MSSTEST.ldf',
SIZE = 100MB )
GO

USE master
GO

sp_addlogin @loginame = 'user1', @passwd = 'sql', @defdb = 'test'
GO

USE test
GO

sp_grantdbaccess @loginame = 'user1'
GO

GRANT CONTROL TO user1;
GO

Now you can switch to using user1 with osql.exe:
"C:\Program Files\Microsoft SQL Server\100\Tools\Binn\osql.exe" -S sssss -U user1 -P sql

Here are the SQL Server statements for creating the target tables:
USE test
GO

CREATE TABLE user1.t1 (
pkey INTEGER NOT NULL PRIMARY KEY,
data VARCHAR ( 1000 ) NOT NULL )
GO

CREATE TABLE user1.t2 (
pkey INTEGER NOT NULL PRIMARY KEY,
data VARCHAR ( 1000 ) NOT NULL )
GO

Here's how to set up the source data on SQL Anywhere, including string values with embedded line breaks:
CREATE TABLE t1 (
pkey INTEGER NOT NULL PRIMARY KEY,
data VARCHAR ( 1000 ) NOT NULL );

CREATE TABLE t2 (
pkey INTEGER NOT NULL PRIMARY KEY,
data VARCHAR ( 1000 ) NOT NULL );

INSERT t1 VALUES ( 1,
'Hello
World' );

INSERT t1 VALUES ( 2,
'More
lines
of
text' );

INSERT t2 VALUES ( 1,
'Hello
World' );

INSERT t2 VALUES ( 2,
'More
lines
of
text' );

COMMIT;

CHECKPOINT; -- force row counts in SYSTABLE.count to be up-to-date

Here's the code you can run on SQL Anywhere to rapidly push all the data from the DBA.* tables over to SQL Server, replacing what's there:
BEGIN
DECLARE @path_to_data_files      LONG VARCHAR;
DECLARE @SQLA_user_id            VARCHAR ( 100 );
DECLARE @SQLA_field_delimiter    VARCHAR ( 1 );
DECLARE @SQLA_row_delimiter      VARCHAR ( 10 );
DECLARE @MSS_field_delimiter     VARCHAR ( 1 );
DECLARE @MSS_row_delimiter       VARCHAR ( 10 );
DECLARE @MSS_DSN                 VARCHAR ( 100 );
DECLARE @MSS_login_id            VARCHAR ( 100 );
DECLARE @MSS_password            VARCHAR ( 100 );
DECLARE @MSS_database            VARCHAR ( 100 );
DECLARE @MSS_owner               VARCHAR ( 100 );

DECLARE @sql                     LONG VARCHAR;
DECLARE @message                 LONG VARCHAR;
DECLARE @sqlcode                 VARCHAR ( 5 );
DECLARE @errormsg                VARCHAR ( 32767 );
DECLARE @all_done                VARCHAR ( 1 );
DECLARE @attempt_counter         INTEGER;
DECLARE @failure_count           INTEGER;

DECLARE LOCAL TEMPORARY TABLE SQLA_table ( 
table_name           VARCHAR ( 128 ) NOT NULL,
row_count            UNSIGNED BIGINT NOT NULL,
done                 VARCHAR ( 1 ),
CONSTRAINT PRIMARY KEY ( table_name ) )
NOT TRANSACTIONAL;

SET @message = STRING ( '********** Starting **********' );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

SET TEMPORARY OPTION CLOSE_ON_ENDTRANS = 'OFF';

SET @path_to_data_files    = 'D:\\data\\MSSTEST';

SET @SQLA_user_id          = 'DBA';
SET @SQLA_field_delimiter  = '\x1F';  -- ASCII "UNIT SEPARATOR"; see http://www.cs.tut.fi/~jkorpela/chars/c0.html
--SET @SQLA_field_delimiter   = 'ï'; -- Fails with UTF database
--SET @SQLA_field_delimiter   = '\xEF'; -- Fails with UTF database
SET @SQLA_row_delimiter    = '^&*(';

SET @MSS_field_delimiter   = @SQLA_field_delimiter;
SET @MSS_row_delimiter     = STRING ( @SQLA_field_delimiter, @SQLA_row_delimiter, '\\n' );
SET @MSS_DSN               = 'MSSTEST';
SET @MSS_login_id          = 'user1';
SET @MSS_password          = 'sql';
SET @MSS_database          = 'test';
SET @MSS_owner             = 'user1';

BEGIN
DROP EXTERNLOGIN DBA TO MSS;
EXCEPTION WHEN OTHERS THEN
END;

BEGIN
DROP SERVER MSS;
EXCEPTION WHEN OTHERS THEN
END;

SET @sql = STRING ( 
'CREATE SERVER MSS CLASS ''MSSODBC'' USING ''', 
@MSS_DSN, 
'''' );
MESSAGE @sql TO CONSOLE;
EXECUTE IMMEDIATE @sql;

SET @sql = STRING ( 'CREATE EXTERNLOGIN DBA TO MSS REMOTE LOGIN ',  
@MSS_login_id,  
' IDENTIFIED BY ',  
@MSS_password );
MESSAGE @sql TO CONSOLE;
EXECUTE IMMEDIATE @sql;

----------------------------------------------------------------------------------
-- UNLOAD all non-empty xxx tables.

SET @message = STRING ( '********** Step 1 - UNLOAD **********' );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

FOR f_fetch1
AS c_fetch1 NO SCROLL CURSOR FOR
SELECT 
SYSTABLE.table_name   AS @table_name,
SYSTABLE.count        AS @row_count
FROM SYSTABLE
WHERE USER_NAME ( SYSTABLE.creator ) = @SQLA_user_id
AND SYSTABLE.count > 0
ORDER BY SYSTABLE.table_name
FOR READ ONLY
DO

SET @message = STRING (
'SQL Anywhere UNLOAD SELECT ',
@row_count,
'\x09 rows from ',
@table_name );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

-- Template...
-- UNLOAD SELECT *, '^&*(' FROM "uuu"."ttt" TO 'fff' DELIMITED BY '\x1F' ESCAPES OFF QUOTES OFF

SET @sql = STRING ( 
'UNLOAD SELECT *, ''',
@SQLA_row_delimiter,
''' FROM "',
@SQLA_user_id,
'"."',
@table_name,
'" TO ''',
@path_to_data_files,
'\\',
@table_name,
'.TXT'' DELIMITED BY ''',
@SQLA_field_delimiter,
''' ESCAPES OFF QUOTES OFF' );
SET @sql = REPLACE ( @sql, '\\', '\\\\' );
MESSAGE @sql TO CLIENT;
MESSAGE @sql TO CONSOLE;

/* Sample command...
UNLOAD SELECT *, '^&*(' FROM "DBA"."t1" TO 'D:\\data\\MSSTEST\\t1.TXT' DELIMITED BY ' ' ESCAPES OFF QUOTES OFF
*/

EXECUTE IMMEDIATE @sql;

/*
-- Artificial delay in case file system needs to catch up.

COMMIT;
WAITFOR DELAY '00:00:01';
*/

COMMIT;
END FOR;

----------------------------------------------------------------------------------
-- TRUNCATE all MSS tables, empty or not.

SET @message = STRING ( '********** Step 2 - TRUNCATE **********' );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

FOR f_fetch2
AS c_fetch2 NO SCROLL CURSOR FOR
SELECT 
SYSTABLE.table_name   AS @table_name,
SYSTABLE.count        AS @row_count
FROM SYSTABLE
WHERE USER_NAME ( SYSTABLE.creator ) = @SQLA_user_id
ORDER BY SYSTABLE.table_name
FOR READ ONLY
DO

SET @message = STRING (
'MSS TRUNCATE TABLE ',
@table_name );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

-- Template...
-- FORWARD TO MSS 'TRUNCATE TABLE ddd.xxx.ttt'

SET @sql = STRING ( 
'FORWARD TO MSS ''TRUNCATE TABLE ',
@MSS_database,
'.',
@MSS_owner,
'.',
@table_name,
'''' );
SET @sql = REPLACE ( @sql, '\\', '\\\\' );
MESSAGE @sql TO CLIENT;
MESSAGE @sql TO CONSOLE;

/* Sample command...
FORWARD TO MSS 'TRUNCATE TABLE test.user1.t2'
*/

BEGIN

EXECUTE IMMEDIATE @sql; -- this will work, or jump to the EXCEPTION

-- Note: The TRUNCATE is not expected to fail.

EXCEPTION

-- Record exception, but otherwise carry on.

WHEN OTHERS THEN

SELECT SQLCODE, ERRORMSG() INTO @sqlcode, @errormsg;

SET @message = STRING (
' EXCEPTION in TRUNCATE: @sqlcode = "',
@sqlcode,
'", "',
' @errormsg = "',
@errormsg,
'"' );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

END;

COMMIT;
END FOR;

----------------------------------------------------------------------------------
-- BULK INSERT all MSS xxx tables that are non-empty on MSS.

-- Initialize list of tables to work on.

INSERT SQLA_table
SELECT SYSTABLE.table_name,
SYSTABLE.count,
'N'
FROM SYSTABLE
WHERE USER_NAME ( SYSTABLE.creator ) = @SQLA_user_id
AND SYSTABLE.table_type = 'BASE';

SET @all_done = 'N';
SET @attempt_counter = 0;

WHILE @all_done = 'N' LOOP

SET @attempt_counter = @attempt_counter + 1;

SET @message = STRING ( '********** Step 3 - BULK INSERT Attempt ', @attempt_counter, ' **********' );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

SET @all_done = 'Y'; -- assume this attempt will be successful, until proven otherwise

FOR f_fetch3
AS c_fetch3 NO SCROLL CURSOR FOR
SELECT 
SQLA_table.table_name   AS @table_name,
SQLA_table.row_count    AS @row_count
FROM SQLA_table
WHERE SQLA_table.row_count > 0
AND SQLA_table.done      = 'N'
ORDER BY SQLA_table.table_name
FOR READ ONLY
DO

SET @message = STRING (
'Attempt #',
@attempt_counter,
' MSS BULK INSERT ', 
@row_count,
'\x09 rows into ',
@table_name );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

-- BULK INSERT ddd.xxx.ttt FROM 'fff' WITH ( BATCHSIZE = 10000, TABLOCK, FIELDTERMINATOR = 'ï', ROWTERMINATOR = 'ï^&*(\n' )

SET @sql = STRING ( 
'FORWARD TO MSS ''BULK INSERT ',
@MSS_database,
'.',
@MSS_owner,
'.',
@table_name,
' FROM ''''', 
@path_to_data_files,
'\\',
@table_name,
'.TXT'''' WITH ( BATCHSIZE = 10000, TABLOCK, FIELDTERMINATOR = ''''',
@MSS_field_delimiter,
''''', ROWTERMINATOR = ''''',
@MSS_row_delimiter,
''''' )''' );
SET @sql = REPLACE ( @sql, '\\', '\\\\' );
MESSAGE @sql TO CLIENT;
MESSAGE @sql TO CONSOLE;

/* Sample command...
FORWARD TO MSS 'BULK INSERT ddd.xxx.ttt FROM ''C:\\Clients\\ddd\\05_db\\data\\ttt.TXT'' WITH ( BATCHSIZE = 10000, TABLOCK, FIELDTERMINATOR = '' '', ROWTERMINATOR = '' ^&*(\\n'' )'
*/

BEGIN

EXECUTE IMMEDIATE @sql; -- this will work, or jump to the EXCEPTION

-- It worked, so mark this table.

UPDATE SQLA_table
SET done = 'Y'
WHERE table_name = @table_name;

-- Note: The BULK INSERT may fail with a "file not found".

EXCEPTION

-- It did not work, so carry on.

WHEN OTHERS THEN

SELECT SQLCODE, ERRORMSG() INTO @sqlcode, @errormsg;

SET @all_done = 'N'; -- at least one exception has been detected

SET @message = STRING (
' EXCEPTION in BULK INSERT: @sqlcode = "',
@sqlcode,
'", "',
' @errormsg = "',
@errormsg,
'"' );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

END;

COMMIT;
END FOR;

IF @attempt_counter >= 3 THEN
SET @all_done = 'Y' -- no more attempts
END IF;

END LOOP;

----------------------------------------------------------------------------------

SET @message = STRING ( '********** Step 4 - Final Check **********' );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

SELECT COUNT(*) 
INTO @failure_count
FROM SQLA_table
WHERE SQLA_table.row_count > 0
AND SQLA_table.done      = 'N';

SET @failure_count = COALESCE ( @failure_count, 0 );

IF @failure_count = 0 THEN

SET @message = 'All OK';
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

ELSE

FOR f_fetch4
AS c_fetch4 NO SCROLL CURSOR FOR
SELECT SQLA_table.table_name   AS @table_name,
SQLA_table.row_count    AS @row_count
FROM SQLA_table
WHERE SQLA_table.row_count  > 0
AND SQLA_table.done       = 'N'
ORDER BY SQLA_table.table_name
FOR READ ONLY
DO

SET @message = STRING (
'Failed: MSS BULK INSERT ', 
@row_count,
'\x09 rows into ',
@table_name );
MESSAGE @message TO CLIENT;
MESSAGE @message TO CONSOLE;

COMMIT;
END FOR;

END IF;

----------------------------------------------------------------------------------

COMMIT;
END;

The CREATE SERVER and CREATE EXTERNLOGIN statements on lines 61 through 73 set up a remote server inside SQL Anywhere for directly communicating with SQL Server via ODBC using the FORWARD TO statements later in the code.

EXECUTE IMMEDIATE statements are used throughout this code because variables are used to parameterize everything; e.g., DSN, user ids, delimiter strings, etc.

The FOR loop on lines 82 through 137 walks through SYSTABLE looking for tables to UNLOAD. The FOR loop on lines 146 through 209 does the same for TRUNCATE TABLE statements sent to SQL Server via FORWARD TO.

The INSERT on lines 216 through 222 loads a local table with information used later for status checking.

The WHILE loop on lines 227 through 326 sends the BULK INSERT statements to SQL Server, with the BEGIN block on lines 285 through 317 used to catch, display, record and otherwise ignore exceptions thrown by SQL Server when BULK INSERT statements fail.

The "Final Check" code on lines 328 through 372 produces the "All OK" statement shown at the end of this sample of messages from a test run:
********** Starting **********
********** Step 1 - UNLOAD **********
SQL Anywhere UNLOAD SELECT 2  rows from t1
UNLOAD SELECT *, '^&*(' FROM "DBA"."t1" TO 'D:\\data\\MSSTEST\\t1.TXT' DELIMITED BY ' ' ESCAPES OFF QUOTES OFF
SQL Anywhere UNLOAD SELECT 2  rows from t2
UNLOAD SELECT *, '^&*(' FROM "DBA"."t2" TO 'D:\\data\\MSSTEST\\t2.TXT' DELIMITED BY ' ' ESCAPES OFF QUOTES OFF
********** Step 2 - TRUNCATE **********
MSS TRUNCATE TABLE t1
FORWARD TO MSS 'TRUNCATE TABLE test.user1.t1'
MSS TRUNCATE TABLE t2
FORWARD TO MSS 'TRUNCATE TABLE test.user1.t2'
********** Step 3 - BULK INSERT Attempt 1 **********
Attempt #1 MSS BULK INSERT 2  rows into t1
FORWARD TO MSS 'BULK INSERT test.user1.t1 FROM ''D:\\data\\MSSTEST\\t1.TXT'' WITH ( BATCHSIZE = 10000, TABLOCK, FIELDTERMINATOR = '' '', ROWTERMINATOR = '' ^&*(\\n'' )'
Attempt #1 MSS BULK INSERT 2  rows into t2
FORWARD TO MSS 'BULK INSERT test.user1.t2 FROM ''D:\\data\\MSSTEST\\t2.TXT'' WITH ( BATCHSIZE = 10000, TABLOCK, FIELDTERMINATOR = '' '', ROWTERMINATOR = '' ^&*(\\n'' )'
********** Step 4 - Final Check **********
All OK
1 row(s) affected
Execution time: 6.935 seconds

Here are two queries on SQL Server that show the line breaks have been preserved:
SELECT pkey, LEFT ( data, 20 ) AS data FROM test.user1.t1 ORDER BY pkey
GO
SELECT pkey, LEFT ( data, 20 ) AS data FROM test.user1.t2 ORDER BY pkey
GO

1> SELECT pkey, LEFT ( data, 20 ) AS data FROM test.user1.t1 ORDER BY pkey
2> GO
pkey        data
----------- --------------------
1 Hello
World
2 More
lines
of
text

(2 rows affected)
1> SELECT pkey, LEFT ( data, 20 ) AS data FROM test.user1.t2 ORDER BY pkey
2> GO
pkey        data
----------- --------------------
1 Hello
World
2 More
lines
of
text

(2 rows affected)



3 comments:

Anonymous said...

Breck, you saved my day (or week):)

Exporting lots of SA tables to MS SQL is something that's on my schedule currently, and your script based-approach will prevent me from having to fiddle around with that funky Data Transfer Wizard...

BIG THANKS!

Regards
Volker

Anonymous said...

Breck, thanks again - it works great:)

Two suggestions I'd like to make:

1. I'd fill the temporary table right at the start - this makes it easier to use one central place to add the relevant tables - particular if one does not want to export all (since they do not exist in MSS or to limit for testing purposes).

2. TRUNCATE TABLE fails for tables with FK relationships (even if they are empty, methinks) in MSS. Therefore I have used DELETE FROM and let it run in descending logical (i.e FK) order. In order to use that (as I had created the MSS tables in logical order), I added a creation_date to the temporary table and filled it with data from MSS sysobects.crdate field (by means of a proxy table to sysobects).

You see, you've delivered a real good starting point:)

Volker

Anonymous said...

Volker can you please send me a private email ARMD at pacbell.net.
I have a question about the Sql anywhere 5.X version and unload to MS sql and bulk insert.
Thanks
Alex