Wednesday, December 29, 2010

UNLOAD and LOAD Via Named Pipes

'Twas the night before Christmas
and all through the house
not a creature was stirring...


...except John Smirnios, who posted these three comments on the Christmas Eve article UNLOAD COMPRESSED In Action:


John Smirnios said...

Boy, do I have suggestion for you. You might call it 'undocumented' but it's really something you could have tried on your own. Due to historical reasons, though, it's both better and worse than you would think.

The suggestion is to use a named pipe to carry the data. Supposing that both the old and new databases are on the same database server, open one connection to each database and have each connection be used by a separate thread. On the old database, issue an UNLOAD .. TO '\\\\.\\pipe\\data'. That's just a standard Windows named pipe path. On ANOTHER THREAD and using the connection to the new database, *simultaneously* run LOAD TABLE ... FROM '\\\\.\\pipe\\data'. The data will pass straight between the two statements without needing intermediate data files. It's what dbunload does when rebuilding a database.

Okay, so normally that's just using your OS facilities and not an undocumented feature of SA. It's a pretty common technique for UNIX hacks who are familiar with mkfifo(). Here's the caveat, though. Due to historical reasons, SA recognizes any path that starts with \\.\pipe as a local named pipe and doesn't actually open an OS named pipe. Instead, it bypasses all that and shuffles the data via internal buffers. There can only be one such pair of operations in progress at a time since there is only one internal buffer for this purpose. Everything after \\.\pipe is ignored.

There are ways to use real OS named pipes but it's kludgy since at some time in the past we claimed the OS's pipe paths as our own. That should get cleaned up and documented for users who wouldn't think about using named pipes.

Another caveat: if you do use a real OS named pipe (which I haven't showed you how to do), the server is at the mercy of the producer (which in this case is the UNLOAD but could an arbitrary program) to close the pipe. Otherwise, LOAD will wait indefinitely for more data.

-john.

December 24, 2010 9:53 AM

John Smirnios said...

One other caveat occurs to me. If a LOAD TABLE refers to a named pipe, it will not be recoverable unless you use CONTENT LOGGING or ROW LOGGING. Otherwise, the LOAD statement that gets logged will refer to the pipe path and that's not good. For a reload, it's not a big problem since the database gets checkpointed and the log is thrown away. I'm not sure what should be done in "production" -- it's so nice to be able to do the load without logging. That issue needs to be addressed too.

December 24, 2010 10:07 AM

John Smirnios said...

I shouldn't post comments until I've thought everything through, right? The name after \\.\pipe\ does matter somewhat. If the one-and-only internal pipe is in use with another name, the LOAD or UNLOAD will block until it is no longer in use. So, either wait for both the LOAD and UNLOAD to complete before issuing the next pair of statements or use a unique name for each pair. Otherwise, you could end up with two UNLOADs writing into the same pipe and one LOAD might see the data from both UNLOADs (one file concatenated after the other).

December 24, 2010 10:25 AM


Here's proof it works...


First, here are the Windows command files used to create two SQL Anywhere 12 databases ddd1 and ddd2, start them both using one engine ddd, and then fire up one ISQL session on each of the databases:

"%SQLANY12%\Bin32\dbinit.exe"^
ddd1.db

"%SQLANY12%\Bin32\dbinit.exe"^
ddd2.db

"%SQLANY12%\Bin32\dbspawn.exe"^
-f "%SQLANY12%\Bin32\dbsrv12.exe"^
-n ddd^
-o dbsrv12_log.txt^
ddd1.db^
ddd2.db

"%SQLANY12%\Bin32\dbisql.com"^
-c "ENG=ddd;DBN=ddd1;UID=dba;PWD=sql;CON=ddd1"

"%SQLANY12%\Bin32\dbisql.com"^
-c "ENG=ddd;DBN=ddd2;UID=dba;PWD=sql;CON=ddd2"

Here are the SQL commands that create the input table on database ddd1:

CREATE TABLE t1 (
pkey INTEGER NOT NULL PRIMARY KEY,
data INTEGER NOT NULL );

INSERT t1 VALUES ( 1, 2 );
INSERT t1 VALUES ( 2, 2 );
COMMIT;

SELECT * FROM t1;

pkey,data
1,2
2,2

Here are the SQL commands for database ddd2 that bring the data across via UNLOAD and LOAD using the named pipe \\.\pipe\data:

CREATE TABLE t2 (
pkey INTEGER NOT NULL PRIMARY KEY,
data INTEGER NOT NULL );

CREATE SERVER ddd1_server CLASS 'SAODBC'
USING 'DRIVER=SQL Anywhere 12;ENG=ddd;DBN=ddd1';

CREATE EVENT unload
HANDLER BEGIN

DECLARE @sql LONG VARCHAR;

SET @sql = 'UNLOAD TABLE t1 TO ''\\\\.\\pipe\\data''';
SET @sql = REPLACE ( @sql, '''', '''''' );
SET @sql = REPLACE ( @sql, '\\', '\\\\' );
SET @sql = REPLACE ( @sql, '\\', '\\\\' );
SET @sql = STRING ( 'FORWARD TO ddd1_server ''', @sql, '''' );

MESSAGE STRING ( CURRENT TIMESTAMP, ' ***** Step 1: unload starting' ) TO CONSOLE;
MESSAGE STRING ( CURRENT TIMESTAMP, ' ***** ', @sql ) TO CONSOLE;
EXECUTE IMMEDIATE @sql;
MESSAGE STRING ( CURRENT TIMESTAMP, ' ***** Step 2: unload done' ) TO CONSOLE;

END;

MESSAGE STRING ( CURRENT TIMESTAMP, ' ***** TEST STARTING *****' ) TO CONSOLE;
TRIGGER EVENT unload;
MESSAGE STRING ( CURRENT TIMESTAMP, ' ***** Step 3: load starting' ) TO CONSOLE;
LOAD TABLE t2 FROM '\\\\.\\pipe\\data';
MESSAGE STRING ( CURRENT TIMESTAMP, ' ***** Step 4: load done' ) TO CONSOLE;

SELECT * FROM t2;

2010-12-25 11:06:49.890 ***** TEST STARTING *****
2010-12-25 11:06:49.906 ***** Step 1: unload starting
2010-12-25 11:06:49.906 ***** FORWARD TO ddd1_server 'UNLOAD TABLE t1 TO ''\\\\\\\\.\\\\pipe\\\\data'''
2010-12-25 11:06:49.906 ***** Step 3: load starting
2010-12-25 11:06:50.156 ***** Step 2: unload done
2010-12-25 11:06:50.171 ***** Step 4: load done

pkey,data
1,2
2,2

The UNLOAD process is started by the TRIGGER EVENT statement on line 27; that statement starts the event defined on lines 8 through 24. Events run asynchronously, using their own connections to the database, and they meet John's requirement that "each connection be used by a separate thread".

The LOAD process is started by the LOAD TABLE statement on line 29.

The diagnostic MESSAGE output on lines 34 through 39 show that the LOAD processe started before the UNLOAD process completed ("Step 3: load starting" appears before "Step 2: unload done"). This meets John's requirement to "simultaneously run LOAD TABLE".

The SELECT output on lines 41 through 43 prove that the data made it through the named pipe unscathed.

Awesome! I don't know how much faster it is than files, I'm still geeking out over the fact it works!

What's with \\ all \\\\ the \\\\\\\\ backslashes?


Line 13 puts the following string in @sql, after the four string literal escape sequences '\\' are each reduced to a single backslash:
UNLOAD TABLE t1 TO '\\.\pipe\data'
Line 14 doubles up the quotes:
UNLOAD TABLE t1 TO ''\\.\pipe\data''
Lines 15 and 16 quadruple the backslashes:
UNLOAD TABLE t1 TO ''\\\\\\\\.\\\\pipe\\\\data''
Line 17 turns @sql into this FORWARD TO statement:
FORWARD TO ddd1_server 'UNLOAD TABLE t1 TO ''\\\\\\\\.\\\\pipe\\\\data'''
When line 21 processes the FORWARD TO statement, it reduces the doubled quotes and backslash escape sequences in the 'UNLOAD ...' string and sends this statement to the other database:
UNLOAD TABLE t1 TO '\\\\.\\pipe\\data'
When the UNLOAD statement is processed on database ddd1, the escape sequences are reduced (again, sigh), and the following file specification is passed to Windows
\\.\pipe\data
which is what Windows really wants, not \\\\.\\pipe\\data or \\\\\\\\.\\\\pipe\\\\data.

Doncha love escape sequences?

2 comments:

Anonymous said...

Great article - with thanks to John for bringing this idea up and to you for putting it altogether!

One quick suggestion:
I guess you could somewhat simply the escape sequences by using EXECUTE IMMEDIATE WITH ESCAPES OFF.

Volker

Anonymous said...

Err, I wanted to write "simplify" - words should be made as short as possible but no shorter, to paraphrase Einstein:)

Volker