Wednesday, August 3, 2011

UNLOAD To A Named Pipe Beats UNLOAD COMPRESSED

Folks familiar with how fast UNLOAD and LOAD TABLE run will not be surprised that they're often faster than INSERT SELECT via proxy tables.

An earlier article showed how to take the next step: Use UNLOAD COMPRESSED to make the intermediate file smaller and the process faster, and a followup article talked about UNLOAD COMPRESSED in action.

Then, John Smirnios suggested a third step: use a named pipe instead of a physical file to make the process even faster.

Does A Named Pipe Make A Difference?


It sure does; UNLOAD and LOAD using a named pipe beats UNLOAD and LOAD COMPRESSED using a physical file by a significant margin... 39% for the comparison shown in Figure 1.

Figure 1: UNLOAD COMPRESSED Versus Named Pipe


And no, you don't have break out Visual Studio and sharpen up your multiprogramming C skills... you can do it in SQL. There is a simple demo in the previous article, and this article presents a similar technique, in action, in the real world, in Listing 1.

Listing 1: UNLOAD and LOAD TABLE Via Named Pipe

DECLARE @sql LONG VARCHAR;
...

CREATE SERVER old_foxhound1 CLASS 'SAODBC'
USING 'DRIVER=SQL Anywhere 12;ENG=old_foxhound1;DBN=old_f';

CREATE EXTERNLOGIN DBA TO old_foxhound1 REMOTE LOGIN ... IDENTIFIED BY '...';
...

SET @sql = 'UNLOAD TABLE DBA.rroad_group_1_property_pivot TO ''\\\\.\\pipe\\data''';

SET @sql = REPLACE ( @sql, '''', '''''' );
SET @sql = REPLACE ( @sql, '\\', '\\\\' );
SET @sql = REPLACE ( @sql, '\\', '\\\\' );

SET @sql = 'FORWARD TO old_foxhound1 ''' + @sql + '''';

TRIGGER EVENT rroad_execute_immediate ( @sql = @sql );

LOAD TABLE rroad_group_1_property_pivot FROM '\\\\.\\pipe\\data';

You can read about the mechanics of the FORWARD TO and TRIGGER EVENT statements in the previous arcticle about using named pipes, and in the Help: TRIGGER EVENT and FORWARD TO.
Tip: Don't be discouraged by this in the Help: "The FORWARD TO statement is a server directive and cannot be used in stored procedures, triggers, events, or batches"... FORWARD TO works just fine inside a stored procedure if you wrap it in EXECUTE IMMEDIATE.
The TRIGGER EVENT on line 18 was written with "@sql = @sql" on purpose, not to be confusing (yes, it is, I know) but to show how the name of the special pseudo-parameter (the first @sql) has nothing to do with name of the actual argument (the second @sql). They are different variables, with different scopes, even though they look the same. Magic is sometimes (always?) confusing.

Listing 2 shows the event itself, a general-purpose module that runs EXECUTE IMMEDIATE on a separate connection.

Listing 2: Run EXECUTE IMMEDIATE Asynchronously

CREATE EVENT rroad_execute_immediate
HANDLER BEGIN

DECLARE @sql LONG VARCHAR;

SET @sql = COALESCE ( EVENT_PARAMETER ( '@sql' ), '' );

IF @sql <> '' THEN
EXECUTE IMMEDIATE @sql;
END IF;

END;

Now there's a THIRD "@sql", a local variable that receives its value from the magic pseudo-parameter @sql via EVENT_PARAMETER(). OK, I promise... this is the first and last time I'll use the same name for three different things :)

Question: Why is FORWARD TO necessary?
Answer: The whole process is under the control of code running in the target database (Listing 1) but the UNLOAD statement must run on the source database. The FORWARD TO statement sends it over there.

Question: Why is EXECUTE IMMEDIATE necessary?
Answer: Because it is... necessary, that is. For some unexplained reason, you can't run a naked FORWARD TO inside a stored procedure, you have to put some pants on it; i.e., wrap it in an EXECUTE IMMEDIATE.

Question: Why is an EVENT necessary?
Answer: Because when you run the EXECUTE IMMEDIATE it will sit there and wait until the FORWARD TO is finished... and the FORWARD TO will sit there and wait until the UNLOAD statement is finished, and the UNLOAD statement will run until the last byte is written to the named pipe. The EVENT runs on a separate connection, so all this waiting is done on that connection, leaving the original connection (the one running the code in Listing 1) to get on with life... to get on with running the LOAD TABLE on line 20. In other words, the code in Listing 1 can start reading data from the named pipe before the UNLOAD statement has finished writing it; Figure 2 shows one CPU busy with the UNLOAD and another one busy with the LOAD TABLE.

Note: The EVENT, the FORWARD TO and the EXECUTE IMMEDIATE all run on the target database, as does the LOAD TABLE; only the UNLOAD statement runs on the source database.

Figure 2: UNLOAD and LOAD TABLE Via Named Pipe In Action


What's Wrong With This Picture?


Figure 1 isn't exactly comparing apples-to-apples: the UNLOAD COMPRESSED test was run using two separate SQL Anywhere 12 servers on one computer, one to do the UNLOAD and the other to do the LOAD, while only one server was used to run both databases in the named pipe test. Part of the 39% savings might have come from switching to a single server, OR the savings might have been larger if the UNLOAD COMPRESSED test had also been run on one server, OR it might have made no difference... only further experimentation will determine that.

It's not like SQL Anywhere was starved for CPU or RAM (Intel Core i7 with 8G) but it WAS starved for disk resources. Figure 2 shows that the Disk Queue Length regularly hit 5: that's terrible; 0.5 is a better number, 0.05 even better. Five means a whole bunch of work is queued up for disk resources, something which happens regularly when you're unloading data from one database disk file and loading it into another at the same time, using the same physical disk drive... like in Figure 2.

So yeah, if I was serious about performance, I'd be using two drives, wouldn't I?

[fretful sighing sound]

...performance and tuning never ends :)

And The Answer Is... Door Number Two!


It turns out the correct answer is "... OR the savings might have been larger if the UNLOAD COMPRESSED test had also been run on one server". Figure 3 shows the times from Figure 1 as Test A and Test C, with Test B in the middle showing the times for UNLOAD COMPRESSED where both databases were running on the same SQL Anywhere server.

Figure 3: UNLOAD COMPRESSED on 1 and 2 Servers Versus Named Pipe


So yes, the 39% figure is correct, and yes, UNLOAD and LOAD using a named pipe does beat UNLOAD COMPRESSED.

Question: Why is only one server used for the named pipe test?
Answer: Because it has to be that way... the \\.\pipe\data thing just does not work if the UNLOAD is running on one server and the LOAD TABLE on another, even if both servers are running on the same computer. Perhaps by the time you read this, there will be an answer posted to this question on the SQL Anywhere Forum.

Who knows, maybe it will be faster using two servers, the UNLOAD COMPRESSED test certainly was.



1 comment:

Breck Carter said...

@Dheeraj: I am so glad you found what you were looking for on my blog! and I am glad you wrote, it has been many years since I last experienced the beautiful rows and columns of Jammu, the magnificent servers of Srinagar, and the thrill of white water rafting through the named pipes of Gulmarg. I will surely consider a Kashmir Tour Package for my next pilgrimage... thank you!