Friday, November 26, 2010

Pushing UNLOAD

Hey, lookit here, a whole WEEK of "Pushing" articles!

First it was Pushing OPENSTRING and CROSS APPLY which talked about splitting tab-delimited strings from one table into multiple columns in another table, and then it was Pushing LOAD which showed how to do the same thing with the new LOAD TABLE USING COLUMN feature.

This article goes back and forth

  • from multiple columns in one table

  • to a single comma-delimited "toString" version of that column in another table,

  • and then back to multiple columns again.
Here are the three features used:
  • the new UNLOAD TABLE INTO VARIABLE feature,

  • the new LOAD TABLE USING VALUE feature, and

  • the new LOAD TABLE USING COLUMN feature described in Pushing LOAD.
These aren't NEW new features, but they're pretty new, and they haven't been talked about much. That's partly because while folks were waiting for these enhancements to LOAD and UNLOAD they discovered (or were taught, if they're like me) how to exploit the LIST() aggregate function in new and wonderful ways. Well, that's my excuse... I've since coded many combined STRING and LIST calls that go on for hundreds of lines to produce entire HTML pages in a single SELECT... awesome stuff and actually readable (well, except for some of the embedded JavaScript and <funkyTags>, but that's not SQL's fault :).

OK, let's get back on topic: Pushing UNLOAD


First, let's start with a nice normalized table where each row contains several columns describing a single radio program:

CREATE TABLE original_radio_program (
program_number BIGINT NOT NULL PRIMARY KEY CLUSTERED,
station VARCHAR ( 100 ) NOT NULL,
dates VARCHAR ( 100 ) NOT NULL,
times VARCHAR ( 100 ) NOT NULL,
duration VARCHAR ( 100 ) NOT NULL,
city VARCHAR ( 100 ) NOT NULL,
state VARCHAR ( 100 ) NOT NULL,
frequency VARCHAR ( 100 ) NOT NULL );

INSERT original_radio_program VALUES (
1, 'WAAV', 'Mon-Fri', '10am', '2 hours', 'Leland', 'NC', '980 AM' );
INSERT original_radio_program VALUES (
2, 'KFAR', 'Mon-Fri', '12pm', '3 hours', 'Fairbanks', 'AK', '660 AM' );
INSERT original_radio_program VALUES (
3, 'News Talk 550', 'Mon-Fri', '10am', '2 hours', 'Gainesville', 'GA', '550 AM' );
INSERT original_radio_program VALUES (
4, 'News Radio 1390', 'Mon-Fri', '10am', '2 hours', 'Jackson', 'TN', '1390 AM' );
INSERT original_radio_program VALUES (
5, 'KENN', 'Sat', '5pm', '3 hours', 'Farmington', 'NM', '1390 AM' );
COMMIT;



Here is the intermediate "toString" table where each row from the input table is going to be squished (a technical term meaning "squashed") into a single string column:

CREATE TABLE toString_radio_program (
toString_number BIGINT NOT NULL
DEFAULT AUTOINCREMENT
PRIMARY KEY CLUSTERED,
toString_row LONG VARCHAR NOT NULL );

Here's the code that does the squishing:

CREATE VARIABLE @original_radio_programs LONG VARCHAR;

UNLOAD TABLE original_radio_program
INTO VARIABLE @original_radio_programs
ROW DELIMITED BY '\x0d\x0a###';

LOAD TABLE toString_radio_program ( toString_row )
USING VALUE @original_radio_programs
DEFAULTS ON
DELIMITED BY ''
ESCAPES OFF
HEXADECIMAL OFF
QUOTES OFF
STRIP OFF
ROW DELIMITED BY '###';

Line 1 creates the connection-level variable to be filled by UNLOAD TABLE INTO VARIABLE. This variable could just as well be a local DECLARE variable in a BEGIN block.
Note: LONG VARCHAR is just a data type like any other, there are no silly syntax rules telling you where you can and cannot code it in SQL Anywhere... generations of application developers wish they could say the same about other programming tools.
The UNLOAD statement on lines 3 to 5 takes all the rows from original_radio_program and dumps them, not into a file like classic UNLOAD, but into the string variable @original_radio_programs.

There are a zillion options you can specify for UNLOAD. In this case all the defaults are OK except for the funky ROW DELIMITED BY on line 5. Here's the story on that: The LOAD TABLE USING COLUMN statement (not the LOAD TABLE on line 7, but the one later on in this article) needs to have a row delimiter stored with each string value. However, the LOAD TABLE USING VALUE on line 7 is going to strip the row delimiters added by the UNLOAD TABLE INTO VARIABLE, so we need to define two sets of row delimiters.

So, the ROW DELIMITED BY value on line 5 specifies five characters: an ordinary carriage return - line feed pair \x0d\x0a, and three # characters... the string stored in @original_radio_programs will have '\x0d\x0a###' at the end of each row instead of just '\x0d\x0a' (the default). More on this in a moment...

The LOAD statement on lines 7 to 15 takes the data from the USING VALUE @original_radio_programs string and ignores everything except the row delimiters. In other words, it ignores the 'quotes' around string columns, and all the commas between field values, and any escapes and hex numbers and blank padding, and treats each row as a single string to be loaded into the toString_radio_program.toString_row column. This behavior is specified by the five options on lines 10 through 14.
Note: Those five values DELIMITED BY '' ESCAPES OFF HEXADECIMAL OFF QUOTES OFF STRIP OFF comprise the magic "what you see is what you get" set of options, very useful when exploiting LOAD and UNLOAD to perform string handling operations as well as bulk data handling. You can read about them in the Help and see another example here.
The DEFAULTS ON option on line 9 tells LOAD TABLE to honor the DEFAULT AUTOINCREMENT setting on the toString_number column (which isn't named in the column list on line 7). This technique is handy for recording the original input order when loading data via LOAD TABLE. It isn't important in this example because the toString_row column contains the program_number value.

...now on to line 15, where the ROW DELIMITED BY option tells LOAD TABLE to leave the \x0d\x0a at the end of each row alone, and just strip off the ###. This technique (mismatched row delimiters for input and output) is useful in other situations too, like moving between SQL Anywhere's powerful UNLOAD syntax and Microsoft SQL Server's feeble BULK INSERT syntax. Here, it's used to accommodate LOAD TABLE USING COLUMN's strange need for row delimiters on each column value.

Here's what the string looks like after the UNLOAD TABLE INTO VARIABLE:



Here's what the intermediate "toString" table looks like after the LOAD TABLE USING VALUE:



Here's some code that goes the other way, back to a reconstituted table containing multiple columns. You've seen this code before in Pushing LOAD:

CREATE TABLE reconstituted_radio_program (
program_number BIGINT NOT NULL PRIMARY KEY CLUSTERED,
station VARCHAR ( 100 ) NOT NULL,
dates VARCHAR ( 100 ) NOT NULL,
times VARCHAR ( 100 ) NOT NULL,
duration VARCHAR ( 100 ) NOT NULL,
city VARCHAR ( 100 ) NOT NULL,
state VARCHAR ( 100 ) NOT NULL,
frequency VARCHAR ( 100 ) NOT NULL );

LOAD TABLE reconstituted_radio_program
USING COLUMN toString_row
FROM toString_radio_program
ORDER BY toString_number;

Yup, it looks the same as original_radio_program:

2 comments:

SPWS said...

I'm glad I read your post. I find it hard to do this in my assignment in Sql training. It gives me idea on how to solve it.

Breck Carter said...

@michael: Good luck! Don't forget, however, that the course you're taking is all about Microsoft SQL Server which is quite different from Sybase SQL Anywhere.