Wednesday, January 27, 2010

OMIGOSH! Proxy tables are FAST!

Normally, everything about a SQL Anywhere Beta is Super Secret Stuff but Sybase has given me explicit permission to publish this article.

There are strings attached:
  • I have to tell you where to go (to get the Beta ...why, what did you think I meant? :)

  • I have to tell you that I received explicit permission from Sybase to publish this.

  • I have to tell you the Beta is Super Secret Stuff... that the Beta agreement does not normally allow articles like this.
There, all done, twice.
OK, got that? Here's the story: I've been working for oh-too-long on an article entitled "Downloading Data into a SQL Anywhere In-Memory Database from Microsoft SQL Server", and that article discusses several techniques using SQL Server 2008 and SQL Anywhere 11.0.1. Two of those techniques are BCP and LOAD TABLE (always the fastest) and proxy tables (always the slowest).

Folks who know about BCP will claim nothing on earth can unload data from a SQL Server database faster than that, and folks who know about SQL Anywhere will say the same thing about loading data via LOAD TABLE.

And lots of people will tell that proxy tables, well, suck when it comes to performance.

All of that is is certainly true for SQL Anywhere 11.0.1: the BCP and LOAD TABLE test for the article took 720 seconds for 1.9 million rows whereas INSERT ... SELECT * FROM a proxy table took 4230 seconds... think "rocket fast" versus "continental drift".

But that's all changing with SQL Anywhere Innsbruck! The INSERT ... SELECT * FROM a proxy table only took 687 seconds! Proxy tables went from last to first place in the performance tests!

Here's my guess what's happened: First, the iAnywhere engineers have made some performance improvements to proxy tables, that I've been told is true.

Second, and this is guesswork on my part, the proxy table path runs directly from one database to another, it doesn't involve writing to or reading from an external disk file like BCP and LOAD TABLE. So, that bit of extra file I/O overhead is always gonna be an anchor 'round the neck of BCP and LOAD TABLE.

FYI, here is my very first post on the SQL Anywhere Beta forum (the "over a minute" claim was wrong, the proxy table test was only 33 seconds faster than BCP and LOAD TABLE):


Subject: OMIGOSH! Proxy tables are FAST!

Here's the full text of my forthcoming article "Top 1 New Features Of
SQL Anywhere Innsbruck"... seriously... I *live* for proxy tables...

CREATE SERVER mss
CLASS 'MSSODBC'
USING 'DSN=main_BRECK-PC';

CREATE EXTERNLOGIN DBA
TO mss
REMOTE LOGIN "sa"
IDENTIFIED BY 'j68Fje9#fyu489';

CREATE EXISTING TABLE proxy_mss_source
AT 'mss.main.dbo.mss_source';

INSERT sa_target
SELECT *
FROM proxy_mss_source;

-- 11.0.1...
1925469 row(s) inserted
Execution time: 4229.875 seconds

-- Innsbruck...
1925469 row(s) inserted
Execution time: 686.859 seconds

...ok, I'm REALLY serious, that just beat the combination of MSS 2008
BCP and 11.0.1 LOAD TABLE for transferring data straight across, by
over a minute, and hey, NOTHING beats BCP and LOAD TABLE... nothing!

Woohoo!

Breck Is Tilting On His Axis

Friday, January 15, 2010

When did Google become perfect?

This problem had been bugging me for years: I couldn't figure out how to use wonderfully-modern Wordpad instead of awkwardly-ancient Notepad to edit *.bat files.

So this morning I just bit the bullet, typed my question into Google...

how do I change the default editor for batch files
Check it out, see how stupid I feel now, not at the answer, but that sometimes when I have a question
Google is not the first thing I try.
That's just dumb.

How long has this interweb been going on?

Wednesday, January 13, 2010

Ranking SQLA

The SQLA question & answer website for SQL Anywhere ranks well down in the list of StackExchange beta sites:

  • 45th in number of questions

  • 45th in number of answers

  • 53rd in number of registered users

  • 47th in number of answers per question
But... SQLA isn't anywhere near the bottom, there are 154 sites in the list.

Some thoughts:
  • Some of those 154 sites are dead... ok, lots of them are dead, but the StackExchange beta software is sure getting a good workout.

  • SQLA hasn't been actively promoted yet. That's partly because everything's still in beta but mostly because there's no end-user backup yet (where "end user" means "site administrator", which is me).

  • The fact that SQLA has essentially the same rank in all 4 categories above means something... not sure what... maybe someone at mathoverflow can answer that.

  • Speaking of mathoverflow, that's where to go for Ego Therapy. If you think you're really smart try answering some of their questions.

  • If your head doesn't explode first, you will come away with shoulders slumped and a humbler view of your place in the universe.

  • Unless you really are smart. In which case please don't stare at me :)

  • Some of the other sites are pretty interesting and a lot less intimidating.
Here's the full list at the end of 2009...

Monday, January 11, 2010

Crosstab, Rotate, Pivot

Shameless promotion, that's what the title is all about: Include all the keywords anyone might use when asking this question:

Question: How do I rotate a table so that different row values in a single column become different columns in a new table?

Answer: I could have sworn I talked about this years ago, but apparently not... at least not anywhere people can find it.

Is that a subtle dig at the NNTP newsgroups? Yes, I'm sorry, it is... I'll try to be less subtle: NNTP sucks, the days of the newsgroups are numbered, and the future for SQL Anywhere questions and answers lies with SQLA! <g>
SQL Anywhere doesn't have the ability to rotate (crosstab, pivot, whatever) a table, and that feature isn't coming any time soon. What it does have is EXECUTE IMMEDIATE, plus the ability to code IF expressions just about anywhere in the SELECT statement. You can combine those two features to kludge a solution.
Is "kludge" too strong a word? Maybe so, since performance is pretty good. Good enough for this technique to be used in real time by the Foxhound database monitor when analyzing data gathered from thousands of database connections. But, no question... the code's funky.
It's actually hard to explain what rotating a table is all about, it's not exactly as simple as turning rows into columns and vice versa... it's easier to use an example.

Here's a table showing sales by state and quarter:

-- Part 1: Initialize data.

BEGIN
DROP TABLE t1;
EXCEPTION WHEN OTHERS THEN
END;

CREATE TABLE t1 (
c1 VARCHAR ( 10 ) NOT NULL,
c2 VARCHAR ( 10 ) NOT NULL,
c3 INTEGER NOT NULL,
PRIMARY KEY ( c1, c2 ) );

INSERT t1 VALUES ( 'CA', 'Q1', 1000 );
INSERT t1 VALUES ( 'CA', 'Q2', 2000 );
INSERT t1 VALUES ( 'CA', 'Q3', 9000 );
INSERT t1 VALUES ( 'CA', 'Q4', 7000 );

INSERT t1 VALUES ( 'NY', 'Q1', 4000 );
INSERT t1 VALUES ( 'NY', 'Q2', 5000 );
INSERT t1 VALUES ( 'NY', 'Q3', 1000 );
INSERT t1 VALUES ( 'NY', 'Q4', 6000 );

INSERT t1 VALUES ( 'FL', 'Q1', 9000 );
INSERT t1 VALUES ( 'FL', 'Q2', 7000 );
INSERT t1 VALUES ( 'FL', 'Q3', 2000 );
INSERT t1 VALUES ( 'FL', 'Q4', 1000 );

INSERT t1 VALUES ( 'AZ', 'Q1', 5000 );
INSERT t1 VALUES ( 'AZ', 'Q2', 5000 );
INSERT t1 VALUES ( 'AZ', 'Q3', 1000 );
INSERT t1 VALUES ( 'AZ', 'Q4', 3000 );

INSERT t1 VALUES ( 'MA', 'Q1', 2000 );
INSERT t1 VALUES ( 'MA', 'Q2', 6000 );
INSERT t1 VALUES ( 'MA', 'Q3', 5000 );
INSERT t1 VALUES ( 'MA', 'Q4', 3000 );
COMMIT;

SELECT * FROM t1 ORDER BY c1, c2;

So far, so good... the table is nicely normalized, everything's elegant... and useless:

c1 c2 c3
AZ Q1 5000
AZ Q2 5000
AZ Q3 1000
AZ Q4 3000
CA Q1 1000
CA Q2 2000
CA Q3 9000
CA Q4 7000
FL Q1 9000
FL Q2 7000
FL Q3 2000
FL Q4 1000
MA Q1 2000
MA Q2 6000
MA Q3 5000
MA Q4 3000
NY Q1 4000
NY Q2 5000
NY Q3 1000
NY Q4 6000

What folks often want to see is something like this, sales by quarter for each state:

c2 AZ CA FL MA NY
Q1 5000 1000 9000 2000 4000
Q2 5000 2000 7000 6000 5000
Q3 1000 9000 2000 5000 1000
Q4 3000 7000 1000 3000 6000

Here's how you can do that in SQL Anywhere:

-- Part 2: Pivot c1 values into columns.

BEGIN
DECLARE @sql LONG VARCHAR;
SET @sql = 'SELECT c2';
FOR f_fetch
AS c_fetch NO SCROLL CURSOR FOR
SELECT DISTINCT t1.c1 AS @c1
FROM t1
ORDER BY t1.c1
FOR READ ONLY
DO
SET @sql = STRING (
@sql,
', SUM ( ( IF t1.c1 = ''',
@c1,
''' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "',
@c1,
'"' );
END FOR;
SET @sql = STRING (
@sql,
' INTO #t1 FROM t1 GROUP BY c2' );
MESSAGE @sql TO CONSOLE;
EXECUTE IMMEDIATE @sql;
SELECT * FROM #t1 ORDER BY c2; -- pivot table
END;

What if you want to see sales by state for each quarter?

c1 Q1 Q2 Q3 Q4
AZ 5000 5000 1000 3000
CA 1000 2000 9000 7000
FL 9000 7000 2000 1000
MA 2000 6000 5000 3000
NY 4000 5000 1000 6000

Here's the code for that:

-- Part 3: Pivot c2 values into columns.

BEGIN
DECLARE @sql LONG VARCHAR;
SET @sql = 'SELECT c1';
FOR f_fetch
AS c_fetch NO SCROLL CURSOR FOR
SELECT DISTINCT t1.c2 AS @c2
FROM t1
ORDER BY t1.c2
FOR READ ONLY
DO
SET @sql = STRING (
@sql,
', SUM ( ( IF t1.c2 = ''',
@c2,
''' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "',
@c2,
'"' );
END FOR;
SET @sql = STRING (
@sql,
' INTO #t1 FROM t1 GROUP BY c1' );
MESSAGE @sql TO CONSOLE;
EXECUTE IMMEDIATE @sql;
SELECT * FROM #t1 ORDER BY c1; -- pivot table
END;

Here's where the magic lies, in the SUMs of row values multiplied by 1 or 0; the following SELECT statements are generated by the code shown above:

SELECT c2,
SUM ( ( IF t1.c1 = 'AZ' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "AZ",
SUM ( ( IF t1.c1 = 'CA' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "CA",
SUM ( ( IF t1.c1 = 'FL' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "FL",
SUM ( ( IF t1.c1 = 'MA' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "MA",
SUM ( ( IF t1.c1 = 'NY' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "NY"
INTO #t1
FROM t1
GROUP BY c2

SELECT c1,
SUM ( ( IF t1.c2 = 'Q1' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q1",
SUM ( ( IF t1.c2 = 'Q2' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q2",
SUM ( ( IF t1.c2 = 'Q3' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q3",
SUM ( ( IF t1.c2 = 'Q4' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q4"
INTO #t1
FROM t1
GROUP BY c1

The "SUM IF 1 OR 0" trick is an old one, used by generations of programmers to write funky code for icky problems, and not just in SQL. In fact, the code shown above is really simple compared with what goes on in the world, just a starting point.

If folks are interested I've got a couple of other demos that build on this technique... not the raw incomprehensible code you find in real-world applications, but hammered down into the basics for presentation... that's actually harder to do, the hammering down, harder than writing the original incomprehensible stuff :)