Wednesday, January 27, 2010

OMIGOSH! Proxy tables are FAST!

Normally, everything about a SQL Anywhere Beta is Super Secret Stuff but Sybase has given me explicit permission to publish this article.

There are strings attached:
  • I have to tell you where to go (to get the Beta ...why, what did you think I meant? :)

  • I have to tell you that I received explicit permission from Sybase to publish this.

  • I have to tell you the Beta is Super Secret Stuff... that the Beta agreement does not normally allow articles like this.
There, all done, twice.
OK, got that? Here's the story: I've been working for oh-too-long on an article entitled "Downloading Data into a SQL Anywhere In-Memory Database from Microsoft SQL Server", and that article discusses several techniques using SQL Server 2008 and SQL Anywhere 11.0.1. Two of those techniques are BCP and LOAD TABLE (always the fastest) and proxy tables (always the slowest).

Folks who know about BCP will claim nothing on earth can unload data from a SQL Server database faster than that, and folks who know about SQL Anywhere will say the same thing about loading data via LOAD TABLE.

And lots of people will tell that proxy tables, well, suck when it comes to performance.

All of that is is certainly true for SQL Anywhere 11.0.1: the BCP and LOAD TABLE test for the article took 720 seconds for 1.9 million rows whereas INSERT ... SELECT * FROM a proxy table took 4230 seconds... think "rocket fast" versus "continental drift".

But that's all changing with SQL Anywhere Innsbruck! The INSERT ... SELECT * FROM a proxy table only took 687 seconds! Proxy tables went from last to first place in the performance tests!

Here's my guess what's happened: First, the iAnywhere engineers have made some performance improvements to proxy tables, that I've been told is true.

Second, and this is guesswork on my part, the proxy table path runs directly from one database to another, it doesn't involve writing to or reading from an external disk file like BCP and LOAD TABLE. So, that bit of extra file I/O overhead is always gonna be an anchor 'round the neck of BCP and LOAD TABLE.

FYI, here is my very first post on the SQL Anywhere Beta forum (the "over a minute" claim was wrong, the proxy table test was only 33 seconds faster than BCP and LOAD TABLE):

1
<br>Subject: OMIGOSH! Proxy tables are FAST!<br><br>Here's the full text of my forthcoming article "Top 1 New Features Of<br>SQL Anywhere Innsbruck"... seriously... I *live* for proxy tables...<br><br>CREATE SERVER mss <br>   CLASS 'MSSODBC' <br>   USING 'DSN=main_BRECK-PC';<br><br>CREATE EXTERNLOGIN DBA  <br>   TO mss  <br>   REMOTE LOGIN "sa" <br>   IDENTIFIED BY 'j68Fje9#fyu489';<br><br>CREATE EXISTING TABLE proxy_mss_source  <br>   AT 'mss.main.dbo.mss_source';<br><br>INSERT sa_target<br>SELECT *<br>  FROM proxy_mss_source;<br><br>-- 11.0.1...<br>1925469 row(s) inserted<br>Execution time: 4229.875 seconds<br><br>-- Innsbruck...<br>1925469 row(s) inserted<br>Execution time: 686.859 seconds<br><br>...ok, I'm REALLY serious, that just beat the combination of MSS 2008<br>BCP and 11.0.1 LOAD TABLE for transferring data straight across, by<br>over a minute, and hey, NOTHING beats BCP and LOAD TABLE... nothing!<br><br>Woohoo!<br><br>Breck Is Tilting On His Axis<br>

Friday, January 15, 2010

When did Google become perfect?

This problem had been bugging me for years: I couldn't figure out how to use wonderfully-modern Wordpad instead of awkwardly-ancient Notepad to edit *.bat files.

So this morning I just bit the bullet, typed my question into Google...

how do I change the default editor for batch files
Check it out, see how stupid I feel now, not at the answer, but that sometimes when I have a question
Google is not the first thing I try.
That's just dumb.

How long has this interweb been going on?

Wednesday, January 13, 2010

Ranking SQLA

The SQLA question & answer website for SQL Anywhere ranks well down in the list of StackExchange beta sites:

  • 45th in number of questions

  • 45th in number of answers

  • 53rd in number of registered users

  • 47th in number of answers per question
But... SQLA isn't anywhere near the bottom, there are 154 sites in the list.

Some thoughts:
  • Some of those 154 sites are dead... ok, lots of them are dead, but the StackExchange beta software is sure getting a good workout.

  • SQLA hasn't been actively promoted yet. That's partly because everything's still in beta but mostly because there's no end-user backup yet (where "end user" means "site administrator", which is me).

  • The fact that SQLA has essentially the same rank in all 4 categories above means something... not sure what... maybe someone at mathoverflow can answer that.

  • Speaking of mathoverflow, that's where to go for Ego Therapy. If you think you're really smart try answering some of their questions.

  • If your head doesn't explode first, you will come away with shoulders slumped and a humbler view of your place in the universe.

  • Unless you really are smart. In which case please don't stare at me :)

  • Some of the other sites are pretty interesting and a lot less intimidating.
Here's the full list at the end of 2009...

Monday, January 11, 2010

Crosstab, Rotate, Pivot

Shameless promotion, that's what the title is all about: Include all the keywords anyone might use when asking this question:

Question: How do I rotate a table so that different row values in a single column become different columns in a new table?

Answer: I could have sworn I talked about this years ago, but apparently not... at least not anywhere people can find it.

Is that a subtle dig at the NNTP newsgroups? Yes, I'm sorry, it is... I'll try to be less subtle: NNTP sucks, the days of the newsgroups are numbered, and the future for SQL Anywhere questions and answers lies with SQLA! <g>
SQL Anywhere doesn't have the ability to rotate (crosstab, pivot, whatever) a table, and that feature isn't coming any time soon. What it does have is EXECUTE IMMEDIATE, plus the ability to code IF expressions just about anywhere in the SELECT statement. You can combine those two features to kludge a solution.
Is "kludge" too strong a word? Maybe so, since performance is pretty good. Good enough for this technique to be used in real time by the Foxhound database monitor when analyzing data gathered from thousands of database connections. But, no question... the code's funky.
It's actually hard to explain what rotating a table is all about, it's not exactly as simple as turning rows into columns and vice versa... it's easier to use an example.

Here's a table showing sales by state and quarter:
1
<br>-- Part 1: Initialize data.<br><br>BEGIN<br>   DROP TABLE t1;<br>   EXCEPTION WHEN OTHERS THEN<br>END;<br><br>CREATE TABLE t1 (<br>   c1 VARCHAR ( 10 ) NOT NULL,<br>   c2 VARCHAR ( 10 ) NOT NULL, <br>   c3 INTEGER        NOT NULL,<br>   PRIMARY KEY ( c1, c2 ) );<br><br>INSERT t1 VALUES ( 'CA', 'Q1', 1000 );<br>INSERT t1 VALUES ( 'CA', 'Q2', 2000 );<br>INSERT t1 VALUES ( 'CA', 'Q3', 9000 );<br>INSERT t1 VALUES ( 'CA', 'Q4', 7000 );<br><br>INSERT t1 VALUES ( 'NY', 'Q1', 4000 );<br>INSERT t1 VALUES ( 'NY', 'Q2', 5000 );<br>INSERT t1 VALUES ( 'NY', 'Q3', 1000 );<br>INSERT t1 VALUES ( 'NY', 'Q4', 6000 );<br><br>INSERT t1 VALUES ( 'FL', 'Q1', 9000 );<br>INSERT t1 VALUES ( 'FL', 'Q2', 7000 );<br>INSERT t1 VALUES ( 'FL', 'Q3', 2000 );<br>INSERT t1 VALUES ( 'FL', 'Q4', 1000 );<br><br>INSERT t1 VALUES ( 'AZ', 'Q1', 5000 );<br>INSERT t1 VALUES ( 'AZ', 'Q2', 5000 );<br>INSERT t1 VALUES ( 'AZ', 'Q3', 1000 );<br>INSERT t1 VALUES ( 'AZ', 'Q4', 3000 );<br><br>INSERT t1 VALUES ( 'MA', 'Q1', 2000 );<br>INSERT t1 VALUES ( 'MA', 'Q2', 6000 );<br>INSERT t1 VALUES ( 'MA', 'Q3', 5000 );<br>INSERT t1 VALUES ( 'MA', 'Q4', 3000 );<br>COMMIT;<br><br>SELECT * FROM t1 ORDER BY c1, c2;<br>

So far, so good... the table is nicely normalized, everything's elegant... and useless:
1
<br>c1  c2   c3<br>AZ  Q1  5000<br>AZ  Q2  5000<br>AZ  Q3  1000<br>AZ  Q4  3000<br>CA  Q1  1000<br>CA  Q2  2000<br>CA  Q3  9000<br>CA  Q4  7000<br>FL  Q1  9000<br>FL  Q2  7000<br>FL  Q3  2000<br>FL  Q4  1000<br>MA  Q1  2000<br>MA  Q2  6000<br>MA  Q3  5000<br>MA  Q4  3000<br>NY  Q1  4000<br>NY  Q2  5000<br>NY  Q3  1000<br>NY  Q4  6000 <br>

What folks often want to see is something like this, sales by quarter for each state:
1
<br>c2   AZ    CA    FL    MA    NY<br>Q1  5000  1000  9000  2000  4000<br>Q2  5000  2000  7000  6000  5000<br>Q3  1000  9000  2000  5000  1000<br>Q4  3000  7000  1000  3000  6000<br>

Here's how you can do that in SQL Anywhere:
1
<br>-- Part 2: Pivot c1 values into columns.<br><br>BEGIN<br>DECLARE @sql LONG VARCHAR;<br>SET @sql = 'SELECT c2';<br>FOR f_fetch<br>AS c_fetch NO SCROLL CURSOR FOR<br>SELECT DISTINCT t1.c1 AS @c1<br>  FROM t1<br> ORDER BY t1.c1<br>FOR READ ONLY<br>DO<br>   SET @sql = STRING (<br>      @sql,<br>      ', SUM ( ( IF t1.c1 = ''',<br>      @c1,<br>      ''' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "',<br>      @c1,<br>      '"' );<br>END FOR;<br>SET @sql = STRING (<br>   @sql,<br>   ' INTO #t1 FROM t1 GROUP BY c2' );<br>MESSAGE @sql TO CONSOLE;<br>EXECUTE IMMEDIATE @sql;<br>SELECT * FROM #t1 ORDER BY c2; -- pivot table<br>END;<br>

What if you want to see sales by state for each quarter?
1
<br>c1   Q1    Q2    Q3    Q4<br>AZ  5000  5000  1000  3000<br>CA  1000  2000  9000  7000<br>FL  9000  7000  2000  1000<br>MA  2000  6000  5000  3000<br>NY  4000  5000  1000  6000<br>

Here's the code for that:
1
<br>-- Part 3: Pivot c2 values into columns.<br><br>BEGIN<br>DECLARE @sql LONG VARCHAR;<br>SET @sql = 'SELECT c1';<br>FOR f_fetch<br>AS c_fetch NO SCROLL CURSOR FOR<br>SELECT DISTINCT t1.c2 AS @c2<br>  FROM t1<br> ORDER BY t1.c2<br>FOR READ ONLY<br>DO<br>   SET @sql = STRING (<br>      @sql,<br>      ', SUM ( ( IF t1.c2 = ''',<br>      @c2,<br>      ''' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "',<br>      @c2,<br>      '"' );<br>END FOR;<br>SET @sql = STRING (<br>   @sql,<br>   ' INTO #t1 FROM t1 GROUP BY c1' );<br>MESSAGE @sql TO CONSOLE;<br>EXECUTE IMMEDIATE @sql;<br>SELECT * FROM #t1 ORDER BY c1; -- pivot table<br>END;<br>

Here's where the magic lies, in the SUMs of row values multiplied by 1 or 0; the following SELECT statements are generated by the code shown above:
1
<br>SELECT c2, <br>       SUM ( ( IF t1.c1 = 'AZ' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "AZ",  <br>       SUM ( ( IF t1.c1 = 'CA' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "CA",  <br>       SUM ( ( IF t1.c1 = 'FL' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "FL",  <br>       SUM ( ( IF t1.c1 = 'MA' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "MA",  <br>       SUM ( ( IF t1.c1 = 'NY' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "NY"  <br>  INTO #t1  <br>  FROM t1  <br> GROUP BY c2<br><br>SELECT c1,  <br>       SUM ( ( IF t1.c2 = 'Q1' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q1",  <br>       SUM ( ( IF t1.c2 = 'Q2' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q2",  <br>       SUM ( ( IF t1.c2 = 'Q3' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q3",  <br>       SUM ( ( IF t1.c2 = 'Q4' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q4"  <br>  INTO #t1  <br>  FROM t1  <br> GROUP BY c1<br>

The "SUM IF 1 OR 0" trick is an old one, used by generations of programmers to write funky code for icky problems, and not just in SQL. In fact, the code shown above is really simple compared with what goes on in the world, just a starting point.

If folks are interested I've got a couple of other demos that build on this technique... not the raw incomprehensible code you find in real-world applications, but hammered down into the basics for presentation... that's actually harder to do, the hammering down, harder than writing the original incomprehensible stuff :)