Monday, January 11, 2010

Crosstab, Rotate, Pivot

Shameless promotion, that's what the title is all about: Include all the keywords anyone might use when asking this question:

Question: How do I rotate a table so that different row values in a single column become different columns in a new table?

Answer: I could have sworn I talked about this years ago, but apparently not... at least not anywhere people can find it.

Is that a subtle dig at the NNTP newsgroups? Yes, I'm sorry, it is... I'll try to be less subtle: NNTP sucks, the days of the newsgroups are numbered, and the future for SQL Anywhere questions and answers lies with SQLA! <g>
SQL Anywhere doesn't have the ability to rotate (crosstab, pivot, whatever) a table, and that feature isn't coming any time soon. What it does have is EXECUTE IMMEDIATE, plus the ability to code IF expressions just about anywhere in the SELECT statement. You can combine those two features to kludge a solution.
Is "kludge" too strong a word? Maybe so, since performance is pretty good. Good enough for this technique to be used in real time by the Foxhound database monitor when analyzing data gathered from thousands of database connections. But, no question... the code's funky.
It's actually hard to explain what rotating a table is all about, it's not exactly as simple as turning rows into columns and vice versa... it's easier to use an example.

Here's a table showing sales by state and quarter:

-- Part 1: Initialize data.

BEGIN
DROP TABLE t1;
EXCEPTION WHEN OTHERS THEN
END;

CREATE TABLE t1 (
c1 VARCHAR ( 10 ) NOT NULL,
c2 VARCHAR ( 10 ) NOT NULL,
c3 INTEGER NOT NULL,
PRIMARY KEY ( c1, c2 ) );

INSERT t1 VALUES ( 'CA', 'Q1', 1000 );
INSERT t1 VALUES ( 'CA', 'Q2', 2000 );
INSERT t1 VALUES ( 'CA', 'Q3', 9000 );
INSERT t1 VALUES ( 'CA', 'Q4', 7000 );

INSERT t1 VALUES ( 'NY', 'Q1', 4000 );
INSERT t1 VALUES ( 'NY', 'Q2', 5000 );
INSERT t1 VALUES ( 'NY', 'Q3', 1000 );
INSERT t1 VALUES ( 'NY', 'Q4', 6000 );

INSERT t1 VALUES ( 'FL', 'Q1', 9000 );
INSERT t1 VALUES ( 'FL', 'Q2', 7000 );
INSERT t1 VALUES ( 'FL', 'Q3', 2000 );
INSERT t1 VALUES ( 'FL', 'Q4', 1000 );

INSERT t1 VALUES ( 'AZ', 'Q1', 5000 );
INSERT t1 VALUES ( 'AZ', 'Q2', 5000 );
INSERT t1 VALUES ( 'AZ', 'Q3', 1000 );
INSERT t1 VALUES ( 'AZ', 'Q4', 3000 );

INSERT t1 VALUES ( 'MA', 'Q1', 2000 );
INSERT t1 VALUES ( 'MA', 'Q2', 6000 );
INSERT t1 VALUES ( 'MA', 'Q3', 5000 );
INSERT t1 VALUES ( 'MA', 'Q4', 3000 );
COMMIT;

SELECT * FROM t1 ORDER BY c1, c2;

So far, so good... the table is nicely normalized, everything's elegant... and useless:

c1 c2 c3
AZ Q1 5000
AZ Q2 5000
AZ Q3 1000
AZ Q4 3000
CA Q1 1000
CA Q2 2000
CA Q3 9000
CA Q4 7000
FL Q1 9000
FL Q2 7000
FL Q3 2000
FL Q4 1000
MA Q1 2000
MA Q2 6000
MA Q3 5000
MA Q4 3000
NY Q1 4000
NY Q2 5000
NY Q3 1000
NY Q4 6000

What folks often want to see is something like this, sales by quarter for each state:

c2 AZ CA FL MA NY
Q1 5000 1000 9000 2000 4000
Q2 5000 2000 7000 6000 5000
Q3 1000 9000 2000 5000 1000
Q4 3000 7000 1000 3000 6000

Here's how you can do that in SQL Anywhere:

-- Part 2: Pivot c1 values into columns.

BEGIN
DECLARE @sql LONG VARCHAR;
SET @sql = 'SELECT c2';
FOR f_fetch
AS c_fetch NO SCROLL CURSOR FOR
SELECT DISTINCT t1.c1 AS @c1
FROM t1
ORDER BY t1.c1
FOR READ ONLY
DO
SET @sql = STRING (
@sql,
', SUM ( ( IF t1.c1 = ''',
@c1,
''' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "',
@c1,
'"' );
END FOR;
SET @sql = STRING (
@sql,
' INTO #t1 FROM t1 GROUP BY c2' );
MESSAGE @sql TO CONSOLE;
EXECUTE IMMEDIATE @sql;
SELECT * FROM #t1 ORDER BY c2; -- pivot table
END;

What if you want to see sales by state for each quarter?

c1 Q1 Q2 Q3 Q4
AZ 5000 5000 1000 3000
CA 1000 2000 9000 7000
FL 9000 7000 2000 1000
MA 2000 6000 5000 3000
NY 4000 5000 1000 6000

Here's the code for that:

-- Part 3: Pivot c2 values into columns.

BEGIN
DECLARE @sql LONG VARCHAR;
SET @sql = 'SELECT c1';
FOR f_fetch
AS c_fetch NO SCROLL CURSOR FOR
SELECT DISTINCT t1.c2 AS @c2
FROM t1
ORDER BY t1.c2
FOR READ ONLY
DO
SET @sql = STRING (
@sql,
', SUM ( ( IF t1.c2 = ''',
@c2,
''' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "',
@c2,
'"' );
END FOR;
SET @sql = STRING (
@sql,
' INTO #t1 FROM t1 GROUP BY c1' );
MESSAGE @sql TO CONSOLE;
EXECUTE IMMEDIATE @sql;
SELECT * FROM #t1 ORDER BY c1; -- pivot table
END;

Here's where the magic lies, in the SUMs of row values multiplied by 1 or 0; the following SELECT statements are generated by the code shown above:

SELECT c2,
SUM ( ( IF t1.c1 = 'AZ' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "AZ",
SUM ( ( IF t1.c1 = 'CA' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "CA",
SUM ( ( IF t1.c1 = 'FL' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "FL",
SUM ( ( IF t1.c1 = 'MA' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "MA",
SUM ( ( IF t1.c1 = 'NY' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "NY"
INTO #t1
FROM t1
GROUP BY c2

SELECT c1,
SUM ( ( IF t1.c2 = 'Q1' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q1",
SUM ( ( IF t1.c2 = 'Q2' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q2",
SUM ( ( IF t1.c2 = 'Q3' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q3",
SUM ( ( IF t1.c2 = 'Q4' THEN 1 ELSE 0 ENDIF ) * t1.c3 ) AS "Q4"
INTO #t1
FROM t1
GROUP BY c1

The "SUM IF 1 OR 0" trick is an old one, used by generations of programmers to write funky code for icky problems, and not just in SQL. In fact, the code shown above is really simple compared with what goes on in the world, just a starting point.

If folks are interested I've got a couple of other demos that build on this technique... not the raw incomprehensible code you find in real-world applications, but hammered down into the basics for presentation... that's actually harder to do, the hammering down, harder than writing the original incomprehensible stuff :)

6 comments:

Tore Kjørsvik said...

Why would you do a multiplication to keep the right value in your IF clauses?

Wouldn't this be equally, but simpler:

SUM ( IF t1.c1 = 'AZ' THEN t1.c3 ELSE 0 ENDIF ) AS "AZ"

Or am I missing something?
My 2 cents.

Tore Kjorsvik

Breck Carter said...

@Tore: I think you are correct.

Tore Kjørsvik said...

@Breck: I hope my comment was not taken as critic. Your solution to Sybase's lack of Crosstab/Rotate/Pivot is brilliant, and we are testing an implementation of this in a reporting system right now. It looks quite promising.

Thank you for your excellent work!

Anonymous said...

Hi,

Thanks for this superb article.

Would it possible for the c3 to be dates rather then numeric?

MAny thansk

Tore Kjørsvik said...

@Anonymous: c3 could of course be any datatype, but I don't quite get your question/problem. If c3 is a date, it makes no sense to sum these data. What would you like to do with the dates in c3? Min/Max/"average"/"Most used"?

If you can give us a little more information about your problem, it would be much easier to find a good solution.

mary.k.dba said...

Thanks so much! This is exactly what I needed.