OLAP is so attractive from afar, it solves so many Big Important Problems, how can anyone not love it?
Well, it's easy not to love OLAP: just try something simple, something that should be easy given all the expressive power built in to the WINDOW clause and all the related window functions like, say, FIRST_VALUE.
Question: How do I select the first row in each partition of a table?
Here's an example:
CREATE TABLE t ( partition_id INTEGER NOT NULL, entry_id INTEGER NOT NULL, data VARCHAR ( 10 ) NOT NULL, PRIMARY KEY ( partition_id, entry_id ) ); INSERT t VALUES ( 10, 1, 'aaa' ); INSERT t VALUES ( 10, 2, 'bbb' ); INSERT t VALUES ( 10, 3, 'ccc' ); INSERT t VALUES ( 20, 1, 'ppp' ); INSERT t VALUES ( 20, 2, 'qqq' ); INSERT t VALUES ( 20, 3, 'rrr' ); INSERT t VALUES ( 30, 1, 'xxx' ); INSERT t VALUES ( 30, 2, 'yyy' ); INSERT t VALUES ( 30, 3, 'zzz' ); SELECT [the first row in each partition of t]; partition_id,entry_id,data 10,1,'aaa' 20,1,'ppp' 30,1,'xxx'
Answer: OLAP to the rescue! It's got the PARTITION keyword, and ORDER BY, and RANGE, and FIRST_VALUE, everything you need!
Hubba hubba, here we go! ...
SELECT partition_id, FIRST_VALUE ( entry_id ) OVER partition_id_window AS first_entry_id, FIRST_VALUE ( data ) OVER partition_id_window AS first_data FROM t WINDOW partition_id_window AS ( PARTITION BY partition_id ORDER BY entry_id RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) ORDER BY partition_id; partition_id,first_entry_id,first_data 10,1,'aaa' 10,1,'aaa' 10,1,'aaa' 20,1,'ppp' 20,1,'ppp' 20,1,'ppp' 30,1,'xxx' 30,1,'xxx' 30,1,'xxx'
Well, that sucks!
That's not even close!You can't code RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED PRECEDING, that throws a syntax error. In fact, the RANGE clause seems to be singularly useless, it has so few options.
It's almost like the WINDOW clause was designed by a standards committee.
Old School
Remember GROUP BY? Yeah, we used to think THAT was complicated before the WINDOW clause came along.SELECT t.* FROM t INNER JOIN ( SELECT partition_id AS partition_id, MIN ( entry_id ) AS min_entry_id FROM t GROUP BY partition_id ) AS first_row ON t.partition_id = first_row.partition_id AND t.entry_id = first_row.min_entry_id ORDER BY t.partition_id; partition_id,entry_id,data 10,1,'aaa' 20,1,'ppp' 30,1,'xxx'
Tell me it ain't so!
Please, someone tell me the WINDOW clause isn't so hard to get.That may be wishful thinking, however...
"Window operations don’t add expressive power to the SQL language because other constructions, which may require multiple statements, can provide the same functionality as window functions."
Glenn Paulley in The Window operator and window aggregate functions, March 31st, 2009
3 comments:
I would agree that WINDOW clauses are not that easy (and I'm using them too rarely to make noticable improvements)...
However, in the current case I guess a simple SELECT DISTINCT would do. Though I agree that I would have expected FIRST_VALUE() to do that on its own, based on experience with MAX() and other aggregate functions.
And now we're waiting for Glenn's explanation...
Volker
It's helpful to think of WINDOW as augmenting the result set that is generated by the FROM/WHERE clause - WINDOW doesn't *just* return the aggregate values for groups (if PARTITION is used), it adds to the projection of the result set.
Consequently, if what you want is *the* row that is the "lowest" of a partition, you have to eliminate all of the other rows from result set.
FIRST_VALUE does indeed return the first value in the partition. However, you need to restrict the result set to eliminate those rows from t where the row is not the first value for that partition - and that's a bit harder to do because it's possible to have duplicate values for either entry_id or data within each partition.
A better way of getting what you want is to use the RANK function, which will return 1 for the lowest value in each partition (and will label ties similarly). The query is
SELECT partition_id,
entry_id,
data
FROM ( select partition_id, entry_id, data,
rank ( ) OVER partition_id_window AS entry_rank
FROM t
WINDOW partition_id_window AS (
PARTITION BY partition_id
ORDER BY entry_id
) ) DT
WHERE entry_rank = 1
ORDER BY partition_id;
Breck, having Glenn's thorough explanation:
Now I'd like to know whether you find the WINDOW clause easier to use (and(or understand) than before...
Volker
Post a Comment