In his great novel Brave New World Aldous Huxley divided The World State into five heavily conditioned castes: the fully developed Alphas, and the arrested development Beta, Gamma, Delta and Epsilon castes. I know I'm not an Epsilon because I know what an Epsilon is and they don't.
But I am just as surely not an Alpha, and that means I'm not an Alpha Plus, the higher caste-within-a-caste, the leaders and intellectual elites.
Let's say I'm a Beta. OK, Gamma, but let's pretend for a moment, shall we? (Hey! Who shouted "Delta"? :)
"Alpha children wear grey. They work much harder than we do, because they're so frightfully clever. I'm really awfully glad I'm a Beta, because I don't work so hard. And then we are much better than the Gammas and Deltas. Gammas are stupid. They all wear green, and Delta children wear khaki. Oh no, I don't want to play with Delta children. And Epsilons are still worse. They're too stupid to be able ..." - Wikiquote
Here's what I'm afraid of:
Only Alphas understand OLAP
The WINDOW clause and the associated window functions were invented by Alpha Pluses for use by the Alpha caste, that's what I'm afraid of. (Either that, or I'm a Delta... that's always possible :)
Of course, The World State is fictional, this is the real world, and in the real world it is possible for the upper castes to teach the rest of us about new stuff. Here's Glenn Paulley responding to my earlier complaint about the WINDOW clause not answering the question "How do I select the first row in each partition of a table?":
It's helpful to think of WINDOW as augmenting the result set that is generated by the FROM/WHERE clause - WINDOW doesn't *just* return the aggregate values for groups (if PARTITION is used), it adds to the projection of the result set.
Consequently, if what you want is *the* row that is the "lowest" of a partition, you have to eliminate all of the other rows from result set.
FIRST_VALUE does indeed return the first value in the partition. However, you need to restrict the result set to eliminate those rows from t where the row is not the first value for that partition - and that's a bit harder to do because it's possible to have duplicate values for either entry_id or data within each partition.
A better way of getting what you want is to use the RANK function, which will return 1 for the lowest value in each partition (and will label ties similarly). The query is
SELECT partition_id, entry_id, data FROM ( select partition_id, entry_id, data, rank ( ) OVER partition_id_window AS entry_rank FROM t WINDOW partition_id_window AS ( PARTITION BY partition_id ORDER BY entry_id ) ) DT WHERE entry_rank = 1 ORDER BY partition_id;
Glenn's efforts to educate are most appreciated, especially since he knows it's an uphill task:
"Unfortunately, WINDOW functions remain beyond the experience of the majority of application developers, who typically have a working knowledge of the considerably simpler SQL:1992 standard."
Let's Compare and Contrast
Here's the original problem: "How do I select the first row in each partition of a table?"
CREATE TABLE t ( partition_id INTEGER NOT NULL, entry_id INTEGER NOT NULL, data VARCHAR ( 10 ) NOT NULL, PRIMARY KEY ( partition_id, entry_id ) ); INSERT t VALUES ( 10, 1, 'aaa' ); INSERT t VALUES ( 10, 2, 'bbb' ); INSERT t VALUES ( 10, 3, 'ccc' ); INSERT t VALUES ( 20, 1, 'ppp' ); INSERT t VALUES ( 20, 2, 'qqq' ); INSERT t VALUES ( 20, 3, 'rrr' ); INSERT t VALUES ( 30, 1, 'xxx' ); INSERT t VALUES ( 30, 2, 'yyy' ); INSERT t VALUES ( 30, 3, 'zzz' ); SELECT [the first row in each partition of t]; partition_id,entry_id,data 10,1,'aaa' 20,1,'ppp' 30,1,'xxx'
Here's the OLAP solution using WINDOW and RANK():
SELECT partition_id, entry_id, data FROM ( select partition_id, entry_id, data, rank ( ) OVER partition_id_window AS entry_rank FROM t WINDOW partition_id_window AS ( PARTITION BY partition_id ORDER BY entry_id ) ) DT WHERE entry_rank = 1 ORDER BY partition_id; partition_id,entry_id,data 10,1,'aaa' 20,1,'ppp' 30,1,'xxx'
Here's the Old School solution using GROUP BY and MIN():
SELECT t.* FROM t INNER JOIN ( SELECT partition_id AS partition_id, MIN ( entry_id ) AS min_entry_id FROM t GROUP BY partition_id ) AS first_row ON t.partition_id = first_row.partition_id AND t.entry_id = first_row.min_entry_id ORDER BY t.partition_id; partition_id,entry_id,data 10,1,'aaa' 20,1,'ppp' 30,1,'xxx'
Here's the comparison:
OLAP Solution Old School Solution ============= =================== Built upon a WINDOW clause. Built upon a GROUP BY clause. Uses RANK() to find the first Uses MIN() to find the row in each PARTITION. first row in each GROUP. A derived table is required A derived table is required because RANK() can't appear because t.data can't appear in a WHERE clause in the in a SELECT based on same SELECT. GROUP BY partition_id. Requires a WHERE clause to No WHERE clause is required. make use of the RANK() result.The part about "a derived table is required" makes the two solutions pretty much the same when it comes to understandability: all other things being equal, any query that requires a derived table is going to be harder to write and understand than a query that doesn't need one.
Which query is best is not the point here, the point is that the WINDOW clause can be made to work using the RANK() function...
...which only Alphas know about?
When Betas (OK, Deltas) use OLAP they use the Help, and they learn what they can do by looking at topics like this:
- Aggregate functions,
- Window aggregate functions and
- Mathematical formulas for the aggregate functions,
Memo to the Doc Team: Please accept my apologies for dragging you into this! Stuff happens, and in normal circumstances a simple suggestion in DocCommentXchange would suffice, but the opportunity to make a
smart-aleck commentpoint here was impossible to forgo!
HERE'S the scary part......and it's not about the Help or which method works or is easier to write and understand, it's this:
Is it important to performance to use OLAP instead of Old School solutions?Here's a hint...
...maybe not in this case, but just because the Old School solution worked it doesn't mean we can give up on learning about OLAP and retreat to the familiar.
"...window functions have the potential to lead to better query execution plans by eliminating joins and nested queries, yet retaining the simplicity of a single SQL statement."
Glenn Paulley in The Window operator and window aggregate functions, March 31st, 2009
Apparently, not if we care about the improvements coming to us courtesy of the Alphas!