The question's simple, why should the answer be complicated?
Here's the data:
Question: How do I select the first row in each partition of a table?
CREATE TABLE t ( partition_id INTEGER NOT NULL, entry_id INTEGER NOT NULL, data VARCHAR ( 10 ) NOT NULL, PRIMARY KEY ( partition_id, entry_id ) ); INSERT t VALUES ( 10, 1, 'aaa' ); INSERT t VALUES ( 10, 2, 'bbb' ); INSERT t VALUES ( 10, 3, 'ccc' ); INSERT t VALUES ( 20, 1, 'ppp' ); INSERT t VALUES ( 20, 2, 'qqq' ); INSERT t VALUES ( 20, 3, 'rrr' ); INSERT t VALUES ( 30, 1, 'xxx' ); INSERT t VALUES ( 30, 2, 'yyy' ); INSERT t VALUES ( 30, 3, 'zzz' );Here's what the result should look like:
SELECT [the first row in each partition of t]; partition_id,entry_id,data 10,1,'aaa' 20,1,'ppp' 30,1,'xxx'If you didn't already know, could you tell what the question was by reading the answers in these earlier articles?
First, there was this Old School solution using MIN and GROUP BY:
SELECT t.* FROM t INNER JOIN ( SELECT partition_id AS partition_id, MIN ( entry_id ) AS min_entry_id FROM t GROUP BY partition_id ) AS first_row ON t.partition_id = first_row.partition_id AND t.entry_id = first_row.min_entry_id ORDER BY t.partition_id;Second, there was an OLAP solution using RANK and PARTITION BY:
SELECT partition_id, entry_id, data FROM ( select partition_id, entry_id, data, rank ( ) OVER partition_id_window AS entry_rank FROM t WINDOW partition_id_window AS ( PARTITION BY partition_id ORDER BY entry_id ) ) DT WHERE entry_rank = 1 ORDER BY partition_id;Both get points for being obscure, neither look much like an answer to the question.
Let's Try Again
Let's start by breaking the question in two parts, and solving each part separately:
Question: How do I select the first row in each partition of a table?
Question 1: How do I select all the unique values for partition_id?
Answer: Use DISTINCT when selecting all the values of partition_id.
There's no question what this code is doing, it's selecting all the unique values of partition_id just like the question asked:
SELECT DISTINCT t.partition_id FROM t ORDER BY t.partition_id; partition_id 10 20 30Question 2: How do I select the first row in a single partition?
Answer: Use a WHERE clause to pick the partition, then FIRST and ORDER BY to select the first row.
Once again, it's clear from reading the code what the question was:
SELECT FIRST * FROM t WHERE t.partition_id = 10 ORDER BY t.entry_id; partition_id,entry_id,data 10,1,'aaa'
What now?
Now comes the hard part: Combining those two result sets to get the answer to the original question.
Let's try this:
SELECT first_t.* FROM ( SELECT DISTINCT t.partition_id FROM t ORDER BY t.partition_id ) AS unique_partition INNER JOIN ( SELECT FIRST * FROM t WHERE t.partition_id = unique_partition.partition_id ORDER BY t.entry_id ) AS first_t ORDER BY first_t.partition_id;You knew that wouldn't work, didn't you? INNER JOIN needs an ON clause:
Let's try the old-fashioned "comma join operator", it doesn't need an ON clause:Could not execute statement. There is no way to join 'unique_partition' to 'first_t' SQLCODE=-146, ODBC 3 State="42000" Line 1, column 1
SELECT first_t.* FROM ( SELECT DISTINCT t.partition_id FROM t ORDER BY t.partition_id ) AS unique_partition, ( SELECT FIRST * FROM t WHERE t.partition_id = unique_partition.partition_id ORDER BY t.entry_id ) AS first_t ORDER BY first_t.partition_id;Different problem this time; you can't refer to a column in the left-hand side of a join in the right-hand side:
One last try; let's move the WHERE clause to an ON clause so INNER JOIN will work:Could not execute statement. Illegal reference to correlation name 'unique_partition' SQLCODE=-824, ODBC 3 State="42S02" Line 1, column 1
SELECT first_t.* FROM ( SELECT DISTINCT t.partition_id FROM t ORDER BY t.partition_id ) AS unique_partition INNER JOIN ( SELECT FIRST * FROM t ORDER BY t.entry_id ) AS first_t ON first_t.partition_id = unique_partition.partition_id ORDER BY first_t.partition_id;Yeah, it "works" all right, but it only returns one row instead of three; the predicate really needs to be inside the inner SELECT, not the outer one:
partition_id,entry_id,data 10,1,'aaa'
You've seen that before, haven't you?
Let's be honest, you've seen those error messages and wrong results before, haven't you? Like when you were trying to solve a problem that should have been easy (like this one) but you had to fiddle around with complex subqueries or temporary tables or even (gasp!) cursor fetch loops, and you ended up with a solution that looked like it was designed for a whole different problem.
It turns out there is a join operator that makes both of those error messages go away:
- it doesn't need an ON clause, and
- it does allow references from the right-hand side back to the left-hand side.
It's called CROSS APPLY...
...and it looks like this:
SELECT first_t.* FROM ( SELECT DISTINCT t.partition_id FROM t ORDER BY t.partition_id ) AS unique_partition CROSS APPLY ( SELECT FIRST * FROM t WHERE t.partition_id = unique_partition.partition_id ORDER BY t.entry_id ) AS first_t ORDER BY first_t.partition_id; partition_id,entry_id,data 10,1,'aaa' 20,2,'bbb' 30,3,'ccc'CROSS APPLY is like the comma-join and CROSS JOIN operators in that it doesn't need (and cannot have) an ON clause. It is different from other join operators in that predicates on the right-hand side can refer to columns on the left-hand side.
So, like comma-join and CROSS JOIN, the CROSS APPLY joins every single row on the left-hand side with every single row on the right-hand side...
but...
...exactly what constitutes "every single row on the right-hand side" depends on each row on the left-hand side.
In this case, for each row on the left-hand side, the set consisting of "every single row on the right-hand side" is boiled down to one single row, the first row with matching partition_id.
So, CROSS APPLY is like INNER JOIN with special powers.
References:
- Glenn Paulley's article on cross and outer apply and how they differ from LATERAL
- The FROM clause topic in the Help
- CROSS APPLY examples in the Help
3 comments:
Great insight, as usual:)
However, I feel the need to ask "Do you really think this CROSS APPLYcation is less obscure than the two queries cited in the first paragraphs?"
Regards
Volker Feels Gamma Now
@Volker: Yes, IMO the CROSS APPLY is as clear as SQL can be. Transliterate the left-hand side into English and you get "select each distinct partition_id" and the right-hand side is "select the first row by entry_id for each partition_id" which is exactly what the question asked. Try to transliterate the GROUP BY and OLAP solutions directly into English, see what you get: the very definition of "obscure".
@Breck: After some thought (and your new blog article) I agree with you - the query is much less obscure, and so is its structure.
It's primarily the fact that CROSS APPLY itself is not that well-known that makes it more difficult IMHO - it just requires some more learning on my part.
Thanks for helping me to do so:)
Best regards
Volker Now Thinks Learning OLAP Is Lots Harder
Post a Comment