The question's simple, why should the answer be complicated?
Question: How do I select the first row in each partition of a table?
Here's the data:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | CREATE TABLE t (
partition_id INTEGER NOT NULL,
entry_id INTEGER NOT NULL,
data VARCHAR ( 10 ) NOT NULL,
PRIMARY KEY ( partition_id, entry_id ) );
INSERT t VALUES ( 10, 1, 'aaa' );
INSERT t VALUES ( 10, 2, 'bbb' );
INSERT t VALUES ( 10, 3, 'ccc' );
INSERT t VALUES ( 20, 1, 'ppp' );
INSERT t VALUES ( 20, 2, 'qqq' );
INSERT t VALUES ( 20, 3, 'rrr' );
INSERT t VALUES ( 30, 1, 'xxx' );
INSERT t VALUES ( 30, 2, 'yyy' );
INSERT t VALUES ( 30, 3, 'zzz' );
|
Here's what the result should look like:
1 2 3 4 5 6 | SELECT [the first row in each partition of t];
partition_id,entry_id,data
10,1,'aaa'
20,1,'ppp'
30,1,'xxx'
|
If you didn't already know, could you tell what the
question was by reading the answers in these earlier articles?
First, there was this
Old School solution using MIN and GROUP BY:
1 2 3 4 5 6 7 8 9 | SELECT t.*
FROM t INNER JOIN ( SELECT partition_id AS partition_id,
MIN ( entry_id ) AS min_entry_id
FROM t
GROUP BY partition_id
) AS first_row
ON t.partition_id = first_row.partition_id
AND t.entry_id = first_row.min_entry_id
ORDER BY t.partition_id;
|
Second, there was an
OLAP solution using RANK and PARTITION BY:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | SELECT partition_id,
entry_id,
data
FROM ( select partition_id,
entry_id,
data,
rank ( ) OVER partition_id_window AS entry_rank
FROM t
WINDOW partition_id_window AS (
PARTITION BY partition_id
ORDER BY entry_id
) ) DT
WHERE entry_rank = 1
ORDER BY partition_id;
|
Both get points for being obscure, neither look much like an answer to the question.
Let's Try Again
Question: How do I select the first row in each partition of a table?
Let's start by breaking the question in two parts, and solving each part separately:
Question 1: How do I select all the unique values for partition_id?
Answer: Use DISTINCT when selecting all the values of partition_id.
There's no question what this code is doing, it's selecting all the unique values of partition_id just like the question asked:
1 2 3 4 5 6 7 8 9 | SELECT DISTINCT
t.partition_id
FROM t
ORDER BY t.partition_id;
partition_id
10
20
30
|
Question 2: How do I select the first row in a single partition?
Answer: Use a WHERE clause to pick the partition, then FIRST and ORDER BY to select the first row.
Once again, it's clear from reading the code what the question was:
1 2 3 4 5 6 7 | SELECT FIRST *
FROM t
WHERE t.partition_id = 10
ORDER BY t.entry_id;
partition_id,entry_id,data
10,1,'aaa'
|
What now?
Now comes the hard part: Combining those two result sets to get the answer to the original question.
Let's try this:
1 2 3 4 5 6 7 8 9 10 11 | SELECT first_t.*
FROM ( SELECT DISTINCT
t.partition_id
FROM t
ORDER BY t.partition_id ) AS unique_partition
INNER JOIN
( SELECT FIRST *
FROM t
WHERE t.partition_id = unique_partition.partition_id
ORDER BY t.entry_id ) AS first_t
ORDER BY first_t.partition_id;
|
You knew that wouldn't work, didn't you? INNER JOIN needs an ON clause:
Could not execute statement.
There is no way to join 'unique_partition' to 'first_t'
SQLCODE=-146, ODBC 3 State="42000"
Line 1, column 1
Let's try the old-fashioned "comma join operator", it doesn't need an ON clause:
1 2 3 4 5 6 7 8 9 10 | SELECT first_t.*
FROM ( SELECT DISTINCT
t.partition_id
FROM t
ORDER BY t.partition_id ) AS unique_partition,
( SELECT FIRST *
FROM t
WHERE t.partition_id = unique_partition.partition_id
ORDER BY t.entry_id ) AS first_t
ORDER BY first_t.partition_id;
|
Different problem this time; you can't refer to a column in the left-hand side of a join in the right-hand side:
Could not execute statement.
Illegal reference to correlation name 'unique_partition'
SQLCODE=-824, ODBC 3 State="42S02"
Line 1, column 1
One last try; let's move the WHERE clause to an ON clause so INNER JOIN will work:
1 2 3 4 5 6 7 8 9 10 11 | SELECT first_t.*
FROM ( SELECT DISTINCT
t.partition_id
FROM t
ORDER BY t.partition_id ) AS unique_partition
INNER JOIN
( SELECT FIRST *
FROM t
ORDER BY t.entry_id ) AS first_t
ON first_t.partition_id = unique_partition.partition_id
ORDER BY first_t.partition_id;
|
Yeah, it "works" all right, but it only returns one row instead of three; the predicate really needs to be inside the inner SELECT, not the outer one:
partition_id,entry_id,data
10,1,'aaa'
You've seen that before, haven't you?
Let's be honest, you've seen those error messages and wrong results before, haven't you? Like when you were trying to solve a problem that
should have been easy (like this one) but you had to fiddle around with complex subqueries or temporary tables or even (gasp!) cursor fetch loops, and you ended up with a solution that looked like it was designed for a whole different problem.
It turns out there
is a join operator that makes both of those error messages go away:
- it doesn't need an ON clause, and
- it does allow references from the right-hand side back to the left-hand side.
It's called CROSS APPLY...
...and it looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | SELECT first_t.*
FROM ( SELECT DISTINCT
t.partition_id
FROM t
ORDER BY t.partition_id ) AS unique_partition
CROSS APPLY
( SELECT FIRST *
FROM t
WHERE t.partition_id = unique_partition.partition_id
ORDER BY t.entry_id ) AS first_t
ORDER BY first_t.partition_id;
partition_id,entry_id,data
10,1,'aaa'
20,2,'bbb'
30,3,'ccc'
|
CROSS APPLY is like the comma-join and CROSS JOIN operators in that it doesn't need (and cannot have) an ON clause. It is different from other join operators in that predicates on the right-hand side can refer to columns on the left-hand side.
So, like comma-join and CROSS JOIN, the CROSS APPLY joins every single row on the left-hand side with every single row on the right-hand side...
but...
...exactly what constitutes "every single row on the right-hand side" depends on each row on the left-hand side.
In this case, for each row on the left-hand side, the set consisting of "every single row on the right-hand side" is boiled down to one single row, the first row with matching partition_id.
So, CROSS APPLY is like INNER JOIN with special powers.
References: