Here's another simple question:
Question: How do I select the top 10 rows in each partition of a table?"
Will
CROSS APPLY work as well as it did before?
Here's the data. The table looks pretty much the same, except that the entry_id is autoincrementing instead of repeating and there's a lot more data; three partitions with 100 rows in each partition:
CREATE TABLE t (
partition_id INTEGER NOT NULL,
entry_id INTEGER NOT NULL DEFAULT AUTOINCREMENT,
data VARCHAR ( 10 ) NOT NULL,
PRIMARY KEY ( entry_id ) );
BEGIN
DECLARE @loop_counter INTEGER;
SET @loop_counter = 1;
WHILE @loop_counter <= 100 LOOP
INSERT t ( partition_id, data ) VALUES ( 10, 'aaa' );
INSERT t ( partition_id, data ) VALUES ( 20, 'bbb' );
INSERT t ( partition_id, data ) VALUES ( 30, 'ccc' );
SET @loop_counter = @loop_counter + 1;
END LOOP;
COMMIT;
END;
SELECT * FROM t ORDER BY entry_id;
partition_id,entry_id,data
10,1,'aaa'
20,2,'bbb'
30,3,'ccc'
10,4,'aaa'
20,5,'bbb'
30,6,'ccc'
10,7,'aaa'
20,8,'bbb'
30,9,'ccc'
...
partition_id,entry_id,data
10,298,'aaa'
20,299,'bbb'
30,300,'ccc'
Before tackling the "top 10" question, let's confirm that the CROSS APPLY answer to
the previous question "How do I select the first row in each partition of a table?" works just as well on this table:
SELECT first_t.*
FROM ( SELECT DISTINCT
t.partition_id
FROM t
ORDER BY t.partition_id ) AS unique_partition
CROSS APPLY
( SELECT FIRST *
FROM t
WHERE t.partition_id = unique_partition.partition_id
ORDER BY t.entry_id ) AS first_t
ORDER BY first_t.partition_id;
Yes, it works:
partition_id,entry_id,data
10,1,'aaa'
20,2,'bbb'
30,3,'ccc'
Plus, it is clear from the SQL what its purpose is:
- When the "( SELECT DISTINCT ... )" query is translated into English you get "select each distinct partition_id,"
- when the "( SELECT FIRST ... )" query is translated into English you get "select the first row by entry_id for each row in the other result set," and then
- when the combined CROSS APPLY query is translated by combining the other two translations, you get "select the first row by entry_id for each distinct partition_id"
which is pretty much a restatement of the original question: "How do I select the first row in each partition?"
It's not perfect; in a perfect world, every single query problem would be solved by a single SQL SELECT statement, not a combination of two queries. But, this is real life... and the original question did refer to two queries: "first row" and "each partition", and the CROSS APPLY contains exactly those two queries...
Not like the Old School solution which does funky things with MIN and GROUP BY, neither of which are even indirectly mentioned in the original question,
SELECT t.*
FROM t INNER JOIN ( SELECT partition_id AS partition_id,
MIN ( entry_id ) AS min_entry_id
FROM t
GROUP BY partition_id
) AS first_row
ON t.partition_id = first_row.partition_id
AND t.entry_id = first_row.min_entry_id
ORDER BY t.partition_id;
or the
OLAP solution which introduced the OVER, WINDOW, PARTITION BY and nested ORDER BY clauses plus the RANK function and a WHERE clause to deal with the RANK result.
SELECT partition_id,
entry_id,
data
FROM ( select partition_id,
entry_id,
data,
rank ( ) OVER partition_id_window AS entry_rank
FROM t
WINDOW partition_id_window AS (
PARTITION BY partition_id
ORDER BY entry_id
) ) DT
WHERE entry_rank = 1
ORDER BY partition_id;
Go ahead, translate either of those queries into English, see what you get!
Back on topic...
Question: How do I select the top 10 rows in each partition of a table?"
It turns out the CROSS APPLY solution scales just as well as the OLAP solution. With the OLAP solution, the "WHERE entry_rank = 1" predicate was simply changed to "entry_rank <= 10", and with the CROSS APPLY the "FIRST" simply needs to be changed to "TOP 10".
Once again, it works, and once again, the SQL matches the question:
SELECT first_t.*
FROM ( SELECT DISTINCT
t.partition_id
FROM t
ORDER BY t.partition_id ) AS unique_partition
CROSS APPLY
( SELECT TOP 10 *
FROM t
WHERE t.partition_id = unique_partition.partition_id
ORDER BY t.entry_id ) AS first_t
ORDER BY first_t.partition_id;
partition_id,entry_id,data
10,1,'aaa'
10,4,'aaa'
10,7,'aaa'
10,10,'aaa'
10,13,'aaa'
10,16,'aaa'
10,19,'aaa'
10,22,'aaa'
10,25,'aaa'
10,28,'aaa'
20,2,'bbb'
20,5,'bbb'
20,8,'bbb'
20,11,'bbb'
20,14,'bbb'
20,17,'bbb'
20,20,'bbb'
20,23,'bbb'
20,26,'bbb'
20,29,'bbb'
30,3,'ccc'
30,6,'ccc'
30,9,'ccc'
30,12,'ccc'
30,15,'ccc'
30,18,'ccc'
30,21,'ccc'
30,24,'ccc'
30,27,'ccc'
30,30,'ccc'
Sometimes "scalability" means "Will it handle millions of rows?", other times the definition is different: "Will it handle more complex problems as well as it handled simple ones?"
For the second definition, at least so far, the answer for CROSS APPLY is "yes".