Here's another simple question:
Question: How do I select the top 10 rows in each partition of a table?"Will CROSS APPLY work as well as it did before?
Here's the data. The table looks pretty much the same, except that the entry_id is autoincrementing instead of repeating and there's a lot more data; three partitions with 100 rows in each partition:
CREATE TABLE t ( partition_id INTEGER NOT NULL, entry_id INTEGER NOT NULL DEFAULT AUTOINCREMENT, data VARCHAR ( 10 ) NOT NULL, PRIMARY KEY ( entry_id ) ); BEGIN DECLARE @loop_counter INTEGER; SET @loop_counter = 1; WHILE @loop_counter <= 100 LOOP INSERT t ( partition_id, data ) VALUES ( 10, 'aaa' ); INSERT t ( partition_id, data ) VALUES ( 20, 'bbb' ); INSERT t ( partition_id, data ) VALUES ( 30, 'ccc' ); SET @loop_counter = @loop_counter + 1; END LOOP; COMMIT; END; SELECT * FROM t ORDER BY entry_id; partition_id,entry_id,data 10,1,'aaa' 20,2,'bbb' 30,3,'ccc' 10,4,'aaa' 20,5,'bbb' 30,6,'ccc' 10,7,'aaa' 20,8,'bbb' 30,9,'ccc' ... partition_id,entry_id,data 10,298,'aaa' 20,299,'bbb' 30,300,'ccc'Before tackling the "top 10" question, let's confirm that the CROSS APPLY answer to the previous question "How do I select the first row in each partition of a table?" works just as well on this table:
SELECT first_t.* FROM ( SELECT DISTINCT t.partition_id FROM t ORDER BY t.partition_id ) AS unique_partition CROSS APPLY ( SELECT FIRST * FROM t WHERE t.partition_id = unique_partition.partition_id ORDER BY t.entry_id ) AS first_t ORDER BY first_t.partition_id;Yes, it works:
Plus, it is clear from the SQL what its purpose is:partition_id,entry_id,data 10,1,'aaa' 20,2,'bbb' 30,3,'ccc'
- When the "( SELECT DISTINCT ... )" query is translated into English you get "select each distinct partition_id,"
- when the "( SELECT FIRST ... )" query is translated into English you get "select the first row by entry_id for each row in the other result set," and then
- when the combined CROSS APPLY query is translated by combining the other two translations, you get "select the first row by entry_id for each distinct partition_id"
It's not perfect; in a perfect world, every single query problem would be solved by a single SQL SELECT statement, not a combination of two queries. But, this is real life... and the original question did refer to two queries: "first row" and "each partition", and the CROSS APPLY contains exactly those two queries...
Not like the Old School solution which does funky things with MIN and GROUP BY, neither of which are even indirectly mentioned in the original question,
SELECT t.* FROM t INNER JOIN ( SELECT partition_id AS partition_id, MIN ( entry_id ) AS min_entry_id FROM t GROUP BY partition_id ) AS first_row ON t.partition_id = first_row.partition_id AND t.entry_id = first_row.min_entry_id ORDER BY t.partition_id;or the OLAP solution which introduced the OVER, WINDOW, PARTITION BY and nested ORDER BY clauses plus the RANK function and a WHERE clause to deal with the RANK result.
SELECT partition_id, entry_id, data FROM ( select partition_id, entry_id, data, rank ( ) OVER partition_id_window AS entry_rank FROM t WINDOW partition_id_window AS ( PARTITION BY partition_id ORDER BY entry_id ) ) DT WHERE entry_rank = 1 ORDER BY partition_id;Go ahead, translate either of those queries into English, see what you get!
Back on topic...
Question: How do I select the top 10 rows in each partition of a table?"It turns out the CROSS APPLY solution scales just as well as the OLAP solution. With the OLAP solution, the "WHERE entry_rank = 1" predicate was simply changed to "entry_rank <= 10", and with the CROSS APPLY the "FIRST" simply needs to be changed to "TOP 10".
Once again, it works, and once again, the SQL matches the question:
SELECT first_t.* FROM ( SELECT DISTINCT t.partition_id FROM t ORDER BY t.partition_id ) AS unique_partition CROSS APPLY ( SELECT TOP 10 * FROM t WHERE t.partition_id = unique_partition.partition_id ORDER BY t.entry_id ) AS first_t ORDER BY first_t.partition_id; partition_id,entry_id,data 10,1,'aaa' 10,4,'aaa' 10,7,'aaa' 10,10,'aaa' 10,13,'aaa' 10,16,'aaa' 10,19,'aaa' 10,22,'aaa' 10,25,'aaa' 10,28,'aaa' 20,2,'bbb' 20,5,'bbb' 20,8,'bbb' 20,11,'bbb' 20,14,'bbb' 20,17,'bbb' 20,20,'bbb' 20,23,'bbb' 20,26,'bbb' 20,29,'bbb' 30,3,'ccc' 30,6,'ccc' 30,9,'ccc' 30,12,'ccc' 30,15,'ccc' 30,18,'ccc' 30,21,'ccc' 30,24,'ccc' 30,27,'ccc' 30,30,'ccc'Sometimes "scalability" means "Will it handle millions of rows?", other times the definition is different: "Will it handle more complex problems as well as it handled simple ones?"
For the second definition, at least so far, the answer for CROSS APPLY is "yes".
No comments:
Post a Comment