The following question
"How do I select the first row in each partition of a table?"was discussed in these two articles
Unrequited OLAPand the conclusion seemed to be
OLAP Counseling
"The new FIRST_VALUE() aggregate function together with the OLAP WINDOW clause might seem to be ideal, but it's not. The RANK() function can be made to work with the WINDOW clause, but the Old-School solution using MIN() and GROUP BY is actually simpler... and easier to understand for some (ok, me)."You can see a side-by-side comparison here. Scroll down to the bottom of that article and you will see this parting shot
"...just because the Old School solution worked it doesn't mean we can give up on learning about OLAP and retreat to the familiar."
Onwards and upwards...
Here's a newer, slightly harder question:
"How do I select the top 10 rows in each partition of a table?"Let's start with a sample table containing 300 rows in three partitions:
CREATE TABLE t ( partition_id INTEGER NOT NULL, entry_id INTEGER NOT NULL DEFAULT AUTOINCREMENT, data VARCHAR ( 10 ) NOT NULL, PRIMARY KEY ( entry_id ) ); BEGIN DECLARE @loop_counter INTEGER; SET @loop_counter = 1; WHILE @loop_counter <= 100 LOOP INSERT t ( partition_id, data ) VALUES ( 10, 'aaa' ); INSERT t ( partition_id, data ) VALUES ( 20, 'bbb' ); INSERT t ( partition_id, data ) VALUES ( 30, 'ccc' ); SET @loop_counter = @loop_counter + 1; END LOOP; COMMIT; END; SELECT * FROM t ORDER BY entry_id; partition_id,entry_id,data 10,1,'aaa' 20,2,'bbb' 30,3,'ccc' 10,4,'aaa' 20,5,'bbb' 30,6,'ccc' 10,7,'aaa' 20,8,'bbb' 30,9,'ccc' 10,10,'aaa' 20,11,'bbb' 30,12,'ccc' 10,13,'aaa' 20,14,'bbb' 30,15,'ccc' 10,16,'aaa' 20,17,'bbb' 30,18,'ccc' 10,19,'aaa' 20,20,'bbb' ... 20,290,'bbb' 30,291,'ccc' 10,292,'aaa' 20,293,'bbb' 30,294,'ccc' 10,295,'aaa' 20,296,'bbb' 30,297,'ccc' 10,298,'aaa' 20,299,'bbb' 30,300,'ccc'Here's what the output is supposed to look like:
SELECT [the top 10 rows in each partition of t]; partition_id,entry_id,data 10,1,'aaa' 10,4,'aaa' 10,7,'aaa' 10,10,'aaa' 10,13,'aaa' 10,16,'aaa' 10,19,'aaa' 10,22,'aaa' 10,25,'aaa' 10,28,'aaa' 20,2,'bbb' 20,5,'bbb' 20,8,'bbb' 20,11,'bbb' 20,14,'bbb' 20,17,'bbb' 20,20,'bbb' 20,23,'bbb' 20,26,'bbb' 20,29,'bbb' 30,3,'ccc' 30,6,'ccc' 30,9,'ccc' 30,12,'ccc' 30,15,'ccc' 30,18,'ccc' 30,21,'ccc' 30,24,'ccc' 30,27,'ccc' 30,30,'ccc'Presumably, the Old-School solution would use TOP 10, but how? It turns out that the OLAP RANK() solution used previously to solve the "first" problem works just as well on the "top 10":
SELECT ranked.partition_id, ranked.entry_id, ranked.data FROM ( SELECT *, RANK() OVER partition_window AS entry_rank FROM t WINDOW partition_window AS ( PARTITION BY t.partition_id ORDER BY t.entry_id ) ) AS ranked WHERE ranked.entry_rank <= 10 ORDER BY ranked.partition_id, ranked.entry_id;Logically speaking, the WINDOW clause on lines 7 through 9 subdivides all the rows of t into three partitions (partition_id = 10, 20, 30) then sorts those rows on entry_id within each partition: entry_id = 1, 4, 7... for partition_id = 10, entry_id = 2, 5, 8... for partition_id = 20, and so on.
So far, so good... but we still can't use TOP 10 because that doesn't work on windows or partitions, just the whole result set, and the result set returned by the SELECT on lines 4 through 10 contains all 300 rows...
...and besides, it doesn't have an ORDER BY, and you have to have an ORDER BY to use TOP 10. Sure, the WINDOW clause has an ORDER BY, but TOP doesn't do windows.
The solution lies with the RANK() function call on line 5: it adds a new column called "entry_rank" to the result set, and that column contains the value 1, 2, 3... for the rows in each partition; unlike TOP, RANK() does work with windows and partitions.
The WHERE clause on line 11 does what TOP couldn't do: selects only the top 10 rows in each partition, ordered by entry_id.
The ORDER BY clause on lines 12 and 13 re-sorts the whole thing in the final order for presentation... you pretty much always need an outer ORDER BY because there is no guarantee that the ordering imposed by inner ORDER BY clauses will be preserved, especially not an ORDER BY buried two levels down in a WINDOW clause.
For a real-world example, have a look at this Foxhound FAQ: How do I run adhoc queries on the Foxhound database?
1 comment:
alphas! alphas! alphas!
enter for bettas is prohibitted!
Post a Comment