Question: How do I perform a breadth-first traversal of a hierarchical table? In other words, how do I write a SQL Anywhere query that returns all the rows of a tree-structured table in left-to-right, top-to-bottom order?
Next Question: How do I perform a depth-first traversal (top-to-bottom, left-to-right) of the same table?
Here's a diagram showing the primary keys for a tree-structured table:
Here's what the breadth-first and depth-first queries should return:1 | --------------------------------------- 2 93 4 5 | | | | -------------- ------------ -------- ------ 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | | | | ----- ----- ----- ----- 27 26 25 24 23 22 21 20
SQL queries don't have any natural order other than what you specify in an ORDER BY clause, and clearly the numbers 1, 2, 3 aren't enough to determine either vertical (1 - 93 - 11) or horizontal ( 27 - 26 - 25) ordering. In other words, the table needs more than just a primary key:Breadth-First Depth-First 1 1 2 2 93 6 4 7 5 27 6 26 7 8 8 9 9 10 10 93 11 11 12 12 13 13 14 25 15 24 16 14 17 4 18 15 19 16 27 17 26 23 25 22 24 5 23 18 22 21 21 20 20 19
CREATE TABLE node ( node_id INTEGER NOT NULL, parent_id INTEGER NOT NULL REFERENCES node ( node_id ), sibling_number INTEGER NOT NULL, PRIMARY KEY ( node_id ) );
- node_id is a random primary key... not really "random" in this example because that would make the diagram hard to read, but just a "little" random: 93 appears below 1 but above 11, and while 18 appears to the left of 19, 21 comes before 20.
- parent_id specifies top-to-bottom ordering; it contains the node_id of the row above this one, except for the top row which has parent_id = node_id.
- sibling_number specifies the left-to-right ordering of rows with the same parent_id.
The point is, if top-to-bottom and left-to-right ordering is to make any sense, there has to be something in the data (like parent_id and sibling_number) to represent it.
Here's what the data looks like for this table, with the diagram modified to show the node_id,sibling_number pairs for each row:
Note: The sibling_number for the root node is set to 9 instead of the "more natural value" 1 to show that sibling_number has no role to play in determining top-to-bottom ordering, only left-to-right.
1,9 | ------------------------------------------------------------------ 2,1 93,2 4,3 5,4 | | | | ------------------------ ---------------------- ---------------- ---------- 6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8 | | | | ---------- ---------- ---------- ---------- 27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6
INSERT INTO node VALUES ( 1, 1, 9 ); INSERT INTO node VALUES ( 2, 1, 1 ); INSERT INTO node VALUES ( 6, 2, 5 ); INSERT INTO node VALUES ( 7, 2, 6 ); INSERT INTO node VALUES ( 27, 7, 8 ); INSERT INTO node VALUES ( 26, 7, 9 ); INSERT INTO node VALUES ( 8, 2, 7 ); INSERT INTO node VALUES ( 9, 2, 8 ); INSERT INTO node VALUES ( 10, 2, 9 ); INSERT INTO node VALUES ( 93, 1, 2 ); INSERT INTO node VALUES ( 11, 93, 3 ); INSERT INTO node VALUES ( 12, 93, 5 ); INSERT INTO node VALUES ( 13, 93, 7 ); INSERT INTO node VALUES ( 25, 13, 7 ); INSERT INTO node VALUES ( 24, 13, 8 ); INSERT INTO node VALUES ( 14, 93, 9 ); INSERT INTO node VALUES ( 4, 1, 3 ); INSERT INTO node VALUES ( 15, 4, 1 ); INSERT INTO node VALUES ( 16, 4, 5 ); INSERT INTO node VALUES ( 17, 4, 9 ); INSERT INTO node VALUES ( 23, 17, 6 ); INSERT INTO node VALUES ( 22, 17, 7 ); INSERT INTO node VALUES ( 5, 1, 4 ); INSERT INTO node VALUES ( 18, 5, 2 ); INSERT INTO node VALUES ( 21, 18, 5 ); INSERT INTO node VALUES ( 20, 18, 6 ); INSERT INTO node VALUES ( 19, 5, 8 ); COMMIT;
Now, you can take my word...
...that no simple queries exist for either breadth-first or depth-first result sets, nothing in the formSELECT name FROM employee ORDER BY [column-name-list];
Or, you can take a moment to experiment... go ahead, I'll wait :)
Starting Point: Top-Down Traversal
The first step is to write a building block for future efforts: a query that does a top-to-bottom traversal without regard to the left-to-right ordering.The following RECURSIVE UNION query uses parent_id to calculat a new column, level, which contains the vertical level number 1, 2, 3, 4.
The first SELECT on lines 6 through 11 starts the ball rolling by returning the single root row:
The second SELECT on lines 13 through 20 recursively traverses all the rows underneath Jordan:level, node_id, parent_id, sibling_number 1, 1, 1, 9
- The FROM clause on lines 17 through 19 joins the employee table with all the rows that have already been selected for the top_down_traversal result; this is the "recursive" part, a reference inside a query to the outer query itself.
- the ON clause on line 19 makes sure each new row selected from employee is exactly one level down from its parent row in top_down_traversal.
- the expression top_down_traversal.level + 1 on line 13 calculates the level number (one level down) for the new row selected from employee, and
- the WHERE clause on line 20 excludes the root "1" row because it was added by the first SELECT.
WITH RECURSIVE top_down_traversal ( level, node_id, parent_id, sibling_number ) AS ( SELECT CAST ( 1 AS INTEGER ) AS level, node.node_id AS node_id, node.parent_id AS parent_id, node.sibling_number AS sibling_number FROM node WHERE node.node_id = node.parent_id UNION ALL SELECT top_down_traversal.level + 1, node.node_id, node.parent_id, node.sibling_number FROM top_down_traversal INNER JOIN node ON node.parent_id = top_down_traversal.node_id WHERE node.parent_id <> node.node_id ) SELECT level, node_id FROM top_down_traversal ORDER BY level, node_id;The WITH RECURSIVE clause on lines 1 through 5 is the header for the temporary view defined by the AS ( ... ) clause on lines 6 through 20.
The SELECT on lines 21 through 25 demonstrates how the temporary view works to add level number to each row.
The output shows the top-to-bottom order, but not (yet) left-to-right:
level, node_id 1, 1 2, 2 2, 4 2, 5 2, 93 3, 6 3, 7 3, 8 3, 9 3, 10 3, 11 3, 12 3, 13 3, 14 3, 15 3, 16 3, 17 3, 18 3, 19 4, 20 4, 21 4, 22 4, 23 4, 24 4, 25 4, 26 4, 27
Answer 1: Breadth-First Traversal
ORDER BY level gives the top-to-bottom order for all the rows, and ORDER BY sibling_number gives the left-to-right order for rows with the same parent, but what about traversing all the rows in breadth-first order?What's needed is a kind of "variable ORDER BY", one that includes the sibling_number for all the rows in the current row's lineage, all the way from the top of the hierarchy down to the current row: ORDER BY level, sibling_number, sibling_number, sibling_number, ...
It would look like ORDER BY 1, 9 for the root row, ORDER BY 2, 9, 1 for row 2, ORDER BY 2, 9, 2 for row 93 and so on.
Here's what the variable ORDER BY would look like for all the rows, if it was possible to code such a thing in SQL:
1,9 | ------------------------------------------------------------------ 2,1 93,2 4,3 5,4 | | | | ------------------------ ---------------------- ---------------- ---------- 6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8 | | | | ---------- ---------- ---------- ---------- 27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6 node_id ORDER BY level, sibling_number, ... ======= =================================== 1 ORDER BY 1, 9 2 ORDER BY 2, 9, 1 93 ORDER BY 2, 9, 2 4 ORDER BY 2, 9, 3 5 ORDER BY 2, 9, 4 6 ORDER BY 3, 9, 1, 5 7 ORDER BY 3, 9, 1, 6 8 ORDER BY 3, 9, 1, 7 9 ORDER BY 3, 9, 1, 8 10 ORDER BY 3, 9, 1, 9 11 ORDER BY 3, 9, 2, 3 12 ORDER BY 3, 9, 2, 5 13 ORDER BY 3, 9, 2, 7 14 ORDER BY 3, 9, 2, 9 15 ORDER BY 3, 9, 3, 1 16 ORDER BY 3, 9, 3, 5 17 ORDER BY 3, 9, 3, 9 18 ORDER BY 3, 9, 4, 2 19 ORDER BY 3, 9, 4, 8 27 ORDER BY 4, 9, 1, 6, 8 26 ORDER BY 4, 9, 1, 6, 9 25 ORDER BY 4, 9, 2, 7, 7 24 ORDER BY 4, 9, 2, 7, 8 23 ORDER BY 4, 9, 3, 9, 6 22 ORDER BY 4, 9, 3, 9, 7 21 ORDER BY 4, 9, 4, 2, 5 20 ORDER BY 4, 9, 4, 2, 6
Variable ORDER BY clauses...
...aren't possible in SQL Anywhere, but variable strings are.The following query adds one more column to the result set: lineage contains a formatted string consisting of the sibling_numbers from all the ancestors to this one. Each sibling_number is padded on the left with zeros to deal with the fact that strings are sorted from left to right, and dashes "-" are used as separators for clarity.
WITH RECURSIVE breadth_first_traversal ( level, lineage, node_id, parent_id, sibling_number ) AS ( SELECT CAST ( 1 AS INTEGER ) AS level, CAST ( RIGHT ( STRING ( '00', node.sibling_number ), 3 ) AS LONG VARCHAR ) AS lineage, node.node_id AS node_id, node.parent_id AS parent_id, node.sibling_number AS sibling_number FROM node WHERE node.node_id = node.parent_id UNION ALL SELECT breadth_first_traversal.level + 1, STRING ( breadth_first_traversal.lineage, '-', RIGHT ( STRING ( '00', node.sibling_number ), 3 ) ), node.node_id, node.parent_id, node.sibling_number FROM breadth_first_traversal INNER JOIN node ON node.parent_id = breadth_first_traversal.node_id WHERE node.parent_id <> node.node_id ) SELECT node_id, level, lineage FROM breadth_first_traversal ORDER BY level, lineage;Here's the breadth-first result of the ORDER BY level, lineage clause:
1,9 | ------------------------------------------------------------------ 2,1 93,2 4,3 5,4 | | | | ------------------------ ---------------------- ---------------- ---------- 6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8 | | | | ---------- ---------- ---------- ---------- 27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6 node_id, level, lineage 1, 1, 009 2, 2, 009-001 93, 2, 009-002 4, 2, 009-003 5, 2, 009-004 6, 3, 009-001-005 7, 3, 009-001-006 8, 3, 009-001-007 9, 3, 009-001-008 10, 3, 009-001-009 11, 3, 009-002-003 12, 3, 009-002-005 13, 3, 009-002-007 14, 3, 009-002-009 15, 3, 009-003-001 16, 3, 009-003-005 17, 3, 009-003-009 18, 3, 009-004-002 19, 3, 009-004-008 27, 4, 009-001-006-008 26, 4, 009-001-006-009 25, 4, 009-002-007-007 24, 4, 009-002-007-008 23, 4, 009-003-009-006 22, 4, 009-003-009-007 21, 4, 009-004-002-005 20, 4, 009-004-002-006
Answer 2: Depth-First Traversal
A variable ORDER BY is also required for a depth-first traversal, but it's simpler: no level number, just the lineage.In other words, it's exactly the same query as the breadth-first traversal except for the ORDER BY lineage clause (and the view name, depth_first_traversal):
WITH RECURSIVE depth_first_traversal ( level, lineage, node_id, parent_id, sibling_number ) AS ( SELECT CAST ( 1 AS INTEGER ) AS level, CAST ( RIGHT ( STRING ( '00', node.sibling_number ), 3 ) AS LONG VARCHAR ) AS lineage, node.node_id AS node_id, node.parent_id AS parent_id, node.sibling_number AS sibling_number FROM node WHERE node.node_id = node.parent_id UNION ALL SELECT depth_first_traversal.level + 1, STRING ( depth_first_traversal.lineage, '-', RIGHT ( STRING ( '00', node.sibling_number ), 3 ) ), node.node_id, node.parent_id, node.sibling_number FROM depth_first_traversal INNER JOIN node ON node.parent_id = depth_first_traversal.node_id WHERE node.parent_id <> node.node_id ) SELECT node_id, level, lineage FROM depth_first_traversal ORDER BY lineage;Here's what the depth-first output from ORDER BY lineage looks like:
1,9 | ------------------------------------------------------------------ 2,1 93,2 4,3 5,4 | | | | ------------------------ ---------------------- ---------------- ---------- 6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8 | | | | ---------- ---------- ---------- ---------- 27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6 node_id, level, lineage 1, 1, 009 2, 2, 009-001 6, 3, 009-001-005 7, 3, 009-001-006 27, 4, 009-001-006-008 26, 4, 009-001-006-009 8, 3, 009-001-007 9, 3, 009-001-008 10, 3, 009-001-009 93, 2, 009-002 11, 3, 009-002-003 12, 3, 009-002-005 13, 3, 009-002-007 25, 4, 009-002-007-007 24, 4, 009-002-007-008 14, 3, 009-002-009 4, 2, 009-003 15, 3, 009-003-001 16, 3, 009-003-005 17, 3, 009-003-009 23, 4, 009-003-009-006 22, 4, 009-003-009-007 5, 2, 009-004 18, 3, 009-004-002 21, 4, 009-004-002-005 20, 4, 009-004-002-006 19, 3, 009-004-008
When looking at the list above, don't forget that level is NOT included in the ORDER BY.
Also, the root sibling_number "009" doesn't really have to be included in the lineage string since it's the same for all rows.
Test Case: Ainslie's Company
Here's the data from an earlier article on RECURSIVE UNION queries:CREATE TABLE employee ( employee_id INTEGER NOT NULL, manager_id INTEGER NOT NULL REFERENCES employee ( employee_id ), name VARCHAR ( 20 ) NOT NULL, salary NUMERIC ( 20, 2 ) NOT NULL, PRIMARY KEY ( employee_id ) ); INSERT INTO employee VALUES ( 1, 1, 'Ainslie', 1000000.00 ); INSERT INTO employee VALUES ( 2, 1, 'Briana', 900000.00 ); INSERT INTO employee VALUES ( 3, 1, 'Calista', 900000.00 ); INSERT INTO employee VALUES ( 4, 1, 'Delmar', 900000.00 ); INSERT INTO employee VALUES ( 5, 2, 'Electra', 750000.00 ); INSERT INTO employee VALUES ( 6, 3, 'Fabriane', 800000.00 ); INSERT INTO employee VALUES ( 7, 3, 'Genevieve', 750000.00 ); INSERT INTO employee VALUES ( 8, 4, 'Hunter', 800000.00 ); INSERT INTO employee VALUES ( 9, 6, 'Inari', 500000.00 ); INSERT INTO employee VALUES ( 10, 6, 'Jordan', 100000.00 ); INSERT INTO employee VALUES ( 11, 8, 'Khalil', 100000.00 ); INSERT INTO employee VALUES ( 12, 8, 'Lisette', 100000.00 ); INSERT INTO employee VALUES ( 13, 10, 'Marlon', 100000.00 ); INSERT INTO employee VALUES ( 14, 10, 'Nissa', 100000.00 ); COMMIT;Here's the breadth-first traversal query for Ainslie's company, same as the previous example with the following name changes:
- node becomes employee,
- node_id becomes employee_id
- parent_id becomes manager_id, and
- sibling_number becomes name:
WITH RECURSIVE breadth_first_traversal ( level, lineage, employee_id, manager_id, name ) AS ( SELECT CAST ( 1 AS INTEGER ) AS level, CAST ( employee.name AS LONG VARCHAR ) AS lineage, employee.employee_id AS employee_id, employee.manager_id AS manager_id, employee.name AS name FROM employee WHERE employee.employee_id = employee.manager_id UNION ALL SELECT breadth_first_traversal.level + 1, STRING ( breadth_first_traversal.lineage, '-', employee.name ), employee.employee_id, employee.manager_id, employee.name FROM breadth_first_traversal INNER JOIN employee ON employee.manager_id = breadth_first_traversal.employee_id WHERE employee.manager_id <> employee.employee_id ) SELECT employee_id, level, lineage FROM breadth_first_traversal ORDER BY level, lineage;Here's the breadth-first result set for Ainslie's company:
Here's the depth-first query for Ainslie's company:employee_id, level, lineage 1, 1, Ainslie 2, 2, Ainslie-Briana 3, 2, Ainslie-Calista 4, 2, Ainslie-Delmar 5, 3, Ainslie-Briana-Electra 6, 3, Ainslie-Calista-Fabriane 7, 3, Ainslie-Calista-Genevieve 8, 3, Ainslie-Delmar-Hunter 9, 4, Ainslie-Calista-Fabriane-Inari 10, 4, Ainslie-Calista-Fabriane-Jordan 11, 4, Ainslie-Delmar-Hunter-Khalil 12, 4, Ainslie-Delmar-Hunter-Lisette 13, 5, Ainslie-Calista-Fabriane-Jordan-Marlon 14, 5, Ainslie-Calista-Fabriane-Jordan-Nissa
WITH RECURSIVE depth_first_traversal ( level, lineage, employee_id, manager_id, name ) AS ( SELECT CAST ( 1 AS INTEGER ) AS level, CAST ( employee.name AS LONG VARCHAR ) AS lineage, employee.employee_id AS employee_id, employee.manager_id AS manager_id, employee.name AS name FROM employee WHERE employee.employee_id = employee.manager_id UNION ALL SELECT depth_first_traversal.level + 1, STRING ( depth_first_traversal.lineage, '-', employee.name ), employee.employee_id, employee.manager_id, employee.name FROM depth_first_traversal INNER JOIN employee ON employee.manager_id = depth_first_traversal.employee_id WHERE employee.manager_id <> employee.employee_id ) SELECT employee_id, level, lineage FROM depth_first_traversal ORDER BY lineage;Here's the depth-first result set for Ainslie's company:
employee_id, level, lineage 1, 1, Ainslie 2, 2, Ainslie-Briana 5, 3, Ainslie-Briana-Electra 3, 2, Ainslie-Calista 6, 3, Ainslie-Calista-Fabriane 9, 4, Ainslie-Calista-Fabriane-Inari 10, 4, Ainslie-Calista-Fabriane-Jordan 13, 5, Ainslie-Calista-Fabriane-Jordan-Marlon 14, 5, Ainslie-Calista-Fabriane-Jordan-Nissa 7, 3, Ainslie-Calista-Genevieve 4, 2, Ainslie-Delmar 8, 3, Ainslie-Delmar-Hunter 11, 4, Ainslie-Delmar-Hunter-Khalil 12, 4, Ainslie-Delmar-Hunter-Lisette
Test Case: Jordan's Company
Ainslie's company is not a good example for testing and debugging because the data is unrealistic; i.e., both of these simple queries work as breadth-first traversals without all the fuss and muss of a RECURSIVE UNION:SELECT * FROM employee ORDER BY employee_id; SELECT * FROM employee ORDER BY name;Here's a better example; same names but shuffled around, and random values used for employee_id:
636 Jordan | ------------------------------------------------------ 543 Briana 323 Delmar 914 Electra | | | ------------ -------------------------- ---------- 587 Fabriane 168 Calista 813 Genevieve 362 Hunter | | ---------------------- ----------------------- 611 Ainslie 893 Inari 119 Khalil 777 Lisette | --------------------- 809 Marlon 326 Nissa
CREATE TABLE employee ( employee_id INTEGER NOT NULL, manager_id INTEGER NOT NULL REFERENCES employee ( employee_id ), name VARCHAR ( 20 ) NOT NULL, salary NUMERIC ( 20, 2 ) NOT NULL, PRIMARY KEY ( employee_id ) ); INSERT INTO employee VALUES ( 636, 636, 'Jordan', 1000000.00 ); INSERT INTO employee VALUES ( 543, 636, 'Briana', 900000.00 ); INSERT INTO employee VALUES ( 323, 636, 'Delmar', 900000.00 ); INSERT INTO employee VALUES ( 914, 636, 'Electra', 900000.00 ); INSERT INTO employee VALUES ( 587, 543, 'Fabriane', 750000.00 ); INSERT INTO employee VALUES ( 168, 323, 'Calista', 800000.00 ); INSERT INTO employee VALUES ( 813, 323, 'Genevieve', 750000.00 ); INSERT INTO employee VALUES ( 362, 914, 'Hunter', 800000.00 ); INSERT INTO employee VALUES ( 611, 168, 'Ainslie', 500000.00 ); INSERT INTO employee VALUES ( 893, 168, 'Inari', 100000.00 ); INSERT INTO employee VALUES ( 119, 362, 'Khalil', 100000.00 ); INSERT INTO employee VALUES ( 777, 362, 'Lisette', 100000.00 ); INSERT INTO employee VALUES ( 809, 893, 'Marlon', 100000.00 ); INSERT INTO employee VALUES ( 326, 893, 'Nissa', 100000.00 ); COMMIT;The queries are the same as for Ainslie's company, just the data is different.
Here's the breadth-first result set for Jordan's company:
Here's the depth-first result set for Jordan's company:employee_id, level, lineage 636, 1, Jordan 543, 2, Jordan-Briana 323, 2, Jordan-Delmar 914, 2, Jordan-Electra 587, 3, Jordan-Briana-Fabriane 168, 3, Jordan-Delmar-Calista 813, 3, Jordan-Delmar-Genevieve 362, 3, Jordan-Electra-Hunter 611, 4, Jordan-Delmar-Calista-Ainslie 893, 4, Jordan-Delmar-Calista-Inari 119, 4, Jordan-Electra-Hunter-Khalil 777, 4, Jordan-Electra-Hunter-Lisette 809, 5, Jordan-Delmar-Calista-Inari-Marlon 326, 5, Jordan-Delmar-Calista-Inari-Nissa
employee_id, level, lineage 636, 1, Jordan 543, 2, Jordan-Briana 587, 3, Jordan-Briana-Fabriane 323, 2, Jordan-Delmar 168, 3, Jordan-Delmar-Calista 611, 4, Jordan-Delmar-Calista-Ainslie 893, 4, Jordan-Delmar-Calista-Inari 809, 5, Jordan-Delmar-Calista-Inari-Marlon 326, 5, Jordan-Delmar-Calista-Inari-Nissa 813, 3, Jordan-Delmar-Genevieve 914, 2, Jordan-Electra 362, 3, Jordan-Electra-Hunter 119, 4, Jordan-Electra-Hunter-Khalil 777, 4, Jordan-Electra-Hunter-Lisette
2 comments:
Hi Breck, I may know where you got this question from:)
However, whereas your solution works well for attributes le a "name", I still wait for a solution, say, to order by "SALARY DESC" in a depth_first fashion.
Without a method to number child nodes within the parent node dynamically, I don't think it's possible.
Best regards
Volker
I do not find any means of protection against cycles? That is why I prefer the use of a path rather than nodes. If you store just parents (pointers) it is possible for an A -> B -> A cycle to be created. If you have a single path entry like 'A > B' or [A,B] there is only a single value possible and if a cycle gets introduced it often just renders the row invisible to the recursive algorithm instead of breaking it.
As a least afford solution introducing a max_depth constant as an invariant might save the day.
Post a Comment