Question: How do I perform a breadth-first traversal of a hierarchical table? In other words, how do I write a SQL Anywhere query that returns all the rows of a tree-structured table in left-to-right, top-to-bottom order?
Next Question: How do I perform a depth-first traversal (top-to-bottom, left-to-right) of the same table?
Here's a diagram showing the primary keys for a tree-structured table:
1
|
---------------------------------------
2 93 4 5
| | | |
-------------- ------------ -------- ------
6 7 8 9 10 11 12 13 14 15 16 17 18 19
| | | |
----- ----- ----- -----
27 26 25 24 23 22 21 20
Here's what the breadth-first and depth-first queries should return:SQL queries don't have any natural order other than what you specify in an ORDER BY clause, and clearly the numbers 1, 2, 3 aren't enough to determine either vertical (1 - 93 - 11) or horizontal ( 27 - 26 - 25) ordering. In other words, the table needs more than just a primary key:Breadth-First Depth-First 1 1 2 2 93 6 4 7 5 27 6 26 7 8 8 9 9 10 10 93 11 11 12 12 13 13 14 25 15 24 16 14 17 4 18 15 19 16 27 17 26 23 25 22 24 5 23 18 22 21 21 20 20 19
CREATE TABLE node ( node_id INTEGER NOT NULL, parent_id INTEGER NOT NULL REFERENCES node ( node_id ), sibling_number INTEGER NOT NULL, PRIMARY KEY ( node_id ) );
- node_id is a random primary key... not really "random" in this example because that would make the diagram hard to read, but just a "little" random: 93 appears below 1 but above 11, and while 18 appears to the left of 19, 21 comes before 20.
- parent_id specifies top-to-bottom ordering; it contains the node_id of the row above this one, except for the top row which has parent_id = node_id.
- sibling_number specifies the left-to-right ordering of rows with the same parent_id.
The point is, if top-to-bottom and left-to-right ordering is to make any sense, there has to be something in the data (like parent_id and sibling_number) to represent it.
Here's what the data looks like for this table, with the diagram modified to show the node_id,sibling_number pairs for each row:
Note: The sibling_number for the root node is set to 9 instead of the "more natural value" 1 to show that sibling_number has no role to play in determining top-to-bottom ordering, only left-to-right.
1,9
|
------------------------------------------------------------------
2,1 93,2 4,3 5,4
| | | |
------------------------ ---------------------- ---------------- ----------
6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8
| | | |
---------- ---------- ---------- ----------
27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6
INSERT INTO node VALUES ( 1, 1, 9 ); INSERT INTO node VALUES ( 2, 1, 1 ); INSERT INTO node VALUES ( 6, 2, 5 ); INSERT INTO node VALUES ( 7, 2, 6 ); INSERT INTO node VALUES ( 27, 7, 8 ); INSERT INTO node VALUES ( 26, 7, 9 ); INSERT INTO node VALUES ( 8, 2, 7 ); INSERT INTO node VALUES ( 9, 2, 8 ); INSERT INTO node VALUES ( 10, 2, 9 ); INSERT INTO node VALUES ( 93, 1, 2 ); INSERT INTO node VALUES ( 11, 93, 3 ); INSERT INTO node VALUES ( 12, 93, 5 ); INSERT INTO node VALUES ( 13, 93, 7 ); INSERT INTO node VALUES ( 25, 13, 7 ); INSERT INTO node VALUES ( 24, 13, 8 ); INSERT INTO node VALUES ( 14, 93, 9 ); INSERT INTO node VALUES ( 4, 1, 3 ); INSERT INTO node VALUES ( 15, 4, 1 ); INSERT INTO node VALUES ( 16, 4, 5 ); INSERT INTO node VALUES ( 17, 4, 9 ); INSERT INTO node VALUES ( 23, 17, 6 ); INSERT INTO node VALUES ( 22, 17, 7 ); INSERT INTO node VALUES ( 5, 1, 4 ); INSERT INTO node VALUES ( 18, 5, 2 ); INSERT INTO node VALUES ( 21, 18, 5 ); INSERT INTO node VALUES ( 20, 18, 6 ); INSERT INTO node VALUES ( 19, 5, 8 ); COMMIT;
Now, you can take my word...
...that no simple queries exist for either breadth-first or depth-first result sets, nothing in the formSELECT name FROM employee ORDER BY [column-name-list];
Or, you can take a moment to experiment... go ahead, I'll wait :)
Starting Point: Top-Down Traversal
The first step is to write a building block for future efforts: a query that does a top-to-bottom traversal without regard to the left-to-right ordering.The following RECURSIVE UNION query uses parent_id to calculat a new column, level, which contains the vertical level number 1, 2, 3, 4.
The first SELECT on lines 6 through 11 starts the ball rolling by returning the single root row:
The second SELECT on lines 13 through 20 recursively traverses all the rows underneath Jordan:level, node_id, parent_id, sibling_number 1, 1, 1, 9
- The FROM clause on lines 17 through 19 joins the employee table with all the rows that have already been selected for the top_down_traversal result; this is the "recursive" part, a reference inside a query to the outer query itself.
- the ON clause on line 19 makes sure each new row selected from employee is exactly one level down from its parent row in top_down_traversal.
- the expression top_down_traversal.level + 1 on line 13 calculates the level number (one level down) for the new row selected from employee, and
- the WHERE clause on line 20 excludes the root "1" row because it was added by the first SELECT.
WITH RECURSIVE top_down_traversal
( level,
node_id,
parent_id,
sibling_number )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
node.node_id AS node_id,
node.parent_id AS parent_id,
node.sibling_number AS sibling_number
FROM node
WHERE node.node_id = node.parent_id
UNION ALL
SELECT top_down_traversal.level + 1,
node.node_id,
node.parent_id,
node.sibling_number
FROM top_down_traversal
INNER JOIN node
ON node.parent_id = top_down_traversal.node_id
WHERE node.parent_id <> node.node_id )
SELECT level,
node_id
FROM top_down_traversal
ORDER BY level,
node_id;
The WITH RECURSIVE clause on lines 1 through 5 is the header for the temporary view defined by the AS ( ... ) clause on lines 6 through 20.The SELECT on lines 21 through 25 demonstrates how the temporary view works to add level number to each row.
The output shows the top-to-bottom order, but not (yet) left-to-right:
level, node_id 1, 1 2, 2 2, 4 2, 5 2, 93 3, 6 3, 7 3, 8 3, 9 3, 10 3, 11 3, 12 3, 13 3, 14 3, 15 3, 16 3, 17 3, 18 3, 19 4, 20 4, 21 4, 22 4, 23 4, 24 4, 25 4, 26 4, 27
Answer 1: Breadth-First Traversal
ORDER BY level gives the top-to-bottom order for all the rows, and ORDER BY sibling_number gives the left-to-right order for rows with the same parent, but what about traversing all the rows in breadth-first order?What's needed is a kind of "variable ORDER BY", one that includes the sibling_number for all the rows in the current row's lineage, all the way from the top of the hierarchy down to the current row: ORDER BY level, sibling_number, sibling_number, sibling_number, ...
It would look like ORDER BY 1, 9 for the root row, ORDER BY 2, 9, 1 for row 2, ORDER BY 2, 9, 2 for row 93 and so on.
Here's what the variable ORDER BY would look like for all the rows, if it was possible to code such a thing in SQL:
1,9
|
------------------------------------------------------------------
2,1 93,2 4,3 5,4
| | | |
------------------------ ---------------------- ---------------- ----------
6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8
| | | |
---------- ---------- ---------- ----------
27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6
node_id ORDER BY level, sibling_number, ...
======= ===================================
1 ORDER BY 1, 9
2 ORDER BY 2, 9, 1
93 ORDER BY 2, 9, 2
4 ORDER BY 2, 9, 3
5 ORDER BY 2, 9, 4
6 ORDER BY 3, 9, 1, 5
7 ORDER BY 3, 9, 1, 6
8 ORDER BY 3, 9, 1, 7
9 ORDER BY 3, 9, 1, 8
10 ORDER BY 3, 9, 1, 9
11 ORDER BY 3, 9, 2, 3
12 ORDER BY 3, 9, 2, 5
13 ORDER BY 3, 9, 2, 7
14 ORDER BY 3, 9, 2, 9
15 ORDER BY 3, 9, 3, 1
16 ORDER BY 3, 9, 3, 5
17 ORDER BY 3, 9, 3, 9
18 ORDER BY 3, 9, 4, 2
19 ORDER BY 3, 9, 4, 8
27 ORDER BY 4, 9, 1, 6, 8
26 ORDER BY 4, 9, 1, 6, 9
25 ORDER BY 4, 9, 2, 7, 7
24 ORDER BY 4, 9, 2, 7, 8
23 ORDER BY 4, 9, 3, 9, 6
22 ORDER BY 4, 9, 3, 9, 7
21 ORDER BY 4, 9, 4, 2, 5
20 ORDER BY 4, 9, 4, 2, 6
Variable ORDER BY clauses...
...aren't possible in SQL Anywhere, but variable strings are.The following query adds one more column to the result set: lineage contains a formatted string consisting of the sibling_numbers from all the ancestors to this one. Each sibling_number is padded on the left with zeros to deal with the fact that strings are sorted from left to right, and dashes "-" are used as separators for clarity.
WITH RECURSIVE breadth_first_traversal
( level,
lineage,
node_id,
parent_id,
sibling_number )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
CAST ( RIGHT ( STRING ( '00', node.sibling_number ), 3 )
AS LONG VARCHAR ) AS lineage,
node.node_id AS node_id,
node.parent_id AS parent_id,
node.sibling_number AS sibling_number
FROM node
WHERE node.node_id = node.parent_id
UNION ALL
SELECT breadth_first_traversal.level + 1,
STRING ( breadth_first_traversal.lineage,
'-',
RIGHT ( STRING ( '00', node.sibling_number ), 3 ) ),
node.node_id,
node.parent_id,
node.sibling_number
FROM breadth_first_traversal
INNER JOIN node
ON node.parent_id = breadth_first_traversal.node_id
WHERE node.parent_id <> node.node_id )
SELECT node_id,
level,
lineage
FROM breadth_first_traversal
ORDER BY level,
lineage;
Here's the breadth-first result of the ORDER BY level, lineage clause: 1,9
|
------------------------------------------------------------------
2,1 93,2 4,3 5,4
| | | |
------------------------ ---------------------- ---------------- ----------
6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8
| | | |
---------- ---------- ---------- ----------
27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6
node_id, level, lineage
1, 1, 009
2, 2, 009-001
93, 2, 009-002
4, 2, 009-003
5, 2, 009-004
6, 3, 009-001-005
7, 3, 009-001-006
8, 3, 009-001-007
9, 3, 009-001-008
10, 3, 009-001-009
11, 3, 009-002-003
12, 3, 009-002-005
13, 3, 009-002-007
14, 3, 009-002-009
15, 3, 009-003-001
16, 3, 009-003-005
17, 3, 009-003-009
18, 3, 009-004-002
19, 3, 009-004-008
27, 4, 009-001-006-008
26, 4, 009-001-006-009
25, 4, 009-002-007-007
24, 4, 009-002-007-008
23, 4, 009-003-009-006
22, 4, 009-003-009-007
21, 4, 009-004-002-005
20, 4, 009-004-002-006
Answer 2: Depth-First Traversal
A variable ORDER BY is also required for a depth-first traversal, but it's simpler: no level number, just the lineage.In other words, it's exactly the same query as the breadth-first traversal except for the ORDER BY lineage clause (and the view name, depth_first_traversal):
WITH RECURSIVE depth_first_traversal
( level,
lineage,
node_id,
parent_id,
sibling_number )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
CAST ( RIGHT ( STRING ( '00', node.sibling_number ), 3 )
AS LONG VARCHAR ) AS lineage,
node.node_id AS node_id,
node.parent_id AS parent_id,
node.sibling_number AS sibling_number
FROM node
WHERE node.node_id = node.parent_id
UNION ALL
SELECT depth_first_traversal.level + 1,
STRING ( depth_first_traversal.lineage,
'-',
RIGHT ( STRING ( '00', node.sibling_number ), 3 ) ),
node.node_id,
node.parent_id,
node.sibling_number
FROM depth_first_traversal
INNER JOIN node
ON node.parent_id = depth_first_traversal.node_id
WHERE node.parent_id <> node.node_id )
SELECT node_id,
level,
lineage
FROM depth_first_traversal
ORDER BY lineage;
Here's what the depth-first output from ORDER BY lineage looks like: 1,9
|
------------------------------------------------------------------
2,1 93,2 4,3 5,4
| | | |
------------------------ ---------------------- ---------------- ----------
6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8
| | | |
---------- ---------- ---------- ----------
27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6
node_id, level, lineage
1, 1, 009
2, 2, 009-001
6, 3, 009-001-005
7, 3, 009-001-006
27, 4, 009-001-006-008
26, 4, 009-001-006-009
8, 3, 009-001-007
9, 3, 009-001-008
10, 3, 009-001-009
93, 2, 009-002
11, 3, 009-002-003
12, 3, 009-002-005
13, 3, 009-002-007
25, 4, 009-002-007-007
24, 4, 009-002-007-008
14, 3, 009-002-009
4, 2, 009-003
15, 3, 009-003-001
16, 3, 009-003-005
17, 3, 009-003-009
23, 4, 009-003-009-006
22, 4, 009-003-009-007
5, 2, 009-004
18, 3, 009-004-002
21, 4, 009-004-002-005
20, 4, 009-004-002-006
19, 3, 009-004-008
When looking at the list above, don't forget that level is NOT included in the ORDER BY.
Also, the root sibling_number "009" doesn't really have to be included in the lineage string since it's the same for all rows.
Test Case: Ainslie's Company
Here's the data from an earlier article on RECURSIVE UNION queries:
CREATE TABLE employee ( employee_id INTEGER NOT NULL, manager_id INTEGER NOT NULL REFERENCES employee ( employee_id ), name VARCHAR ( 20 ) NOT NULL, salary NUMERIC ( 20, 2 ) NOT NULL, PRIMARY KEY ( employee_id ) ); INSERT INTO employee VALUES ( 1, 1, 'Ainslie', 1000000.00 ); INSERT INTO employee VALUES ( 2, 1, 'Briana', 900000.00 ); INSERT INTO employee VALUES ( 3, 1, 'Calista', 900000.00 ); INSERT INTO employee VALUES ( 4, 1, 'Delmar', 900000.00 ); INSERT INTO employee VALUES ( 5, 2, 'Electra', 750000.00 ); INSERT INTO employee VALUES ( 6, 3, 'Fabriane', 800000.00 ); INSERT INTO employee VALUES ( 7, 3, 'Genevieve', 750000.00 ); INSERT INTO employee VALUES ( 8, 4, 'Hunter', 800000.00 ); INSERT INTO employee VALUES ( 9, 6, 'Inari', 500000.00 ); INSERT INTO employee VALUES ( 10, 6, 'Jordan', 100000.00 ); INSERT INTO employee VALUES ( 11, 8, 'Khalil', 100000.00 ); INSERT INTO employee VALUES ( 12, 8, 'Lisette', 100000.00 ); INSERT INTO employee VALUES ( 13, 10, 'Marlon', 100000.00 ); INSERT INTO employee VALUES ( 14, 10, 'Nissa', 100000.00 ); COMMIT;Here's the breadth-first traversal query for Ainslie's company, same as the previous example with the following name changes:
- node becomes employee,
- node_id becomes employee_id
- parent_id becomes manager_id, and
- sibling_number becomes name:
WITH RECURSIVE breadth_first_traversal
( level,
lineage,
employee_id,
manager_id,
name )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
CAST ( employee.name
AS LONG VARCHAR ) AS lineage,
employee.employee_id AS employee_id,
employee.manager_id AS manager_id,
employee.name AS name
FROM employee
WHERE employee.employee_id = employee.manager_id
UNION ALL
SELECT breadth_first_traversal.level + 1,
STRING ( breadth_first_traversal.lineage,
'-',
employee.name ),
employee.employee_id,
employee.manager_id,
employee.name
FROM breadth_first_traversal
INNER JOIN employee
ON employee.manager_id = breadth_first_traversal.employee_id
WHERE employee.manager_id <> employee.employee_id )
SELECT employee_id,
level,
lineage
FROM breadth_first_traversal
ORDER BY level,
lineage;
Here's the breadth-first result set for Ainslie's company:Here's the depth-first query for Ainslie's company:employee_id, level, lineage 1, 1, Ainslie 2, 2, Ainslie-Briana 3, 2, Ainslie-Calista 4, 2, Ainslie-Delmar 5, 3, Ainslie-Briana-Electra 6, 3, Ainslie-Calista-Fabriane 7, 3, Ainslie-Calista-Genevieve 8, 3, Ainslie-Delmar-Hunter 9, 4, Ainslie-Calista-Fabriane-Inari 10, 4, Ainslie-Calista-Fabriane-Jordan 11, 4, Ainslie-Delmar-Hunter-Khalil 12, 4, Ainslie-Delmar-Hunter-Lisette 13, 5, Ainslie-Calista-Fabriane-Jordan-Marlon 14, 5, Ainslie-Calista-Fabriane-Jordan-Nissa
WITH RECURSIVE depth_first_traversal
( level,
lineage,
employee_id,
manager_id,
name )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
CAST ( employee.name
AS LONG VARCHAR ) AS lineage,
employee.employee_id AS employee_id,
employee.manager_id AS manager_id,
employee.name AS name
FROM employee
WHERE employee.employee_id = employee.manager_id
UNION ALL
SELECT depth_first_traversal.level + 1,
STRING ( depth_first_traversal.lineage,
'-',
employee.name ),
employee.employee_id,
employee.manager_id,
employee.name
FROM depth_first_traversal
INNER JOIN employee
ON employee.manager_id = depth_first_traversal.employee_id
WHERE employee.manager_id <> employee.employee_id )
SELECT employee_id,
level,
lineage
FROM depth_first_traversal
ORDER BY lineage;
Here's the depth-first result set for Ainslie's company:employee_id, level, lineage 1, 1, Ainslie 2, 2, Ainslie-Briana 5, 3, Ainslie-Briana-Electra 3, 2, Ainslie-Calista 6, 3, Ainslie-Calista-Fabriane 9, 4, Ainslie-Calista-Fabriane-Inari 10, 4, Ainslie-Calista-Fabriane-Jordan 13, 5, Ainslie-Calista-Fabriane-Jordan-Marlon 14, 5, Ainslie-Calista-Fabriane-Jordan-Nissa 7, 3, Ainslie-Calista-Genevieve 4, 2, Ainslie-Delmar 8, 3, Ainslie-Delmar-Hunter 11, 4, Ainslie-Delmar-Hunter-Khalil 12, 4, Ainslie-Delmar-Hunter-Lisette
Test Case: Jordan's Company
Ainslie's company is not a good example for testing and debugging because the data is unrealistic; i.e., both of these simple queries work as breadth-first traversals without all the fuss and muss of a RECURSIVE UNION:SELECT * FROM employee ORDER BY employee_id; SELECT * FROM employee ORDER BY name;Here's a better example; same names but shuffled around, and random values used for employee_id:
636 Jordan
|
------------------------------------------------------
543 Briana 323 Delmar 914 Electra
| | |
------------ -------------------------- ----------
587 Fabriane 168 Calista 813 Genevieve 362 Hunter
| |
---------------------- -----------------------
611 Ainslie 893 Inari 119 Khalil 777 Lisette
|
---------------------
809 Marlon 326 Nissa
CREATE TABLE employee ( employee_id INTEGER NOT NULL, manager_id INTEGER NOT NULL REFERENCES employee ( employee_id ), name VARCHAR ( 20 ) NOT NULL, salary NUMERIC ( 20, 2 ) NOT NULL, PRIMARY KEY ( employee_id ) ); INSERT INTO employee VALUES ( 636, 636, 'Jordan', 1000000.00 ); INSERT INTO employee VALUES ( 543, 636, 'Briana', 900000.00 ); INSERT INTO employee VALUES ( 323, 636, 'Delmar', 900000.00 ); INSERT INTO employee VALUES ( 914, 636, 'Electra', 900000.00 ); INSERT INTO employee VALUES ( 587, 543, 'Fabriane', 750000.00 ); INSERT INTO employee VALUES ( 168, 323, 'Calista', 800000.00 ); INSERT INTO employee VALUES ( 813, 323, 'Genevieve', 750000.00 ); INSERT INTO employee VALUES ( 362, 914, 'Hunter', 800000.00 ); INSERT INTO employee VALUES ( 611, 168, 'Ainslie', 500000.00 ); INSERT INTO employee VALUES ( 893, 168, 'Inari', 100000.00 ); INSERT INTO employee VALUES ( 119, 362, 'Khalil', 100000.00 ); INSERT INTO employee VALUES ( 777, 362, 'Lisette', 100000.00 ); INSERT INTO employee VALUES ( 809, 893, 'Marlon', 100000.00 ); INSERT INTO employee VALUES ( 326, 893, 'Nissa', 100000.00 ); COMMIT;The queries are the same as for Ainslie's company, just the data is different.
Here's the breadth-first result set for Jordan's company:
Here's the depth-first result set for Jordan's company:employee_id, level, lineage 636, 1, Jordan 543, 2, Jordan-Briana 323, 2, Jordan-Delmar 914, 2, Jordan-Electra 587, 3, Jordan-Briana-Fabriane 168, 3, Jordan-Delmar-Calista 813, 3, Jordan-Delmar-Genevieve 362, 3, Jordan-Electra-Hunter 611, 4, Jordan-Delmar-Calista-Ainslie 893, 4, Jordan-Delmar-Calista-Inari 119, 4, Jordan-Electra-Hunter-Khalil 777, 4, Jordan-Electra-Hunter-Lisette 809, 5, Jordan-Delmar-Calista-Inari-Marlon 326, 5, Jordan-Delmar-Calista-Inari-Nissa
employee_id, level, lineage 636, 1, Jordan 543, 2, Jordan-Briana 587, 3, Jordan-Briana-Fabriane 323, 2, Jordan-Delmar 168, 3, Jordan-Delmar-Calista 611, 4, Jordan-Delmar-Calista-Ainslie 893, 4, Jordan-Delmar-Calista-Inari 809, 5, Jordan-Delmar-Calista-Inari-Marlon 326, 5, Jordan-Delmar-Calista-Inari-Nissa 813, 3, Jordan-Delmar-Genevieve 914, 2, Jordan-Electra 362, 3, Jordan-Electra-Hunter 119, 4, Jordan-Electra-Hunter-Khalil 777, 4, Jordan-Electra-Hunter-Lisette
2 comments:
Hi Breck, I may know where you got this question from:)
However, whereas your solution works well for attributes le a "name", I still wait for a solution, say, to order by "SALARY DESC" in a depth_first fashion.
Without a method to number child nodes within the parent node dynamically, I don't think it's possible.
Best regards
Volker
I do not find any means of protection against cycles? That is why I prefer the use of a path rather than nodes. If you store just parents (pointers) it is possible for an A -> B -> A cycle to be created. If you have a single path entry like 'A > B' or [A,B] there is only a single value possible and if a cycle gets introduced it often just renders the row invisible to the recursive algorithm instead of breaking it.
As a least afford solution introducing a max_depth constant as an invariant might save the day.
Post a Comment