Question: How do I perform a breadth-first traversal of a hierarchical table? In other words, how do I write a SQL Anywhere query that returns all the rows of a tree-structured table in left-to-right, top-to-bottom order?
Next Question: How do I perform a depth-first traversal (top-to-bottom, left-to-right) of the same table?
Here's a diagram showing the primary keys for a tree-structured table:
1
|
---------------------------------------
2 93 4 5
| | | |
-------------- ------------ -------- ------
6 7 8 9 10 11 12 13 14 15 16 17 18 19
| | | |
----- ----- ----- -----
27 26 25 24 23 22 21 20
Here's what the breadth-first and depth-first queries should return:
Breadth-First Depth-First
1 1
2 2
93 6
4 7
5 27
6 26
7 8
8 9
9 10
10 93
11 11
12 12
13 13
14 25
15 24
16 14
17 4
18 15
19 16
27 17
26 23
25 22
24 5
23 18
22 21
21 20
20 19
SQL queries don't have any natural order other than what you specify in an ORDER BY clause, and clearly the numbers 1, 2, 3 aren't enough to determine either vertical (1 - 93 - 11) or horizontal ( 27 - 26 - 25) ordering. In other words, the table needs more than just a primary key:
CREATE TABLE node (
node_id INTEGER NOT NULL,
parent_id INTEGER NOT NULL REFERENCES node ( node_id ),
sibling_number INTEGER NOT NULL,
PRIMARY KEY ( node_id ) );
- node_id is a random primary key... not really "random" in this example because that would make the diagram hard to read, but just a "little" random: 93 appears below 1 but above 11, and while 18 appears to the left of 19, 21 comes before 20.
- parent_id specifies top-to-bottom ordering; it contains the node_id of the row above this one, except for the top row which has parent_id = node_id.
- sibling_number specifies the left-to-right ordering of rows with the same parent_id.
Columns like node_id and parent_id often appear in hierarchical tables, but sibling_id is something else. If left-to-right ordering is important in a hierarchical table (it often isn't) there must be one or more columns that can be used in an ORDER BY to get that result. Sometimes rows are sorted left-to-right in alphabetic order, sometimes in date/time order, maybe a line number, maybe even an autoincrement... in this example sibling_number is used to make the column's purpose absolutely clear.
The point is, if top-to-bottom and left-to-right ordering is to make any sense, there has to be something in the data (like parent_id and sibling_number) to represent it.
Here's what the data looks like for this table, with the diagram modified to show the node_id,sibling_number pairs for each row:
Note: The sibling_number for the root node is set to 9 instead of the "more natural value" 1 to show that sibling_number has no role to play in determining top-to-bottom ordering, only left-to-right.
1,9
|
------------------------------------------------------------------
2,1 93,2 4,3 5,4
| | | |
------------------------ ---------------------- ---------------- ----------
6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8
| | | |
---------- ---------- ---------- ----------
27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6
INSERT INTO node VALUES ( 1, 1, 9 );
INSERT INTO node VALUES ( 2, 1, 1 );
INSERT INTO node VALUES ( 6, 2, 5 );
INSERT INTO node VALUES ( 7, 2, 6 );
INSERT INTO node VALUES ( 27, 7, 8 );
INSERT INTO node VALUES ( 26, 7, 9 );
INSERT INTO node VALUES ( 8, 2, 7 );
INSERT INTO node VALUES ( 9, 2, 8 );
INSERT INTO node VALUES ( 10, 2, 9 );
INSERT INTO node VALUES ( 93, 1, 2 );
INSERT INTO node VALUES ( 11, 93, 3 );
INSERT INTO node VALUES ( 12, 93, 5 );
INSERT INTO node VALUES ( 13, 93, 7 );
INSERT INTO node VALUES ( 25, 13, 7 );
INSERT INTO node VALUES ( 24, 13, 8 );
INSERT INTO node VALUES ( 14, 93, 9 );
INSERT INTO node VALUES ( 4, 1, 3 );
INSERT INTO node VALUES ( 15, 4, 1 );
INSERT INTO node VALUES ( 16, 4, 5 );
INSERT INTO node VALUES ( 17, 4, 9 );
INSERT INTO node VALUES ( 23, 17, 6 );
INSERT INTO node VALUES ( 22, 17, 7 );
INSERT INTO node VALUES ( 5, 1, 4 );
INSERT INTO node VALUES ( 18, 5, 2 );
INSERT INTO node VALUES ( 21, 18, 5 );
INSERT INTO node VALUES ( 20, 18, 6 );
INSERT INTO node VALUES ( 19, 5, 8 );
COMMIT;
Now, you can take my word...
...that no simple queries exist for either breadth-first or depth-first result sets, nothing in the form
SELECT name
FROM employee
ORDER BY [column-name-list];
Or, you can take a moment to experiment... go ahead, I'll wait :)
Starting Point: Top-Down Traversal
The first step is to write a building block for future efforts: a query that does a top-to-bottom traversal without regard to the left-to-right ordering.
The following RECURSIVE UNION query uses parent_id to calculat a new column, level, which contains the vertical level number 1, 2, 3, 4.
The first SELECT on lines 6 through 11 starts the ball rolling by returning the single root row:
level, node_id, parent_id, sibling_number
1, 1, 1, 9
The second SELECT on lines 13 through 20 recursively traverses all the rows underneath Jordan:
- The FROM clause on lines 17 through 19 joins the employee table with all the rows that have already been selected for the top_down_traversal result; this is the "recursive" part, a reference inside a query to the outer query itself.
- the ON clause on line 19 makes sure each new row selected from employee is exactly one level down from its parent row in top_down_traversal.
- the expression top_down_traversal.level + 1 on line 13 calculates the level number (one level down) for the new row selected from employee, and
- the WHERE clause on line 20 excludes the root "1" row because it was added by the first SELECT.
WITH RECURSIVE top_down_traversal
( level,
node_id,
parent_id,
sibling_number )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
node.node_id AS node_id,
node.parent_id AS parent_id,
node.sibling_number AS sibling_number
FROM node
WHERE node.node_id = node.parent_id
UNION ALL
SELECT top_down_traversal.level + 1,
node.node_id,
node.parent_id,
node.sibling_number
FROM top_down_traversal
INNER JOIN node
ON node.parent_id = top_down_traversal.node_id
WHERE node.parent_id <> node.node_id )
SELECT level,
node_id
FROM top_down_traversal
ORDER BY level,
node_id;
The WITH RECURSIVE clause on lines 1 through 5 is the header for the temporary view defined by the AS ( ... ) clause on lines 6 through 20.
The SELECT on lines 21 through 25 demonstrates how the temporary view works to add level number to each row.
The output shows the top-to-bottom order, but not (yet) left-to-right:
level, node_id
1, 1
2, 2
2, 4
2, 5
2, 93
3, 6
3, 7
3, 8
3, 9
3, 10
3, 11
3, 12
3, 13
3, 14
3, 15
3, 16
3, 17
3, 18
3, 19
4, 20
4, 21
4, 22
4, 23
4, 24
4, 25
4, 26
4, 27
Answer 1: Breadth-First Traversal
ORDER BY level gives the top-to-bottom order for all the rows, and ORDER BY sibling_number gives the left-to-right order for rows with the same parent, but what about traversing all the rows in breadth-first order?
What's needed is a kind of "variable ORDER BY", one that includes the sibling_number for all the rows in the current row's lineage, all the way from the top of the hierarchy down to the current row: ORDER BY level, sibling_number, sibling_number, sibling_number, ...
It would look like ORDER BY 1, 9 for the root row, ORDER BY 2, 9, 1 for row 2, ORDER BY 2, 9, 2 for row 93 and so on.
Here's what the variable ORDER BY would look like for all the rows, if it was possible to code such a thing in SQL:
1,9
|
------------------------------------------------------------------
2,1 93,2 4,3 5,4
| | | |
------------------------ ---------------------- ---------------- ----------
6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8
| | | |
---------- ---------- ---------- ----------
27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6
node_id ORDER BY level, sibling_number, ...
======= ===================================
1 ORDER BY 1, 9
2 ORDER BY 2, 9, 1
93 ORDER BY 2, 9, 2
4 ORDER BY 2, 9, 3
5 ORDER BY 2, 9, 4
6 ORDER BY 3, 9, 1, 5
7 ORDER BY 3, 9, 1, 6
8 ORDER BY 3, 9, 1, 7
9 ORDER BY 3, 9, 1, 8
10 ORDER BY 3, 9, 1, 9
11 ORDER BY 3, 9, 2, 3
12 ORDER BY 3, 9, 2, 5
13 ORDER BY 3, 9, 2, 7
14 ORDER BY 3, 9, 2, 9
15 ORDER BY 3, 9, 3, 1
16 ORDER BY 3, 9, 3, 5
17 ORDER BY 3, 9, 3, 9
18 ORDER BY 3, 9, 4, 2
19 ORDER BY 3, 9, 4, 8
27 ORDER BY 4, 9, 1, 6, 8
26 ORDER BY 4, 9, 1, 6, 9
25 ORDER BY 4, 9, 2, 7, 7
24 ORDER BY 4, 9, 2, 7, 8
23 ORDER BY 4, 9, 3, 9, 6
22 ORDER BY 4, 9, 3, 9, 7
21 ORDER BY 4, 9, 4, 2, 5
20 ORDER BY 4, 9, 4, 2, 6
Variable ORDER BY clauses...
...aren't possible in SQL Anywhere, but variable
strings are.
The following query adds one more column to the result set: lineage contains a formatted string consisting of the sibling_numbers from all the ancestors to this one. Each sibling_number is padded on the left with zeros to deal with the fact that strings are sorted from left to right, and dashes "-" are used as separators for clarity.
WITH RECURSIVE breadth_first_traversal
( level,
lineage,
node_id,
parent_id,
sibling_number )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
CAST ( RIGHT ( STRING ( '00', node.sibling_number ), 3 )
AS LONG VARCHAR ) AS lineage,
node.node_id AS node_id,
node.parent_id AS parent_id,
node.sibling_number AS sibling_number
FROM node
WHERE node.node_id = node.parent_id
UNION ALL
SELECT breadth_first_traversal.level + 1,
STRING ( breadth_first_traversal.lineage,
'-',
RIGHT ( STRING ( '00', node.sibling_number ), 3 ) ),
node.node_id,
node.parent_id,
node.sibling_number
FROM breadth_first_traversal
INNER JOIN node
ON node.parent_id = breadth_first_traversal.node_id
WHERE node.parent_id <> node.node_id )
SELECT node_id,
level,
lineage
FROM breadth_first_traversal
ORDER BY level,
lineage;
Here's the breadth-first result of the ORDER BY level, lineage clause:
1,9
|
------------------------------------------------------------------
2,1 93,2 4,3 5,4
| | | |
------------------------ ---------------------- ---------------- ----------
6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8
| | | |
---------- ---------- ---------- ----------
27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6
node_id, level, lineage
1, 1, 009
2, 2, 009-001
93, 2, 009-002
4, 2, 009-003
5, 2, 009-004
6, 3, 009-001-005
7, 3, 009-001-006
8, 3, 009-001-007
9, 3, 009-001-008
10, 3, 009-001-009
11, 3, 009-002-003
12, 3, 009-002-005
13, 3, 009-002-007
14, 3, 009-002-009
15, 3, 009-003-001
16, 3, 009-003-005
17, 3, 009-003-009
18, 3, 009-004-002
19, 3, 009-004-008
27, 4, 009-001-006-008
26, 4, 009-001-006-009
25, 4, 009-002-007-007
24, 4, 009-002-007-008
23, 4, 009-003-009-006
22, 4, 009-003-009-007
21, 4, 009-004-002-005
20, 4, 009-004-002-006
Answer 2: Depth-First Traversal
A variable ORDER BY is also required for a depth-first traversal, but it's simpler: no level number, just the lineage.
In other words, it's exactly the same query as the breadth-first traversal except for the ORDER BY lineage clause (and the view name, depth_first_traversal):
WITH RECURSIVE depth_first_traversal
( level,
lineage,
node_id,
parent_id,
sibling_number )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
CAST ( RIGHT ( STRING ( '00', node.sibling_number ), 3 )
AS LONG VARCHAR ) AS lineage,
node.node_id AS node_id,
node.parent_id AS parent_id,
node.sibling_number AS sibling_number
FROM node
WHERE node.node_id = node.parent_id
UNION ALL
SELECT depth_first_traversal.level + 1,
STRING ( depth_first_traversal.lineage,
'-',
RIGHT ( STRING ( '00', node.sibling_number ), 3 ) ),
node.node_id,
node.parent_id,
node.sibling_number
FROM depth_first_traversal
INNER JOIN node
ON node.parent_id = depth_first_traversal.node_id
WHERE node.parent_id <> node.node_id )
SELECT node_id,
level,
lineage
FROM depth_first_traversal
ORDER BY lineage;
Here's what the depth-first output from ORDER BY lineage looks like:
1,9
|
------------------------------------------------------------------
2,1 93,2 4,3 5,4
| | | |
------------------------ ---------------------- ---------------- ----------
6,5 7,6 8,7 9,8 10,9 11,3 12,5 13,7 14,9 15,1 16,5 17,9 18,2 19,8
| | | |
---------- ---------- ---------- ----------
27,8 26,9 25,7 24,8 23,6 22,7 21,5 20,6
node_id, level, lineage
1, 1, 009
2, 2, 009-001
6, 3, 009-001-005
7, 3, 009-001-006
27, 4, 009-001-006-008
26, 4, 009-001-006-009
8, 3, 009-001-007
9, 3, 009-001-008
10, 3, 009-001-009
93, 2, 009-002
11, 3, 009-002-003
12, 3, 009-002-005
13, 3, 009-002-007
25, 4, 009-002-007-007
24, 4, 009-002-007-008
14, 3, 009-002-009
4, 2, 009-003
15, 3, 009-003-001
16, 3, 009-003-005
17, 3, 009-003-009
23, 4, 009-003-009-006
22, 4, 009-003-009-007
5, 2, 009-004
18, 3, 009-004-002
21, 4, 009-004-002-005
20, 4, 009-004-002-006
19, 3, 009-004-008
When looking at the list above, don't forget that level is NOT included in the ORDER BY.
Also, the root sibling_number "009" doesn't really have to be included in the lineage string since it's the same for all rows.
Test Case: Ainslie's Company
Here's the data from
an earlier article on RECURSIVE UNION queries:
CREATE TABLE employee (
employee_id INTEGER NOT NULL,
manager_id INTEGER NOT NULL REFERENCES employee ( employee_id ),
name VARCHAR ( 20 ) NOT NULL,
salary NUMERIC ( 20, 2 ) NOT NULL,
PRIMARY KEY ( employee_id ) );
INSERT INTO employee VALUES ( 1, 1, 'Ainslie', 1000000.00 );
INSERT INTO employee VALUES ( 2, 1, 'Briana', 900000.00 );
INSERT INTO employee VALUES ( 3, 1, 'Calista', 900000.00 );
INSERT INTO employee VALUES ( 4, 1, 'Delmar', 900000.00 );
INSERT INTO employee VALUES ( 5, 2, 'Electra', 750000.00 );
INSERT INTO employee VALUES ( 6, 3, 'Fabriane', 800000.00 );
INSERT INTO employee VALUES ( 7, 3, 'Genevieve', 750000.00 );
INSERT INTO employee VALUES ( 8, 4, 'Hunter', 800000.00 );
INSERT INTO employee VALUES ( 9, 6, 'Inari', 500000.00 );
INSERT INTO employee VALUES ( 10, 6, 'Jordan', 100000.00 );
INSERT INTO employee VALUES ( 11, 8, 'Khalil', 100000.00 );
INSERT INTO employee VALUES ( 12, 8, 'Lisette', 100000.00 );
INSERT INTO employee VALUES ( 13, 10, 'Marlon', 100000.00 );
INSERT INTO employee VALUES ( 14, 10, 'Nissa', 100000.00 );
COMMIT;
Here's the breadth-first traversal query for Ainslie's company, same as the previous example with the following name changes:
- node becomes employee,
- node_id becomes employee_id
- parent_id becomes manager_id, and
- sibling_number becomes name:
WITH RECURSIVE breadth_first_traversal
( level,
lineage,
employee_id,
manager_id,
name )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
CAST ( employee.name
AS LONG VARCHAR ) AS lineage,
employee.employee_id AS employee_id,
employee.manager_id AS manager_id,
employee.name AS name
FROM employee
WHERE employee.employee_id = employee.manager_id
UNION ALL
SELECT breadth_first_traversal.level + 1,
STRING ( breadth_first_traversal.lineage,
'-',
employee.name ),
employee.employee_id,
employee.manager_id,
employee.name
FROM breadth_first_traversal
INNER JOIN employee
ON employee.manager_id = breadth_first_traversal.employee_id
WHERE employee.manager_id <> employee.employee_id )
SELECT employee_id,
level,
lineage
FROM breadth_first_traversal
ORDER BY level,
lineage;
Here's the breadth-first result set for Ainslie's company:
employee_id, level, lineage
1, 1, Ainslie
2, 2, Ainslie-Briana
3, 2, Ainslie-Calista
4, 2, Ainslie-Delmar
5, 3, Ainslie-Briana-Electra
6, 3, Ainslie-Calista-Fabriane
7, 3, Ainslie-Calista-Genevieve
8, 3, Ainslie-Delmar-Hunter
9, 4, Ainslie-Calista-Fabriane-Inari
10, 4, Ainslie-Calista-Fabriane-Jordan
11, 4, Ainslie-Delmar-Hunter-Khalil
12, 4, Ainslie-Delmar-Hunter-Lisette
13, 5, Ainslie-Calista-Fabriane-Jordan-Marlon
14, 5, Ainslie-Calista-Fabriane-Jordan-Nissa
Here's the depth-first query for Ainslie's company:
WITH RECURSIVE depth_first_traversal
( level,
lineage,
employee_id,
manager_id,
name )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
CAST ( employee.name
AS LONG VARCHAR ) AS lineage,
employee.employee_id AS employee_id,
employee.manager_id AS manager_id,
employee.name AS name
FROM employee
WHERE employee.employee_id = employee.manager_id
UNION ALL
SELECT depth_first_traversal.level + 1,
STRING ( depth_first_traversal.lineage,
'-',
employee.name ),
employee.employee_id,
employee.manager_id,
employee.name
FROM depth_first_traversal
INNER JOIN employee
ON employee.manager_id = depth_first_traversal.employee_id
WHERE employee.manager_id <> employee.employee_id )
SELECT employee_id,
level,
lineage
FROM depth_first_traversal
ORDER BY lineage;
Here's the depth-first result set for Ainslie's company:
employee_id, level, lineage
1, 1, Ainslie
2, 2, Ainslie-Briana
5, 3, Ainslie-Briana-Electra
3, 2, Ainslie-Calista
6, 3, Ainslie-Calista-Fabriane
9, 4, Ainslie-Calista-Fabriane-Inari
10, 4, Ainslie-Calista-Fabriane-Jordan
13, 5, Ainslie-Calista-Fabriane-Jordan-Marlon
14, 5, Ainslie-Calista-Fabriane-Jordan-Nissa
7, 3, Ainslie-Calista-Genevieve
4, 2, Ainslie-Delmar
8, 3, Ainslie-Delmar-Hunter
11, 4, Ainslie-Delmar-Hunter-Khalil
12, 4, Ainslie-Delmar-Hunter-Lisette
Test Case: Jordan's Company
Ainslie's company is not a good example for testing and debugging because the data is unrealistic; i.e., both of these simple queries work as breadth-first traversals without all the fuss and muss of a RECURSIVE UNION:
SELECT *
FROM employee
ORDER BY employee_id;
SELECT *
FROM employee
ORDER BY name;
Here's a better example; same names but shuffled around, and random values used for employee_id:
636 Jordan
|
------------------------------------------------------
543 Briana 323 Delmar 914 Electra
| | |
------------ -------------------------- ----------
587 Fabriane 168 Calista 813 Genevieve 362 Hunter
| |
---------------------- -----------------------
611 Ainslie 893 Inari 119 Khalil 777 Lisette
|
---------------------
809 Marlon 326 Nissa
CREATE TABLE employee (
employee_id INTEGER NOT NULL,
manager_id INTEGER NOT NULL REFERENCES employee ( employee_id ),
name VARCHAR ( 20 ) NOT NULL,
salary NUMERIC ( 20, 2 ) NOT NULL,
PRIMARY KEY ( employee_id ) );
INSERT INTO employee VALUES ( 636, 636, 'Jordan', 1000000.00 );
INSERT INTO employee VALUES ( 543, 636, 'Briana', 900000.00 );
INSERT INTO employee VALUES ( 323, 636, 'Delmar', 900000.00 );
INSERT INTO employee VALUES ( 914, 636, 'Electra', 900000.00 );
INSERT INTO employee VALUES ( 587, 543, 'Fabriane', 750000.00 );
INSERT INTO employee VALUES ( 168, 323, 'Calista', 800000.00 );
INSERT INTO employee VALUES ( 813, 323, 'Genevieve', 750000.00 );
INSERT INTO employee VALUES ( 362, 914, 'Hunter', 800000.00 );
INSERT INTO employee VALUES ( 611, 168, 'Ainslie', 500000.00 );
INSERT INTO employee VALUES ( 893, 168, 'Inari', 100000.00 );
INSERT INTO employee VALUES ( 119, 362, 'Khalil', 100000.00 );
INSERT INTO employee VALUES ( 777, 362, 'Lisette', 100000.00 );
INSERT INTO employee VALUES ( 809, 893, 'Marlon', 100000.00 );
INSERT INTO employee VALUES ( 326, 893, 'Nissa', 100000.00 );
COMMIT;
The queries are the same as for Ainslie's company, just the data is different.
Here's the breadth-first result set for Jordan's company:
employee_id, level, lineage
636, 1, Jordan
543, 2, Jordan-Briana
323, 2, Jordan-Delmar
914, 2, Jordan-Electra
587, 3, Jordan-Briana-Fabriane
168, 3, Jordan-Delmar-Calista
813, 3, Jordan-Delmar-Genevieve
362, 3, Jordan-Electra-Hunter
611, 4, Jordan-Delmar-Calista-Ainslie
893, 4, Jordan-Delmar-Calista-Inari
119, 4, Jordan-Electra-Hunter-Khalil
777, 4, Jordan-Electra-Hunter-Lisette
809, 5, Jordan-Delmar-Calista-Inari-Marlon
326, 5, Jordan-Delmar-Calista-Inari-Nissa
Here's the depth-first result set for Jordan's company:
employee_id, level, lineage
636, 1, Jordan
543, 2, Jordan-Briana
587, 3, Jordan-Briana-Fabriane
323, 2, Jordan-Delmar
168, 3, Jordan-Delmar-Calista
611, 4, Jordan-Delmar-Calista-Ainslie
893, 4, Jordan-Delmar-Calista-Inari
809, 5, Jordan-Delmar-Calista-Inari-Marlon
326, 5, Jordan-Delmar-Calista-Inari-Nissa
813, 3, Jordan-Delmar-Genevieve
914, 2, Jordan-Electra
362, 3, Jordan-Electra-Hunter
119, 4, Jordan-Electra-Hunter-Khalil
777, 4, Jordan-Electra-Hunter-Lisette