Wednesday, April 11, 2012

Example: RECURSIVE UNION

Once upon a time, there was an up-to-date book describing SQL Anywhere.

That was then, this is now. The book's still available but long out of date: It doesn't describe SQL Anywhere 12, or 11, or 10, or even that old classic SQL Anywhere 9.0.2... it's so old that it only covers the "GA" version of SQL Anywhere 9: version 9.0.0.

Parts of that book still apply, however, including the following excerpt about RECURSIVE UNION which is posted here because someone asked this question: How to query all items associated with the parent item at once (Bill of material)?

As features go, RECURSIVE UNION is as complex as it is powerful... it's hard to imagine how even IBM a committee can produce syntax this confusing, but ANSI never ceases to amaze (take the MERGE statement, for example, or OLAP windows... but those are rants topics for another day).

Most folks tackling RECURSIVE UNION for the first time need all the examples they can get, and there are some in the SQL Anywhere Help; here are the relevant topics:



And now, here's the excerpt from the book:

3.37 Recursive UNION


The recursive union is a special technique that uses the WITH clause to define a local view based on a UNION ALL of two queries:
  • The first query inside the local view is an "initial seed query" that provides one or more rows to get the process rolling.

  • The second query contains a recursive reference to the local view name itself, and it appends more rows to the initial result set produced by the first query. The RECURSIVE keyword must appear in the WITH clause for the recursion to work.

  • The WITH clause as a whole appears in front of a third, outer, query that also refers to the local view; it is this outer query that drives the whole process and produces an actual result set.
Here is the syntax for a typical recursive union:
<typical_recursive_union>       ::= WITH RECURSIVE <local_view_name>
                                       "(" <alias_name_list> ")"
                                    AS "(" <initial_query_specification>
                                           UNION ALL
                                           <recursive_query_specification> ")"
                                    <outer_query_specification>
                                    [ <order_by_clause> ]
                                    [ <for_clause> ]

<initial_query_specification>   ::= <query_specification> that provides seed rows

<recursive_query_specification> ::= <query_specification> that recursively
                                       refers to the <local_view_name>

<outer_query_specification>     ::= <query_specification> that refers to
                                       the <local_view_name>

Note: A recursive process is one that is defined in terms of itself. Consider the factorial of a number: the factorial of 6 is defined as 6 * 5 * 4 * 3 * 2 * 1, or 720, for example, so the formula for factorial may be written using a recursive definition: "factorial ( n ) = n * factorial ( n - 1 )". It's sometimes a convenient way to think about a complex process, and if you can code it the way you think about it, so much the better. SQL Anywhere allows you to code recursive functions like factorial; for more information about the CREATE FUNCTION statement see Section 8.10 in Chapter 8, Packaging. This section talks about a different kind of recursive process, the recursive union.

Recursive unions can be used to process hierarchical relationships in the data. Hierarchies in the data often involve self-referencing foreign key relationships where different rows in the same table act as child and parent for one another. These relationships are very difficult to handle with ordinary SQL, especially if the number of levels in the hierarchy can vary widely.

Figure 3-1 shows just such a relationship, an organization chart for a company with 14 employees where the arrows show the reporting structure; e.g., Briana, Calista and Delmar all report to Ainslie, Electra reports to Briana, and so on.

Figure 3-1. Organization Chart

Here is a table definition plus the data to represent the organization chart in Figure 3-1; the employee_id column is the primary key identifying each employee, the manager_id column points to the employee's superior just like the arrows in Figure 3-1, and the name and salary columns contain data about the employee. Note that manager_id is set to 1 for employee_id = 1; that simply means Ainslie is at the top of the chart and doesn't report to anyone else within the company:
CREATE TABLE employee ( 
   employee_id  INTEGER NOT NULL,
   manager_id   INTEGER NOT NULL REFERENCES employee ( employee_id ),
   name         VARCHAR ( 20 ) NOT NULL,
   salary       NUMERIC ( 20, 2 ) NOT NULL,
   PRIMARY KEY ( employee_id ) );

INSERT INTO employee VALUES ( 1,  1,  'Ainslie',  1000000.00 );
INSERT INTO employee VALUES ( 2,  1,  'Briana',    900000.00 );
INSERT INTO employee VALUES ( 3,  1,  'Calista',   900000.00 );
INSERT INTO employee VALUES ( 4,  1,  'Delmar',    900000.00 );
INSERT INTO employee VALUES ( 5,  2,  'Electra',   750000.00 );
INSERT INTO employee VALUES ( 6,  3,  'Fabriane',  800000.00 );
INSERT INTO employee VALUES ( 7,  3,  'Genevieve', 750000.00 );
INSERT INTO employee VALUES ( 8,  4,  'Hunter',    800000.00 );
INSERT INTO employee VALUES ( 9,  6,  'Inari',     500000.00 );
INSERT INTO employee VALUES ( 10, 6,  'Jordan',    100000.00 );
INSERT INTO employee VALUES ( 11, 8,  'Khalil',    100000.00 );
INSERT INTO employee VALUES ( 12, 8,  'Lisette',   100000.00 );
INSERT INTO employee VALUES ( 13, 10, 'Marlon',    100000.00 );
INSERT INTO employee VALUES ( 14, 10, 'Nissa',     100000.00 );

Note: The employee table shown here is different from the employee table in the ASADEMO database.

Here is a SELECT which answers the question, "Who are Marlon's superiors on the way up the chart to Ainslie?":
WITH RECURSIVE superior_list 
   ( level,
     chosen_employee_id,
     manager_id,
     employee_id,
     name )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
            employee.employee_id    AS chosen_employee_id,
            employee.manager_id     AS manager_id,
            employee.employee_id    AS employee_id,
            employee.name           AS name
       FROM employee
     UNION ALL
     SELECT superior_list.level + 1,
            superior_list.chosen_employee_id,
            employee.manager_id,
            employee.employee_id,
            employee.name
       FROM superior_list
            INNER JOIN employee
                    ON employee.employee_id = superior_list.manager_id
      WHERE superior_list.level <= 99
        AND superior_list.manager_id <> superior_list.employee_id )
SELECT superior_list.level,
       superior_list.name
  FROM superior_list
 WHERE superior_list.chosen_employee_id = 13
 ORDER BY superior_list.level DESC;

The final result set shows there are five levels in the hierarchy, with Jordan, Fabriane and Calista on the path between Marlon and Ainslie:
level  name
=====  ========
5      Ainslie
4      Calista
3      Fabriane
2      Jordan
1      Marlon

Here's how the above SELECT works:
  1. The WITH RECURSIVE clause starts by giving a name to the local view, superior_list, and a list of alias names for the 5 columns in that local view.

  2. Each row in the view result set will contain information about one of Marlon's superiors on the path between Marlon and Ainslie. The end points will be included, so there will be a row for Marlon himself.

  3. The level column in the view will contain the hierarchical level, numbered from 1 for Marlon at the bottom, 2 at the next level up, and so on.

  4. The chosen_employee_id column will identify the employee of interest; in this case, it will be the fixed value 13 for Marlon because that's who the question asked about. In other words, every row will contain 13, and how this comes about is explained in point 10 below.

  5. The manager_id column will identify the employee one level above this one, whereas employee_id and name will identify the employee at this level.

  6. The first query in the UNION ALL selects all the rows from the employee table, and assigns them all level number 1. These rows are the bottom starting points for all possible queries about "Who are this employee's superiors?". This is the non-recursive "seed query" which gets the process going. In actual fact, there will only be one row generated by this query; how that is accomplished is explained in point 10 below.

  7. The second query in the UNION ALL performs an INNER JOIN between rows in the employee table and rows which already exist in the superior_list result set, starting with the rows which came from the seed query. For each row already in superior_list, the INNER JOIN finds the employee row one level up in the hierarchy via "ON employee.employee_id = superior_list.manager_id". This recursive reference back to the local view itself is the reason for the RECURSIVE keyword on the WITH clause.

  8. For each new row added to the result set by the second query in the UNION ALL, the level value is set 1 higher than the level in the row already in superior_list. The chosen_employee_id is set to the same value as the chosen_employee_id in the row already in superior list. The other three columns, manager_id, employee_id and name, are taken from the row in employee representing the person one level up in the hierarchy.

  9. The WHERE clause keeps the recursion from running out of control. First of all, there is a sanity check on the level which stops the query when it hits the impossible number of 99. The second predicate in the WHERE clause, "superior_list.manager_id <> superior_list.employee_id", stops the recursion when Ainslie's row is reached; no attempt is made to look above her row when it shows up as one of the rows already existing in superior_list.

  10. The outer SELECT displays all the rows in the superior_list where the chosen_employee_id is 13 for Marlon. The outer WHERE clause effectively throws away all the rows from the first query in the UNION ALL except the one for Marlon. It also excludes all the rows added by the second query in the UNION ALL except for the ones on the path above Marlon. The ORDER BY sorts the result in descending order by level so Ainslie appears at the top and Marlon at the bottom.
Tip: Always include a level number in a recursive union result set, and a WHERE clause that performs a reasonableness check on the value. A loop in the data, or a bug in the query, may result in a runaway query, and it's a good idea to stop it before SQL Anywhere raises an error.

A CREATE VIEW statement can be used to store a complex recursive UNION for use in multiple, different, queries. The previous query can be turned into a permanent view by replacing the outer SELECT with a simple "SELECT *" and giving it a name in a CREATE VIEW statement, as follows:
CREATE VIEW v_superior_list AS
WITH RECURSIVE superior_list 
   ( level,
     chosen_employee_id,
     manager_id,
     employee_id,
     name )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
            employee.employee_id    AS chosen_employee_id,
            employee.manager_id     AS manager_id,
            employee.employee_id    AS employee_id,
            employee.name           AS name
       FROM employee
     UNION ALL
     SELECT superior_list.level + 1,
            superior_list.chosen_employee_id,
            employee.manager_id,
            employee.employee_id,
            employee.name
       FROM superior_list
            INNER JOIN employee
                    ON employee.employee_id = superior_list.manager_id
      WHERE superior_list.level <= 99
        AND superior_list.manager_id <> superior_list.employee_id )
SELECT *
  FROM superior_list;

The outer query from the previous example is now a much simpler standalone query using the view v_superior_list:
SELECT v_superior_list.level,
       v_superior_list.name
  FROM v_superior_list
 WHERE v_superior_list.chosen_employee_id = 13
 ORDER BY v_superior_list.level DESC;

That query produces exactly the same result set as before:
level  name
=====  ========
5      Ainslie
4      Calista
3      Fabriane
2      Jordan
1      Marlon

Here is another query which uses the same view in a different way: the LIST function shows all superiors on one line, and the WHERE clause eliminates Khalil's own name from the list:
SELECT LIST ( v_superior_list.name,
              ', then ' 
              ORDER BY v_superior_list.level ASC ) AS "Khalil's Superiors"
  FROM v_superior_list
 WHERE v_superior_list.chosen_employee_id = 11
   AND v_superior_list.level > 1;

Here's the one-line result from the query above:
Khalil's Superiors
==================
Hunter, then Delmar, then Ainslie

Here is an example of a recursive union which can be used to answer top-down questions, including "What is the total salary of each employee plus all that employee's subordinates?"
CREATE VIEW v_salary_list AS
WITH RECURSIVE salary_list 
   ( level,
     chosen_employee_id,
     manager_id,
     employee_id,
     name, 
     salary )
AS ( SELECT CAST ( 1 AS INTEGER ) AS level,
            employee.employee_id  AS chosen_employee_id,
            employee.manager_id   AS manager_id,
            employee.employee_id  AS employee_id,
            employee.name         AS name,
            employee.salary       AS salary
       FROM employee
     UNION ALL
     SELECT salary_list.level + 1,
            salary_list.chosen_employee_id,
            employee.manager_id,
            employee.employee_id,
            employee.name,
            employee.salary
       FROM salary_list
            INNER JOIN employee    
                    ON employee.manager_id = salary_list.employee_id
      WHERE salary_list.level <= 99
        AND employee.manager_id <> employee.employee_id )
SELECT *
  FROM salary_list;

This view works differently than the previous example: unlike v_superior_list, v_salary_list walks the hierarchy from the top down. The first query in the UNION ALL seeds the result set with all the employees as before, but the second query looks for employee rows further down in the hierarchy by using the condition "ON employee.manager_id = salary_list.employee_id" as opposed to the condition "ON employee.employee_id = superior_list.manager_id" in v_superior_list.

Here's how v_salary_list can be used to compute the total payroll for each employee in the company; for each row in the employee table, a subquery computes the SUM of all v_salary_list.salary values where the chosen_employee_id matches employee.employee_id:
SELECT employee.name,
       ( SELECT SUM ( v_salary_list.salary )
           FROM v_salary_list
          WHERE v_salary_list.chosen_employee_id 
              = employee.employee_id )           AS payroll
  FROM employee
 ORDER BY 1;

Here's the final result set; at the top Ainslie's payroll figure is the sum of everyone's salary, and at the bottom Nissa's figure includes her own salary and no one else's:
name       payroll
=========  ==========
Ainslie    7800000.00
Briana     1650000.00
Calista    3250000.00
Delmar     1900000.00
Electra     750000.00
Fabriane   1600000.00
Genevieve   750000.00
Hunter     1000000.00
Inari       500000.00
Jordan      300000.00
Khalil      100000.00  
Lisette     100000.00
Marlon      100000.00
Nissa       100000.00


2 comments:

Holly said...
This comment has been removed by the author.
Unknown said...

Following the issue: How fast is "RECURSIVE UNION"? Each level is permanent 'select' and in case of distributed databases with multiple tables ? Any Evaluation Tests...?!