Monday, October 15, 2012

Example: RECURSIVE UNION Inverted ORDER BY

Once upon a time, sorting was a basic building block for many computer programs. Both RAM and disk drives were limited and expensive... sequential I/O with magnetic tape was used for almost everything, plus the ubiquitous sort/merge utility.

And so it came to pass that many techniques were developed to exploit the sorting process, techniques that are sometimes useful today when it comes time to exploit the ORDER BY clause.

The previous article about breadth-first and depth-first tree traversals told of one such technique: building strings for the sole purpose of sorting.

This article tells the story of another such technique: inverting values before building the sort string.

Question: How do I perform breadth-first and depth-first traversals of a hierarchical table when the left-to-right order is given by column values that are descending rather than ascending?

Here's the tree structure table for "Jordan's Company" from the previous article about breadth-first and depth-first tree traversals:

636 Jordan
                          |
 ------------------------------------------------------
 543 Briana          323 Delmar             914 Electra        
     |                    |                      |
------------   --------------------------   ---------- 
587 Fabriane   168 Calista  813 Genevieve   362 Hunter         
                   |                             |
         ----------------------       -----------------------
         611 Ainslie  893 Inari       119 Khalil  777 Lisette
                         |
               ---------------------         
               809 Marlon  326 Nissa
Here's what that table looks like if the left-to-right sort order on employee.name is descending rather than ascending:
636 Jordan
                          |
       ---------------------------------------------------------------
       914 Electra                   323 Delmar             543 Briana  
            |                             |                     |
       ----------             --------------------------   ------------ 
       362 Hunter             813 Genevieve  168 Calista   587 Fabriane
            |                                    |
-----------------------                ----------------------
777 Lisette  119 Khalil                893 Inari  611 Ainslie
                                           |
                                  ---------------------         
                                  326 Nissa  809 Marlon
The data is the exacty the same as before, it's just the interpretation of the employee.name column that has changed: descending left-to-right for all the rows with the same parent:
CREATE TABLE employee ( 
   employee_id  INTEGER NOT NULL,
   manager_id   INTEGER NOT NULL REFERENCES employee ( employee_id ),
   name         VARCHAR ( 20 ) NOT NULL,
   salary       NUMERIC ( 20, 2 ) NOT NULL,
   PRIMARY KEY ( employee_id ) );

INSERT INTO employee VALUES ( 636, 636, 'Jordan',   1000000.00 );
INSERT INTO employee VALUES ( 543, 636, 'Briana',    900000.00 );
INSERT INTO employee VALUES ( 323, 636, 'Delmar',    900000.00 );
INSERT INTO employee VALUES ( 914, 636, 'Electra',   900000.00 );
INSERT INTO employee VALUES ( 587, 543, 'Fabriane',  750000.00 );
INSERT INTO employee VALUES ( 168, 323, 'Calista',   800000.00 );
INSERT INTO employee VALUES ( 813, 323, 'Genevieve', 750000.00 );
INSERT INTO employee VALUES ( 362, 914, 'Hunter',    800000.00 );
INSERT INTO employee VALUES ( 611, 168, 'Ainslie',   500000.00 );
INSERT INTO employee VALUES ( 893, 168, 'Inari',     100000.00 );
INSERT INTO employee VALUES ( 119, 362, 'Khalil',    100000.00 );
INSERT INTO employee VALUES ( 777, 362, 'Lisette',   100000.00 );
INSERT INTO employee VALUES ( 809, 893, 'Marlon',    100000.00 );
INSERT INTO employee VALUES ( 326, 893, 'Nissa',     100000.00 );
COMMIT;

Answer 1: Breadth-First Traversal

Changing the breadth-first query from the previous article is easy: just add DESC to the ORDER BY clause!
WITH RECURSIVE breadth_first_traversal
   ( level,
     lineage,
     employee_id,
     manager_id,
     name )
AS ( SELECT CAST ( 1 AS INTEGER )          AS level,
            CAST ( employee.name 
                   AS LONG VARCHAR )       AS lineage,
            employee.employee_id           AS employee_id,
            employee.manager_id            AS manager_id,
            employee.name                  AS name
       FROM employee
      WHERE employee.employee_id = employee.manager_id
     UNION ALL
     SELECT breadth_first_traversal.level + 1,
            STRING ( breadth_first_traversal.lineage,
                     '-',
                     employee.name ),
            employee.employee_id,
            employee.manager_id,
            employee.name
       FROM breadth_first_traversal
            INNER JOIN employee
                    ON employee.manager_id = breadth_first_traversal.employee_id
      WHERE employee.manager_id <> employee.employee_id )
SELECT employee_id,
       level,
       lineage
  FROM breadth_first_traversal
 ORDER BY level,
       lineage DESC;
Such an easy change! ...here's the new breadth-first output:
636 Jordan
                          |
       ---------------------------------------------------------------
       914 Electra                   323 Delmar             543 Briana  
            |                             |                     |
       ----------             --------------------------   ------------ 
       362 Hunter             813 Genevieve  168 Calista   587 Fabriane
            |                                    |
-----------------------                ----------------------
777 Lisette  119 Khalil                893 Inari  611 Ainslie
                                           |
                                  ---------------------         
                                  326 Nissa  809 Marlon

employee_id, level, lineage
636,         1,     Jordan
914,         2,     Jordan-Electra
323,         2,     Jordan-Delmar
543,         2,     Jordan-Briana
362,         3,     Jordan-Electra-Hunter
813,         3,     Jordan-Delmar-Genevieve
168,         3,     Jordan-Delmar-Calista
587,         3,     Jordan-Briana-Fabriane
777,         4,     Jordan-Electra-Hunter-Lisette
119,         4,     Jordan-Electra-Hunter-Khalil
893,         4,     Jordan-Delmar-Calista-Inari
611,         4,     Jordan-Delmar-Calista-Ainslie
326,         5,     Jordan-Delmar-Calista-Inari-Nissa
809,         5,     Jordan-Delmar-Calista-Inari-Marlon
The depth-first query is another matter. You can't just add DESC to the ORDER BY clause...

Try it, you won't like it!

Here's the depth-first query from the previous article, with ORDER BY lineage changed to DESC:
WITH RECURSIVE depth_first_traversal
   ( level,
     lineage,
     employee_id,
     manager_id,
     name )
AS ( SELECT CAST ( 1 AS INTEGER )          AS level,
            CAST ( employee.name 
                   AS LONG VARCHAR )       AS lineage,
            employee.employee_id           AS employee_id,
            employee.manager_id            AS manager_id,
            employee.name                  AS name
       FROM employee
      WHERE employee.employee_id = employee.manager_id
     UNION ALL
     SELECT depth_first_traversal.level + 1,
            STRING ( depth_first_traversal.lineage,
                     '-',
                     employee.name ),
            employee.employee_id,
            employee.manager_id,
            employee.name
       FROM depth_first_traversal
            INNER JOIN employee
                    ON employee.manager_id = depth_first_traversal.employee_id
      WHERE employee.manager_id <> employee.employee_id )
SELECT employee_id,
       level,
       lineage
  FROM depth_first_traversal
 ORDER BY lineage DESC;
The output is so far from the correct depth-first result that it's really quite disappointing, especially after the easy success with the breadth-first query:
636 Jordan
                          |
       ---------------------------------------------------------------
       914 Electra                   323 Delmar             543 Briana  
            |                             |                     |
       ----------             --------------------------   ------------ 
       362 Hunter             813 Genevieve  168 Calista   587 Fabriane
            |                                    |
-----------------------                ----------------------
777 Lisette  119 Khalil                893 Inari  611 Ainslie
                                           |
                                  ---------------------         
                                  326 Nissa  809 Marlon

employee_id, level, lineage
777,         4,     Jordan-Electra-Hunter-Lisette
119,         4,     Jordan-Electra-Hunter-Khalil
362,         3,     Jordan-Electra-Hunter
914,         2,     Jordan-Electra
813,         3,     Jordan-Delmar-Genevieve
326,         5,     Jordan-Delmar-Calista-Inari-Nissa
809,         5,     Jordan-Delmar-Calista-Inari-Marlon
893,         4,     Jordan-Delmar-Calista-Inari
611,         4,     Jordan-Delmar-Calista-Ainslie
168,         3,     Jordan-Delmar-Calista
323,         2,     Jordan-Delmar
587,         3,     Jordan-Briana-Fabriane
543,         2,     Jordan-Briana
636,         1,     Jordan

CREATE FUNCTION invert_name

The solution is to "invert" the characters in the employee.name column so that each value will be appear in descending order when the lineage column is sorted in ascending order... this has a profoundly different effect than just sorting the lineage string in descending order.

The invert_name() function changes A to z, B to y, C to x and so on for upper case characters, and a to Z, b to Y, c to X and so on for lower case characters. For example, invert_name ( 'Jordan' ) returns 'qLIWZM', and invert_name ( 'qLIWZM' ) returns 'Jordan'.
Note: Different "invert_xxx" functions may be required depending on the semantics of the column used for left-to-right ordering. For example, the following code assumes that employee.name contains only mixed-case English letters with no punctuation or special characters at all. Other data types (e.g., numeric, timestamp, etc.) may need different logic altogether; an example is shown later in this article.
CREATE FUNCTION invert_name ( IN input_name LONG VARCHAR )
RETURNS LONG VARCHAR
BEGIN
DECLARE output_name   LONG VARCHAR;
DECLARE char_pos      INTEGER;
DECLARE input_char    VARCHAR ( 1 );
DECLARE ASCII_input   INTEGER;
DECLARE ASCII_output  INTEGER;
DECLARE ASCII_A       INTEGER;
DECLARE ASCII_Z       INTEGER;

SET ASCII_A     = ASCII ( 'A' );
SET ASCII_Z     = ASCII ( 'Z' );
SET output_name = input_name;
SET char_pos    = 1;

WHILE char_pos <= LENGTH ( output_name ) LOOP

   SET input_char   = SUBSTR ( output_name, char_pos, 1 );
   SET ASCII_input  = ASCII ( UPPER ( input_char ) );
   SET ASCII_output = ASCII_Z - ( ASCII_input - ASCII_A );

   IF CAST ( input_char AS VARBINARY ( 1 ) ) = CAST ( UPPER ( input_char ) AS VARBINARY ( 1 ) ) THEN
      SET output_name = STUFF ( output_name, char_pos, 1, LOWER ( CHAR ( ASCII_output ) ) );
      MESSAGE STRING ( '1 ', input_char, ' ', LOWER ( CHAR ( ASCII_output ) ) );
   ELSE
      SET output_name = STUFF ( output_name, char_pos, 1, CHAR ( ASCII_output ) );
      MESSAGE STRING ( '2 ', input_char, ' ', CHAR ( ASCII_output ) );
   END IF;

   SET char_pos = char_pos + 1;
END LOOP;

RETURN output_name;

END;

Answer 2: Depth-First Traversal

Here's the depth-first traversal query using invert_name():
WITH RECURSIVE depth_first_traversal
   ( level,
     lineage,
     employee_id,
     manager_id,
     name )
AS ( SELECT CAST ( 1 AS INTEGER )          AS level,
            CAST ( invert_name ( employee.name )
                   AS LONG VARCHAR )       AS lineage,
            employee.employee_id           AS employee_id,
            employee.manager_id            AS manager_id,
            employee.name                  AS name
       FROM employee
      WHERE employee.employee_id = employee.manager_id
     UNION ALL
     SELECT depth_first_traversal.level + 1,
            STRING ( depth_first_traversal.lineage,
                     '-',
                     invert_name ( employee.name ) ),
            employee.employee_id,
            employee.manager_id,
            employee.name
       FROM depth_first_traversal
            INNER JOIN employee
                    ON employee.manager_id = depth_first_traversal.employee_id
      WHERE employee.manager_id <> employee.employee_id )
SELECT employee_id,
       level,
       name,
       lineage 
  FROM depth_first_traversal
 ORDER BY lineage;
Here's the depth-first output showing how ORDER BY lineage works when invert_name() is used to produce some seriously funky lineage values:
636 Jordan
                          |
       ---------------------------------------------------------------
       914 Electra                   323 Delmar             543 Briana  
            |                             |                     |
       ----------             --------------------------   ------------ 
       362 Hunter             813 Genevieve  168 Calista   587 Fabriane
            |                                    |
-----------------------                ----------------------
777 Lisette  119 Khalil                893 Inari  611 Ainslie
                                           |
                                  ---------------------         
                                  326 Nissa  809 Marlon

employee_id, level, name,        lineage
636,         1,     'Jordan',    qLIWZM
914,         2,     'Electra',   qLIWZM-vOVXGIZ
362,         3,     'Hunter',    qLIWZM-vOVXGIZ-sFMGVI
777,         4,     'Lisette',   qLIWZM-vOVXGIZ-sFMGVI-oRHVGGV
119,         4,     'Khalil',    qLIWZM-vOVXGIZ-sFMGVI-pSZORO
323,         2,     'Delmar',    qLIWZM-wVONZI
813,         3,     'Genevieve', qLIWZM-wVONZI-tVMVERVEV
168,         3,     'Calista',   qLIWZM-wVONZI-xZORHGZ
893,         4,     'Inari',     qLIWZM-wVONZI-xZORHGZ-rMZIR
326,         5,     'Nissa',     qLIWZM-wVONZI-xZORHGZ-rMZIR-mRHHZ
809,         5,     'Marlon',    qLIWZM-wVONZI-xZORHGZ-rMZIR-nZIOLM
611,         4,     'Ainslie',   qLIWZM-wVONZI-xZORHGZ-zRMHORV
543,         2,     'Briana',    qLIWZM-yIRZMZ
587,         3,     'Fabriane', qLIWZM-yIRZMZ-uZYIRZMV

Salary Descending Left-to-Right

Here's the same data as before, with one change; the employee.salary values have been modified so their descending values determine the left-to-right ordering. In other words, the left-to-right ordering of rows with the same parent is determined by the descending values of salary:
636 Jordan
                          |
       ---------------------------------------------------------------
       914 Electra                   323 Delmar             543 Briana  
            |                             |                     |
       ----------             --------------------------   ------------ 
       362 Hunter             813 Genevieve  168 Calista   587 Fabriane
            |                                    |
-----------------------                ----------------------
777 Lisette  119 Khalil                893 Inari  611 Ainslie
                                           |
                                  ---------------------         
                                  326 Nissa  809 Marlon
INSERT INTO employee VALUES ( 636, 636, 'Jordan',   1000000.00 );
INSERT INTO employee VALUES ( 543, 636, 'Briana',    910000.00 );
INSERT INTO employee VALUES ( 323, 636, 'Delmar',    920000.00 );
INSERT INTO employee VALUES ( 914, 636, 'Electra',   930000.00 );
INSERT INTO employee VALUES ( 587, 543, 'Fabriane',  750000.00 );
INSERT INTO employee VALUES ( 168, 323, 'Calista',   810000.00 );
INSERT INTO employee VALUES ( 813, 323, 'Genevieve', 820000.00 );
INSERT INTO employee VALUES ( 362, 914, 'Hunter',    800000.00 );
INSERT INTO employee VALUES ( 611, 168, 'Ainslie',   510000.00 );
INSERT INTO employee VALUES ( 893, 168, 'Inari',     520000.00 );
INSERT INTO employee VALUES ( 119, 362, 'Khalil',    210000.00 );
INSERT INTO employee VALUES ( 777, 362, 'Lisette',   220000.00 );
INSERT INTO employee VALUES ( 809, 893, 'Marlon',    110000.00 );
INSERT INTO employee VALUES ( 326, 893, 'Nissa',     120000.00 );
COMMIT;
The breadth-first traversal query uses
  • employee.salary instead of employee.name,

  • REPEAT() and RIGHT() functions to zero-pad the salary values for string sorting, and

  • lineage DESC in the ORDER BY:
WITH RECURSIVE breadth_first_traversal
   ( level,
     lineage,
     employee_id,
     manager_id,
     name )
AS ( SELECT CAST ( 1 AS INTEGER )          AS level,
            CAST ( RIGHT ( STRING ( REPEAT ( '0', 20 ), employee.salary ), 20 ) 
                   AS LONG VARCHAR )       AS lineage,
            employee.employee_id           AS employee_id,
            employee.manager_id            AS manager_id,
            employee.name                  AS name
       FROM employee
      WHERE employee.employee_id = employee.manager_id
     UNION ALL
     SELECT breadth_first_traversal.level + 1,
            STRING ( breadth_first_traversal.lineage,
                     '-',
                     RIGHT ( STRING ( REPEAT ( '0', 20 ), employee.salary ), 20 ) ),
            employee.employee_id,
            employee.manager_id,
            employee.name
       FROM breadth_first_traversal
            INNER JOIN employee
                    ON employee.manager_id = breadth_first_traversal.employee_id
      WHERE employee.manager_id <> employee.employee_id )
SELECT employee_id,
       level,
       lineage
  FROM breadth_first_traversal
 ORDER BY level,
       lineage DESC;
Here's the breadth-first output:
636 Jordan
                          |
       ---------------------------------------------------------------
       914 Electra                   323 Delmar             543 Briana  
            |                             |                     |
       ----------             --------------------------   ------------ 
       362 Hunter             813 Genevieve  168 Calista   587 Fabriane
            |                                    |
-----------------------                ----------------------
777 Lisette  119 Khalil                893 Inari  611 Ainslie
                                           |
                                  ---------------------         
                                  326 Nissa  809 Marlon

employee_id, level, lineage
636,         1,     00000000001000000.00
914,         2,     00000000001000000.00-00000000000930000.00
323,         2,     00000000001000000.00-00000000000920000.00
543,         2,     00000000001000000.00-00000000000910000.00
362,         3,     00000000001000000.00-00000000000930000.00-00000000000800000.00
813,         3,     00000000001000000.00-00000000000920000.00-00000000000820000.00
168,         3,     00000000001000000.00-00000000000920000.00-00000000000810000.00
587,         3,     00000000001000000.00-00000000000910000.00-00000000000750000.00
777,         4,     00000000001000000.00-00000000000930000.00-00000000000800000.00-00000000000220000.00
119,         4,     00000000001000000.00-00000000000930000.00-00000000000800000.00-00000000000210000.00
893,         4,     00000000001000000.00-00000000000920000.00-00000000000810000.00-00000000000520000.00
611,         4,     00000000001000000.00-00000000000920000.00-00000000000810000.00-00000000000510000.00
326,         5,     00000000001000000.00-00000000000920000.00-00000000000810000.00-00000000000520000.00-00000000000120000.00
809,         5,     00000000001000000.00-00000000000920000.00-00000000000810000.00-00000000000520000.00-00000000000110000.00

CREATE FUNCTION invert_salary

Just like the invert_name() function was required for the earlier depth-first query, a similar invert_salary() function is required here... but the code's a lot simpler:
CREATE FUNCTION invert_salary ( IN input_salary NUMERIC ( 20, 2 ) )
RETURNS NUMERIC ( 20, 2 )
BEGIN

RETURN 999999999999999999.99 - input_salary;

END;
Here's the depth-first query using the invert_salary() function:
WITH RECURSIVE depth_first_traversal
   ( level,
     lineage,
     employee_id,
     manager_id,
     name )
AS ( SELECT CAST ( 1 AS INTEGER )          AS level,
            CAST ( RIGHT ( STRING ( REPEAT ( '0', 20 ), invert_salary ( employee.salary ) ), 20 ) 
                   AS LONG VARCHAR )       AS lineage,
            employee.employee_id           AS employee_id,
            employee.manager_id            AS manager_id,
            employee.name                  AS name
       FROM employee
      WHERE employee.employee_id = employee.manager_id
     UNION ALL
     SELECT depth_first_traversal.level + 1,
            STRING ( depth_first_traversal.lineage,
                     '-',
                     RIGHT ( STRING ( REPEAT ( '0', 20 ), invert_salary ( employee.salary ) ), 20 ) ),
            employee.employee_id,
            employee.manager_id,
            employee.name
       FROM depth_first_traversal
            INNER JOIN employee
                    ON employee.manager_id = depth_first_traversal.employee_id
      WHERE employee.manager_id <> employee.employee_id )
SELECT employee_id,
       level,
       name,
       lineage 
  FROM depth_first_traversal
 ORDER BY lineage;
Here's the depth-first ouptut using invert_salary():
636 Jordan
                          |
       ---------------------------------------------------------------
       914 Electra                   323 Delmar             543 Briana  
            |                             |                     |
       ----------             --------------------------   ------------ 
       362 Hunter             813 Genevieve  168 Calista   587 Fabriane
            |                                    |
-----------------------                ----------------------
777 Lisette  119 Khalil                893 Inari  611 Ainslie
                                           |
                                  ---------------------         
                                  326 Nissa  809 Marlon

employee_id, level, name,        lineage
636,         1,     'Jordan',    99999999998999999.99
914,         2,     'Electra',   99999999998999999.99-99999999999069999.99
362,         3,     'Hunter',    99999999998999999.99-99999999999069999.99-99999999999199999.99
777,         4,     'Lisette',   99999999998999999.99-99999999999069999.99-99999999999199999.99-99999999999779999.99
119,         4,     'Khalil',    99999999998999999.99-99999999999069999.99-99999999999199999.99-99999999999789999.99
323,         2,     'Delmar',    99999999998999999.99-99999999999079999.99
813,         3,     'Genevieve', 99999999998999999.99-99999999999079999.99-99999999999179999.99
168,         3,     'Calista',   99999999998999999.99-99999999999079999.99-99999999999189999.99
893,         4,     'Inari',     99999999998999999.99-99999999999079999.99-99999999999189999.99-99999999999479999.99
326,         5,     'Nissa',     99999999998999999.99-99999999999079999.99-99999999999189999.99-99999999999479999.99-99999999999879999.99
809,         5,     'Marlon',    99999999998999999.99-99999999999079999.99-99999999999189999.99-99999999999479999.99-99999999999889999.99
611,         4,     'Ainslie',   99999999998999999.99-99999999999079999.99-99999999999189999.99-99999999999489999.99
543,         2,     'Briana',    99999999998999999.99-99999999999089999.99
587,         3,     'Fabriane',  99999999998999999.99-99999999999089999.99-99999999999249999.99

1 comment:

Anonymous said...

Hi Breck, I surely do know where you got this question from:)

I thank you for your efforts to present a working method to number child nodes in an "ad lib" manner.

However, I still hope the query processing experts are willing to re-enable ROW_NUMBER() for this task.

I guess you will agree that it's far easier to add an fitting "ROW_NUMBER() OVER (ORDER BY WhateveryouLike [ASC|DESC])" clause than to write an UDF for that particular need.


Best regards

Volker Is Still Hoping, As Usual