Monday, May 21, 2012

Product Suggestion: Index Skip Key Processing

Here's a simple query to display the MAX value of a column in each group:

SELECT g1, g2, g3, MAX ( t2 )
  FROM t
 GROUP BY g1, g2, g3 
 ORDER BY g1, g2, g3;
Here's what the table looks like; notice there's a UNIQUE index "ukey" on all the columns in the query:
CREATE TABLE t (
   g1           VARCHAR ( 12 ) NOT NULL,
   g2           VARCHAR ( 4 ) NOT NULL,
   g3           SMALLINT NOT NULL,
   t1           TIMESTAMP NOT NULL,
   t2           TIMESTAMP NOT NULL,
   other        VARCHAR ( 100 ) NOT NULL,
   CONSTRAINT pkey PRIMARY KEY ( g1, g2, g3, t1 ),
   CONSTRAINT ukey UNIQUE ( g1, g2, g3, t2 ),
   CONSTRAINT tchk CHECK ( t1 <= t2 ) );
In the world of IBM, that index might be used to satisfy the query efficiently by "skipping" from one ( g1, g2, g3 ) group to the next and plucking the MAX ( t2 ) value out of each group; after all, the index is sorted on t2 within each ( g1, g2, g3 ) group.

Here's an excerpt from the IBM docs on the subject, with the really interesting bits highlighted in bold:


IBM i 7.1 Information Center > Database > Performance and query optimization > Processing queries: Overview > Grouping optimization

Optimizing grouping by using index skip key processing

Index Skip Key processing can be used when grouping with the keyed sequence implementation algorithm which uses an existing index. It is a specialized version of ordered grouping that processes few records in each group rather than all records in each group.

The index skip key processing algorithm:

1. Uses the index to position to a group and

2. finds the first row matching the selection criteria for the group, and if specified the first non-null MIN or MAX value in the group

3. Returns the group to the user

4. "Skip" to the next group and repeat processing

This algorithm improves performance by potentially not processing all index key values for a group.

Index skip key processing can be used:

  • For single table queries using the keyed sequence grouping implementation when:

    • There are no column functions in the query, or

    • There is only a single MIN or MAX column function and the MIN or MAX operand is the next index key column after the grouping columns. There can be no other grouping functions in the query. For the MIN function, the key column must be an ascending key; for the MAX function, the key column must be a descending key. If the query is whole table grouping, the operand of the MIN or MAX must be the first key column.

      Example 1, using SQL:
      CREATE INDEX IX1 ON EMPLOYEE (SALARY DESC)
      
      DECLARE C1 CURSOR FOR
         SELECT MAX(SALARY) FROM EMPLOYEE;
      
The query optimizer chooses to use the index IX1. The SLIC runtime code scans the index until it finds the first non-null value for SALARY. Assuming that SALARY is not null, the runtime code positions to the first index key and returns that key value as the MAX of salary. No more index keys are processed.

      Example 2, using SQL:
      CREATE INDEX IX2 ON EMPLOYEE (WORKDEPT, JOB, SALARY)
      
      DECLARE C1 CURSOR FOR 
       SELECT WORKDEPT, MIN(SALARY)
        FROM EMPLOYEE
        WHERE JOB='CLERK'
        GROUP BY WORKDEPT
      
      The query optimizer chooses to use Index IX2. The database manager positions to the first group for DEPT where JOB equals 'CLERK' and returns the SALARY. The code then skips to the next DEPT group where JOB equals 'CLERK'.
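
Note the fine print in that bullet list: for MAX, IBM wants the aggregated column to be a descending key immediately after the grouping columns. The "ukey" constraint above declares t2 as an ascending key, so, reading the rule literally, the index IBM's optimizer is looking for would be something more like this hypothetical one:

-- hypothetical index, shown only to make the MAX fine print concrete
CREATE INDEX t_skip_max ON t ( g1, g2, g3, t2 DESC );

Either way, the column list is the same as ukey; only the direction of t2 differs.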

Here's an excerpt that explains what "ordered grouping" means in the earlier phrase "specialized version of ordered grouping":

IBM i 7.1 Information Center > Database > Performance and query optimization > Processing queries: Overview > Grouping optimization

Index grouping implementation

There are two primary ways to implement grouping using an index: Ordered grouping and pre-summarized processing.

Ordered grouping

This implementation uses the Radix Index Scan or the Radix Index Probe access methods to perform the grouping. An index is required that contains all the grouping columns as contiguous leftmost key columns. The database manager accesses the individual groups through the index and performs the requested summary functions.

Since the index, by definition, already has all the key values grouped, the first group result can be returned in less time than the hashing method. This index performance is faster because the hashing method requires a temporary result. This implementation can be beneficial if an application does not need to retrieve all the group results, or if an index exists that matches the grouping columns.

When the grouping is implemented with an index and a permanent index does not exist that satisfies grouping columns, a temporary index is created. The grouping columns specified within the query are used as the key columns for this index.


Soooo, what does SQL Anywhere do with this query?

Welllll, as far as I can tell, it uses the Hash Group By algorithm together with a sequential scan of table t... all three million rows of table t.

Sometimes, if the table has tiny rows, it uses an eight-way parallel table scan (on a computer with eight processors), but it never does anything with the "ukey" index.

Here's a test you can run yourself; the combo table is used to help INSERT sixteen groups of data spread across 3,000,000 rows in t. In other words, the SELECT returns only 16 rows:

CREATE TABLE combo (
   combo_number INTEGER NOT NULL PRIMARY KEY,
   g1           VARCHAR ( 12 ) NOT NULL,
   g2           VARCHAR ( 4 ) NOT NULL,
   g3           SMALLINT NOT NULL );

INSERT combo VALUES (  1, 'aaaaaaaaaa', 'bbbb', 1 );
INSERT combo VALUES (  2, 'bbbbbbbbbb', 'cccc', 2 );
INSERT combo VALUES (  3, 'cccccccccc', 'dddd', 1 );
INSERT combo VALUES (  4, 'dddddddddd', 'eeee', 2 );
INSERT combo VALUES (  5, 'eeeeeeeeee', 'ffff', 1 );
INSERT combo VALUES (  6, 'ffffffffff', 'gggg', 2 );
INSERT combo VALUES (  7, 'bbbbbbbbbb', 'bbbb', 1 );
INSERT combo VALUES (  8, 'cccccccccc', 'cccc', 2 );
INSERT combo VALUES (  9, 'gggggggggg', 'dddd', 1 );
INSERT combo VALUES ( 10, 'hhhhhhhhhh', 'eeee', 2 );
INSERT combo VALUES ( 11, 'iiiiiiiiii', 'ffff', 1 );
INSERT combo VALUES ( 12, 'bbbbbbbbbb', 'gggg', 2 );
INSERT combo VALUES ( 13, 'dddddddddd', 'bbbb', 1 );
INSERT combo VALUES ( 14, 'gggggggggg', 'cccc', 2 );
INSERT combo VALUES ( 15, 'jjjjjjjjjj', 'dddd', 1 );
INSERT combo VALUES ( 16, 'kkkkkkkkkk', 'eeee', 2 );
COMMIT;

BEGIN

DECLARE @insert_count BIGINT;
DECLARE @combo_number BIGINT;

SET @insert_count = 1;
SET @combo_number = 0;

WHILE @insert_count <= 3000000 LOOP 

   SET @combo_number = @combo_number + 1;
   IF @combo_number > 16 THEN
      SET @combo_number = 1;
   END IF;

   INSERT t
   SELECT combo.g1,
          combo.g2,
          combo.g3,
          DATEADD ( MINUTE, 10 * @insert_count, '1901-01-01 00:00' ),
          DATEADD ( MINUTE, 10 * @insert_count, '1901-01-01 00:05' ),
          REPEAT ( 'x', 100 ) 
     FROM combo
    WHERE combo.combo_number = @combo_number;

   IF MOD ( @insert_count, 1000 ) = 0 THEN
      COMMIT;
      MESSAGE STRING ( @insert_count, ' rows inserted' ) TO CLIENT;
   END IF;

   SET @insert_count = @insert_count + 1;

END LOOP;

COMMIT;
MESSAGE STRING ( 'All done' ) TO CLIENT;

END;
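
A quick aside that wasn't part of the original test: if you want to confirm the load behaved, these two checks should report 3,000,000 rows and 16 groups, matching the setup described above:

SELECT COUNT ( * ) AS row_count
  FROM t;

SELECT COUNT ( * ) AS group_count
  FROM ( SELECT g1, g2, g3
           FROM t
          GROUP BY g1, g2, g3 ) AS groups;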

------------------------------------------------------
-- query 1

CALL sa_flush_cache();

SELECT g1, g2, g3, MAX ( t2 )
  FROM t
 GROUP BY g1, g2, g3 
 ORDER BY g1, g2, g3;

g1,g2,g3,MAX(t.t2)
'aaaaaaaaaa','bbbb',1,'1958-01-15 05:35:00.000'
'bbbbbbbbbb','bbbb',1,'1958-01-15 06:35:00.000'
'bbbbbbbbbb','cccc',2,'1958-01-15 05:45:00.000'
'bbbbbbbbbb','gggg',2,'1958-01-15 07:25:00.000'
'cccccccccc','cccc',2,'1958-01-15 06:45:00.000'
'cccccccccc','dddd',1,'1958-01-15 05:55:00.000'
'dddddddddd','bbbb',1,'1958-01-15 07:35:00.000'
'dddddddddd','eeee',2,'1958-01-15 06:05:00.000'
'eeeeeeeeee','ffff',1,'1958-01-15 06:15:00.000'
'ffffffffff','gggg',2,'1958-01-15 06:25:00.000'
'gggggggggg','cccc',2,'1958-01-15 07:45:00.000'
'gggggggggg','dddd',1,'1958-01-15 06:55:00.000'
'hhhhhhhhhh','eeee',2,'1958-01-15 07:05:00.000'
'iiiiiiiiii','ffff',1,'1958-01-15 07:15:00.000'
'jjjjjjjjjj','dddd',1,'1958-01-15 07:55:00.000'
'kkkkkkkkkk','eeee',2,'1958-01-15 08:05:00.000'
Query 1 took 14.7 seconds to execute... that's an average over 7 tests, with the cache flushed each time and the server restarted once.

Here's a glimpse into the plan: a Hash Group By fed by a sequential Table Scan of all three million rows of t.
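
If you want to see the plan for yourself without a screenshot, something along these lines should do it; as far as I know, EXPLANATION() returns the short text form of a statement's plan, and GRAPHICAL_PLAN() returns the XML version that dbisql renders as a diagram:

SELECT EXPLANATION ( 'SELECT g1, g2, g3, MAX ( t2 ) FROM t GROUP BY g1, g2, g3 ORDER BY g1, g2, g3' );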

Depending on your expectations, 14.7 seconds might not be bad, but a sequential table scan?

Really?

Surely one can do better... after all, there's a bloody index that covers all the columns in the query and it wasn't even considered.

Here's a clue...

Why wasn't the "ukey" index considered for query 1? This excerpt from the SQL Anywhere 12 Help explains (with emphasis added):

Cost-based optimization with MIN and MAX functions

The min/max cost-based optimization is designed to exploit an existing index to compute efficiently the result of a simple aggregation query involving the MAX or MIN aggregate functions. The goal of this optimization is to be able to compute the result by retrieving only a few rows from the index. To be a candidate for this optimization, the query:

  • must not contain a GROUP BY clause

  • must be over a single table

  • must contain only a single aggregate function (MAX or MIN) in the query's SELECT-list
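
For what it's worth, a statement shaped like this one meets all three conditions (the literal values are just the first combo loaded earlier), and it's exactly the shape of the inner SELECT used in query 2 below:

-- single table, no GROUP BY, a single MAX in the SELECT list,
-- with the ukey index on ( g1, g2, g3, t2 ) covering every column
SELECT MAX ( t2 )
  FROM t
 WHERE g1 = 'aaaaaaaaaa'
   AND g2 = 'bbbb'
   AND g3 = 1;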

Taking that clue as a starting point, here's a version of query 1 that
  • separates the GROUP BY from the MAX into two derived tables, gt and mt, and

  • uses the LATERAL keyword to join them via the WHERE clause inside the second SELECT:
------------------------------------------------------
-- query 2

CALL sa_flush_cache();

SELECT gt.g1, gt.g2, gt.g3, mt.t2
  FROM ( 
         SELECT g1, g2, g3
           FROM t
          GROUP BY g1, g2, g3 
       ) AS gt,
       LATERAL
       (
         SELECT MAX ( t.t2 ) AS t2
           FROM t
          WHERE t.g1 = gt.g1
            AND t.g2 = gt.g2
            AND t.g3 = gt.g3
       ) AS mt
 ORDER BY gt.g1, gt.g2, gt.g3;
On average over six tests, query 2 took 13.9 seconds, which isn't much of an improvement over 14.7 seconds, probably within the margin of error for experiments like this.

The performance might be the same, but the plan sure has changed!

Woohoo! An Index Only Retrieval Scan!

The SELECT MAX() does a "Scan t using index ukey":

But wait... what's all that other stuff?

That other stuff is an eight-way Parallel Hash Group By, together with an eight-way Parallel Table Scan, used to compute the first derived table gt, the one that does the GROUP BY. The Index Only Retrieval Scan is used to compute the SELECT MAX ( t.t2 ) query.

Here's the view from the Foxhound Monitor showing the eight "INT: EXCHANGE" internal connections running all those parallel processes:

Which query is faster? GROUP BY or MAX?

If query 2 is split into separate SELECT statements, with a temporary table used to pass the GROUP BY results to the query doing the MAX(), it's the GROUP BY that takes most of the time:
------------------------------------------------------
-- query 3A - prepare for query 3B

CALL sa_flush_cache();

SELECT g1, g2, g3
  INTO #g
  FROM t
 GROUP BY g1, g2, g3; 

Execution time: 13.5 seconds

------------------------------------------------------
-- query 3B

CALL sa_flush_cache();

SELECT #g.g1, #g.g2, #g.g3, mt.t2
  FROM #g,
       LATERAL
       (
         SELECT MAX ( t.t2 ) AS t2
           FROM t
          WHERE t.g1 = #g.g1
            AND t.g2 = #g.g2
            AND t.g3 = #g.g3
       ) AS mt
 ORDER BY #g.g1, #g.g2, #g.g3;

Execution time: 0.4 seconds

So... in the (unlikely?) case that the GROUP BY result remains unchanged while the MAX() needs to be recalculated multiple times, it makes sense to jump through hoops to divide and conquer the problem with funky techniques like LATERAL.
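
Speaking of hoops, here's one more way to fake it until the product catches up: a procedural "skip loop" that positions to the first ( g1, g2, g3 ) group, runs one MAX() probe for that group (the same shape of query used inside query 2), then seeks ahead to the next group and repeats... which is pretty much the IBM algorithm quoted at the top. This is only a sketch: the variable names are invented, no timings are claimed, and whether each probe and seek actually uses the ukey or pkey index is entirely up to the optimizer.

BEGIN

-- Rough sketch: emulate "index skip key processing" by hand.

DECLARE @g1      VARCHAR ( 12 );
DECLARE @g2      VARCHAR ( 4 );
DECLARE @g3      SMALLINT;
DECLARE @max_t2  TIMESTAMP;
DECLARE @next_g1 VARCHAR ( 12 );
DECLARE @next_g2 VARCHAR ( 4 );
DECLARE @next_g3 SMALLINT;

-- Position to the first ( g1, g2, g3 ) group.
SELECT FIRST g1, g2, g3
  INTO @g1, @g2, @g3
  FROM t
 ORDER BY g1, g2, g3;

WHILE @g1 IS NOT NULL LOOP

   -- One MIN/MAX-style probe per group.
   SELECT MAX ( t2 )
     INTO @max_t2
     FROM t
    WHERE g1 = @g1
      AND g2 = @g2
      AND g3 = @g3;

   MESSAGE STRING ( @g1, ', ', @g2, ', ', @g3, ', ', @max_t2 ) TO CLIENT;

   -- "Skip" to the next group; @next_g1 stays NULL when there are no more groups.
   SET @next_g1 = NULL;
   SET @next_g2 = NULL;
   SET @next_g3 = NULL;
   SELECT FIRST g1, g2, g3
     INTO @next_g1, @next_g2, @next_g3
     FROM t
    WHERE g1 > @g1
       OR ( g1 = @g1 AND g2 > @g2 )
       OR ( g1 = @g1 AND g2 = @g2 AND g3 > @g3 )
    ORDER BY g1, g2, g3;

   SET @g1 = @next_g1;
   SET @g2 = @next_g2;
   SET @g3 = @next_g3;

END LOOP;

END;

If everything cooperates, that's sixteen probes and a handful of little seeks instead of three million rows... but nobody should have to write that loop by hand.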

In the real world, this is a job for Index Skip Key Processing, doncha think?

