Wednesday, May 30, 2012

Product Suggestion: INSERT SET


In the real world, many (almost all?) production INSERT statements look like this:

Syntax 1: INSERT table-name [ ( column-name, ... ) ] VALUES ( expression, ... )

   or

Syntax 2: INSERT table-name [ ( column-name, ... ) ] SELECT expression, ... FROM ...
In the real world, tables have dozens (hundreds?) of columns, and it is a real chore matching up the ( column-name ) list with the expression list.

It gets worse if you leave out the ( column-name ) list: you have to match the expression list up with the full CREATE TABLE column list, plus all the columns that have been added over the years with ALTER TABLE.
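
To see why that's dangerous as well as tedious, here's a minimal sketch (hypothetical table and data): the two VARCHAR columns have compatible datatypes, so accidentally swapping their values produces no error message at all, just bad data:

CREATE TABLE person (
   person_id    INTEGER NOT NULL PRIMARY KEY,
   last_name    VARCHAR ( 50 ) NOT NULL,
   first_name   VARCHAR ( 50 ) NOT NULL );

-- Oops: the values are in first-name, last-name order but the column list
-- says last_name, first_name... and nothing complains.
INSERT person ( person_id, last_name, first_name )
VALUES ( 1, 'Breck', 'Carter' );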

No such problem exists for the UPDATE statement; the column names and expressions are matched up in pairs:
UPDATE table-name SET column-name = expression, ...
SQL Anywhere needs the same thing:
Syntax 3: INSERT table-name SET column-name = expression, ...
That's the simple, real-world version of Syntax 3: INSERT SET.
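
For example, an insert into the (hypothetical) person table above might read like this under the proposed syntax; it's strictly a suggestion, so it does not parse in any current release:

-- Hypothetical: proposed Syntax 3, not valid SQL Anywhere syntax today.
INSERT person SET
   person_id  = 1,
   last_name  = 'Carter',
   first_name = 'Breck';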

For a more [ahem] sophisticated version of this suggestion, you can visit the Year 2002 in the Network News Tar Pit:
Newsgroups: sybase.public.sqlanywhere.product_futures_discussion
From: Breck Carter
Date: Sat, 13 Jul 2002 11:04:22 -0400
Local: Sat, Jul 13 2002 11:04 am
Subject: Insert Syntax 3

Insert statements involving long column name lists are difficult to 
write and maintain because the corresponding VALUES list (Insert 
Syntax 1) or select statement (Syntax 2) are coded separately. It's 
just too easy to get names and values in the wrong order and if the 
datatypes match there's no error message at runtime. 

The following "Insert Syntax 3" uses the more maintainable "SET column 
= value" syntax from the UPDATE statement: 

Insert Syntax 3 

INSERT [ INTO ] [ owner.]table-name SET set-item, ... 
[ FROM table-list ] 
[ WHERE search-condition ] 
[ ORDER BY expression [ ASC | DESC ], ... ] 

set-item : 
  column-name [.field-name...] = expression 
| column-name [.field-name...] = DEFAULT 
| column-name[.field-name...].method-name( [ expression ] ) 

Note: The special keyword "DEFAULT" should be allowed on the right 
side of a set-item. 

Note: The "@variable-name = expression" set-item is probably not 
necessary for INSERT. 

Insert Syntax 3 would be in addition to, NOT replacing, the current 
two formats: 

Insert Syntax 1 

INSERT [ INTO ] [ owner.]table-name [ ( column-name, ... ) ] 
VALUES ( expression | DEFAULT, ... ) 

Insert Syntax 2 

INSERT [ INTO ] [ owner.]table-name [ ( column-name, ... ) ] 
select-statement 

===== 

For the record, here's the corresponding Update syntax: 

Update Syntax 1 

UPDATE [ FIRST | TOP n ] table-list SET set-item, ... 
[ FROM table-list ] 
[ WHERE search-condition ] 
[ ORDER BY expression [ ASC | DESC ], ... ] 

set-item : 
  column-name [.field-name...] = expression 
| column-name[.field-name...].method-name( [ expression ] ) 
| @variable-name = expression 

==================================================================

Newsgroups: sybase.public.sqlanywhere.product_futures_discussion
From: "Richard Biffl"
Date: Sat, 13 Jul 2002 23:11:13 -0400
Local: Sat, Jul 13 2002 11:11 pm
Subject: Re: Insert Syntax 3

Allowing SET in an INSERT statement would make the statement more readable 
and maintainable -- though I'd probably stick with the old obscure standard 
form because it's standard -- but I don't see the function of the FROM, 
WHERE, and ORDER BY clauses to an INSERT statement, other than allowing your 
other, more controversial <g>, suggestion (INSERT OR UPDATE). 

==================================================================

Newsgroups: sybase.public.sqlanywhere.product_futures_discussion
From: "Paul Horan[TeamSybase]" 
Date: Sat, 13 Jul 2002 23:24:46 -0400
Local: Sat, Jul 13 2002 11:24 pm
Subject: Re: Insert Syntax 3

I think this borders on genius, personally...  Well done, Breck! 

You would need the FROM/WHERE/ORDER BY clauses because you're basically 
writing the SELECT statement from syntax 2, just splitting out the SELECT 
list from that statement into the separate SET statements. 

So, instead of 

INSERT into LongTable 
 (column list....) 
SELECT (expression list that hopefully lines up) 
  FROM OtherTable 
WHERE <condition> 
ORDER BY 1,2,3 

you would have: 

INSERT into LongTable 
  SET col1 = <previous SELECT's 1st expression>, 
  SET col2 = <previous SELECT's 2nd expression>, 
  SET col3 = DEFAULT, 
... 
FROM OtherTable 
WHERE <condition> 
ORDER BY 1,2,3 

Umm, yes please... 

Did he just say "genius"?


[Dilbert comic from Dilbert.com]



Monday, May 28, 2012

Latest SQL Anywhere EBFs: 11.0.1 Linux, SPARC, Windows

The three asterisks "***" show what's new since the previous list.

  • Only EBFs for the latest fully-supported versions of SQL Anywhere are shown here: 11.0.1 and 12.0.1.

  • Note: Just because an older version isn't "fully supported" doesn't mean you can't download files (or ask questions, or get help); it just means there won't be any more new EBFs released. For a complete list of everything available for download, see Downloads - EBFs / Maintenance.

Current builds for the active platforms...

HP-UX Itanium    12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012

IBM AIX          12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012

Linux x86/x64    12.0.1.3723 EBF       04 May 2012
                 11.0.1.2811 EBF ***   17 May 2012 ***

Mac OS           12.0.1.3577 EBF       24 Feb 2012
                 11.0.1.2449 EBF       29 Jun 2010

Solaris SPARC    12.0.1.3574 EBF       23 Mar 2012
                 11.0.1.2811 EBF ***   14 May 2012 ***

Solaris x64      12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012

Windows x86/x64  12.0.1.3726 EBF       04 May 2012
                 11.0.1.2808 EBF ***   22 May 2012 ***
[Download EBFs here]


Friday, May 25, 2012

What's YOUR Klout score?



How did my Klout score get to be so high?

In case you were wondering,

Here's how I really feel about it...



Wednesday, May 23, 2012

Still More New Old Docs

Look up...

look way up...

OK, not that far


See the line that says "Technical Documents"?

That's a fixed link to the SQL Anywhere Technical Documents page on this blog.

Check it out...

  • The layout's been cleaned up to make the page easier to use,

  • Techwave presentations from 2001, 2002 and 2005 have been added (still looking for 2003 and 2004), and

  • the "authors" column has been filled in for a few more entries.
Behind the scenes, the all-important Google Custom Search Engine has been modified...
  • Instead of specifying one single url "http://sqlanywhere.blogspot.ca/2012/03/tales-from-doc-face.html" and expecting the Googlebot to take it from there, urls for all 472 documents have been specified.
I don't know if that will make any difference, but I'm willing to try anything to fix Google CSE's weird behavior; for example, search on "select" and you'll get only one or two hits:

Search on something else, like "OLAP", and it behaves much better.




Monday, May 21, 2012

Product Suggestion: Index Skip Key Processing

Here's a simple query to display the MAX value of a column in each group:

SELECT g1, g2, g3, MAX ( t2 )
  FROM t
 GROUP BY g1, g2, g3 
 ORDER BY g1, g2, g3;
Here's what the table looks like; notice there's a UNIQUE index "ukey" on all the columns in the query:
CREATE TABLE t (
   g1           VARCHAR ( 12 ) NOT NULL,
   g2           VARCHAR ( 4 ) NOT NULL,
   g3           SMALLINT NOT NULL,
   t1           TIMESTAMP NOT NULL,
   t2           TIMESTAMP NOT NULL,
   other        VARCHAR ( 100 ) NOT NULL,
   CONSTRAINT pkey PRIMARY KEY ( g1, g2, g3, t1 ),
   CONSTRAINT ukey UNIQUE ( g1, g2, g3, t2 ),
   CONSTRAINT tchk CHECK ( t1 <= t2 ) );
In the world of IBM, that index might be used to efficiently satisfy the query by "skipping" from one group ( g1, g2, g3 ) to the next and plucking the MAX ( t2 ) value out of that group; after all, the index is sorted on t2 within each group ( g1, g2, g3 ).
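
To make the "skipping" concrete, here's roughly the single-group probe that a skip-key scan boils down to, repeated once per group; the group values shown are just illustrative (they match the test data loaded further down), and each probe on its own is exactly the kind of query the ukey index can satisfy by touching a handful of rows:

-- One "skip" step: the MAX ( t2 ) for a single group ( g1, g2, g3 ).
SELECT MAX ( t2 )
  FROM t
 WHERE g1 = 'aaaaaaaaaa'
   AND g2 = 'bbbb'
   AND g3 = 1;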

Here's an excerpt from the IBM docs on the subject, with the really interesting bits highlighted in bold:


IBM i 7.1 Information Center > Database > Performance and query optimization > Processing queries: Overview > Grouping optimization

Optimizing grouping by using index skip key processing

Index Skip Key processing can be used when grouping with the keyed sequence implementation algorithm which uses an existing index. It is a specialized version of ordered grouping that processes few records in each group rather than all records in each group.

The index skip key processing algorithm:

1. Uses the index to position to a group and

2. finds the first row matching the selection criteria for the group, and if specified the first non-null MIN or MAX value in the group

3. Returns the group to the user

4. "Skip" to the next group and repeat processing

This algorithm improves performance by potentially not processing all index key values for a group.

Index skip key processing can be used:

  • For single table queries using the keyed sequence grouping implementation when:

    • There are no column functions in the query, or

    • There is only a single MIN or MAX column function and the MIN or MAX operand is the next index key column after the grouping columns. There can be no other grouping functions in the query. For the MIN function, the key column must be an ascending key; for the MAX function, the key column must be a descending key. If the query is whole table grouping, the operand of the MIN or MAX must be the first key column.

      Example 1, using SQL:
      CREATE INDEX IX1 ON EMPLOYEE (SALARY DESC)
      
      DECLARE C1 CURSOR FOR
         SELECT MAX(SALARY) FROM EMPLOYEE;
      
      The query optimizer chooses to use the index IX1. The SLIC runtime code scans the index until it finds the first non-null value for SALARY. Assuming that SALARY is not null, the runtime code positions to the first index key and return that key value as the MAX of salary. No more index keys are processed.

      Example 2, using SQL:
      CREATE INDEX IX2 ON EMPLOYEE (WORKDEPT, JOB, SALARY)
      
      DECLARE C1 CURSOR FOR 
       SELECT WORKDEPT, MIN(SALARY)
        FROM EMPLOYEE
        WHERE JOB='CLERK'
        GROUP BY WORKDEPT
      
      The query optimizer chooses to use Index IX2. The database manager positions to the first group for DEPT where JOB equals 'CLERK' and returns the SALARY. The code then skips to the next DEPT group where JOB equals 'CLERK'.

Here's an excerpt that explains what "ordered grouping" means in the earlier phrase "specialized version of ordered grouping":

IBM i 7.1 Information Center > Database > Performance and query optimization > Processing queries: Overview > Grouping optimization

Index grouping implementation

There are two primary ways to implement grouping using an index: Ordered grouping and pre-summarized processing.

Ordered grouping

This implementation uses the Radix Index Scan or the Radix Index Probe access methods to perform the grouping. An index is required that contains all the grouping columns as contiguous leftmost key columns. The database manager accesses the individual groups through the index and performs the requested summary functions.

Since the index, by definition, already has all the key values grouped, the first group result can be returned in less time than the hashing method. This index performance is faster because the hashing method requires a temporary result. This implementation can be beneficial if an application does not need to retrieve all the group results, or if an index exists that matches the grouping columns.

When the grouping is implemented with an index and a permanent index does not exist that satisfies grouping columns, a temporary index is created. The grouping columns specified within the query are used as the key columns for this index.


Soooo, what does SQL Anywhere do with this query?

Welllll, as far as I can tell, it uses the Hash Group By algorithm together with a sequential scan of table t... all three million rows of table t.

Sometimes, if the table has tiny rows, it uses an eight-way parallel table scan (on a computer with eight processors), but it never does anything with the "ukey" index.
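
If you want to see what plan your own server picks without opening the graphical plan viewer, the EXPLANATION() and PLAN() functions will return the access plan as text; here's a quick sketch (the exact wording of the plan text varies by version and build):

-- Short-form text plan for query 1; substitute PLAN() for the long form.
SELECT EXPLANATION (
   'SELECT g1, g2, g3, MAX ( t2 )
      FROM t
     GROUP BY g1, g2, g3
     ORDER BY g1, g2, g3' );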

Here's a test you can run yourself; the combo table is used to help INSERT sixteen groups of data spread across 3,000,000 rows in t. In other words, the SELECT returns only 16 rows:

CREATE TABLE combo (
   combo_number INTEGER NOT NULL PRIMARY KEY,
   g1           VARCHAR ( 12 ) NOT NULL,
   g2           VARCHAR ( 4 ) NOT NULL,
   g3           SMALLINT NOT NULL );

INSERT combo VALUES (  1, 'aaaaaaaaaa', 'bbbb', 1 );
INSERT combo VALUES (  2, 'bbbbbbbbbb', 'cccc', 2 );
INSERT combo VALUES (  3, 'cccccccccc', 'dddd', 1 );
INSERT combo VALUES (  4, 'dddddddddd', 'eeee', 2 );
INSERT combo VALUES (  5, 'eeeeeeeeee', 'ffff', 1 );
INSERT combo VALUES (  6, 'ffffffffff', 'gggg', 2 );
INSERT combo VALUES (  7, 'bbbbbbbbbb', 'bbbb', 1 );
INSERT combo VALUES (  8, 'cccccccccc', 'cccc', 2 );
INSERT combo VALUES (  9, 'gggggggggg', 'dddd', 1 );
INSERT combo VALUES ( 10, 'hhhhhhhhhh', 'eeee', 2 );
INSERT combo VALUES ( 11, 'iiiiiiiiii', 'ffff', 1 );
INSERT combo VALUES ( 12, 'bbbbbbbbbb', 'gggg', 2 );
INSERT combo VALUES ( 13, 'dddddddddd', 'bbbb', 1 );
INSERT combo VALUES ( 14, 'gggggggggg', 'cccc', 2 );
INSERT combo VALUES ( 15, 'jjjjjjjjjj', 'dddd', 1 );
INSERT combo VALUES ( 16, 'kkkkkkkkkk', 'eeee', 2 );
COMMIT;

BEGIN

DECLARE @insert_count BIGINT;
DECLARE @combo_number BIGINT;

SET @insert_count = 1;
SET @combo_number = 0;

WHILE @insert_count <= 3000000 LOOP 

   SET @combo_number = @combo_number + 1;
   IF @combo_number > 16 THEN
      SET @combo_number = 1;
   END IF;

   INSERT t
   SELECT combo.g1,
          combo.g2,
          combo.g3,
          DATEADD ( MINUTE, 10 * @insert_count, '1901-01-01 00:00' ),
          DATEADD ( MINUTE, 10 * @insert_count, '1901-01-01 00:05' ),
          REPEAT ( 'x', 100 ) 
     FROM combo
    WHERE combo.combo_number = @combo_number;

   IF MOD ( @insert_count, 1000 ) = 0 THEN
      COMMIT;
      MESSAGE STRING ( @insert_count, ' rows inserted' ) TO CLIENT;
   END IF;

   SET @insert_count = @insert_count + 1;

END LOOP;

COMMIT;
MESSAGE STRING ( 'All done' ) TO CLIENT;

END;

------------------------------------------------------
-- query 1

CALL sa_flush_cache();

SELECT g1, g2, g3, MAX ( t2 )
  FROM t
 GROUP BY g1, g2, g3 
 ORDER BY g1, g2, g3;

g1,g2,g3,MAX(t.t2)
'aaaaaaaaaa','bbbb',1,'1958-01-15 05:35:00.000'
'bbbbbbbbbb','bbbb',1,'1958-01-15 06:35:00.000'
'bbbbbbbbbb','cccc',2,'1958-01-15 05:45:00.000'
'bbbbbbbbbb','gggg',2,'1958-01-15 07:25:00.000'
'cccccccccc','cccc',2,'1958-01-15 06:45:00.000'
'cccccccccc','dddd',1,'1958-01-15 05:55:00.000'
'dddddddddd','bbbb',1,'1958-01-15 07:35:00.000'
'dddddddddd','eeee',2,'1958-01-15 06:05:00.000'
'eeeeeeeeee','ffff',1,'1958-01-15 06:15:00.000'
'ffffffffff','gggg',2,'1958-01-15 06:25:00.000'
'gggggggggg','cccc',2,'1958-01-15 07:45:00.000'
'gggggggggg','dddd',1,'1958-01-15 06:55:00.000'
'hhhhhhhhhh','eeee',2,'1958-01-15 07:05:00.000'
'iiiiiiiiii','ffff',1,'1958-01-15 07:15:00.000'
'jjjjjjjjjj','dddd',1,'1958-01-15 07:55:00.000'
'kkkkkkkkkk','eeee',2,'1958-01-15 08:05:00.000'
Query 1 took 14.7 seconds to execute... that's an average over 7 tests, with the cache flushed each time and the server restarted once.

Here's a glimpse into the plan:

Depending on your expectations, 14.7 seconds might not be bad, but a sequential table scan?

Really?

Surely one can do better... after all, there's a bloody index that covers all the columns in the query and it wasn't even considered:

Here's a clue...

Why wasn't the "ukey" index considered for query 1? This excerpt from the SQL Anywhere 12 Help explains (with emphasis added):

Cost-based optimization with MIN and MAX functions

The min/max cost-based optimization is designed to exploit an existing index to compute efficiently the result of a simple aggregation query involving the MAX or MIN aggregate functions. The goal of this optimization is to be able to compute the result by retrieving only a few rows from the index. To be a candidate for this optimization, the query:

  • must not contain a GROUP BY clause

  • must be over a single table

  • must contain only a single aggregate function (MAX or MIN) in the query's SELECT-list

Taking that clue as a starting point, here's a version of query 1 that
  • separates the GROUP BY from the MAX into different derived tables gt and mt and

  • uses the LATERAL keyword to join them via the WHERE clause inside the second SELECT:
------------------------------------------------------
-- query 2

CALL sa_flush_cache();

SELECT gt.g1, gt.g2, gt.g3, mt.t2
  FROM ( 
         SELECT g1, g2, g3
           FROM t
          GROUP BY g1, g2, g3 
       ) AS gt,
       LATERAL
       (
         SELECT MAX ( t.t2 ) AS t2
             FROM t
            WHERE t.g1 = gt.g1
              AND t.g2 = gt.g2
              AND t.g3 = gt.g3
       ) AS mt
 ORDER BY gt.g1, gt.g2, gt.g3;
On average over six tests, query 2 took 13.9 seconds, which isn't much of an improvement over 14.7 seconds, probably within the margin of error for experiments like this.

The performance might be the same, but the plan sure has changed!

Woohoo! An Index Only Retrieval Scan!

The SELECT MAX() does a "Scan t using index ukey":

But wait... what's all that other stuff?

That other stuff is an eight-way Parallel Hash Group By, together with an eight-way Parallel Table Scan, used to compute the first derived table gt, the one that does the GROUP BY. The Index Only Retrieval Scan is used to compute the SELECT MAX ( t.t2 ) query.

Here's the view from the Foxhound Monitor showing the eight "INT: EXCHANGE" internal connections running all those parallel processes:

Which query is faster? GROUP BY or MAX?

If query 2 is split into separate SELECT statements, with a temporary table used to pass the GROUP BY results to the query doing the MAX(), it's the GROUP BY that takes most of the time:
------------------------------------------------------
-- query 3A - prepare for query 3B

CALL sa_flush_cache();

SELECT g1, g2, g3
  INTO #g
  FROM t
 GROUP BY g1, g2, g3; 

Execution time: 13.5 seconds

------------------------------------------------------
-- query 3B

CALL sa_flush_cache();

SELECT #g.g1, #g.g2, #g.g3, mt.t2
  FROM #g,
       LATERAL
       (
         SELECT MAX ( t.t2 ) AS t2
             FROM t
            WHERE t.g1 = #g.g1
              AND t.g2 = #g.g2
              AND t.g3 = #g.g3
       ) AS mt
 ORDER BY #g.g1, #g.g2, #g.g3;

Execution time: 0.4 seconds

So... in the (unlikely?) case that the GROUP BY result remains unchanged while the MAX() needs to be recalculated multiple times, it makes sense to jump through hoops to divide and conquer the problem with funky techniques like LATERAL.
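
Here's one way to package that split, sketched with a hypothetical permanent table named g_list: pay the GROUP BY cost once, then re-run the cheap index-probe half whenever fresh MAX() values are needed:

CREATE TABLE g_list (
   g1           VARCHAR ( 12 ) NOT NULL,
   g2           VARCHAR ( 4 ) NOT NULL,
   g3           SMALLINT NOT NULL,
   PRIMARY KEY ( g1, g2, g3 ) );

-- The slow part: run once, or whenever new groups can appear.
INSERT g_list
SELECT g1, g2, g3
  FROM t
 GROUP BY g1, g2, g3;

COMMIT;

-- The fast part: re-run as often as needed; only the index probes repeat.
SELECT g_list.g1, g_list.g2, g_list.g3, mt.t2
  FROM g_list,
       LATERAL
       (
         SELECT MAX ( t.t2 ) AS t2
           FROM t
          WHERE t.g1 = g_list.g1
            AND t.g2 = g_list.g2
            AND t.g3 = g_list.g3
       ) AS mt
 ORDER BY g_list.g1, g_list.g2, g_list.g3;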

In the real world, this is a job for Index Skip Key Processing, doncha think?


Friday, May 18, 2012

RUN anywhere

RUN anywhere is one of the slogans in this new TV commercial:



Yes, an actual SAP TV commercial, actually running in prime time on the Fox News Channel.

...and we all know what RUN anywhere means, don't we? nudge, nudge, wink, wink [ahem] SQL Anywhere [cough]


Wednesday, May 16, 2012

"Whaddaya mean, backups have failed for the last month?"

Oren Jacob and Galyn Susman tell the story of how all the Toy Story 2 files vanished:


The moral of the story?

Test your backups, of course.


Monday, May 14, 2012

Author, Author!

The "author" column has been added to list of SQL Anywhere technical documents found at Tales From The Doc Face.

...and now the bad news


It will take some time before the author column is completely filled in... right now, the authors are only shown for the freshly-added Techwave 2006 presentations.

Which brings us to the question, "Why weren't the authors included from the beginning?"

Answer: Sigh... just plain stupid, I guess... too much focus on the hard stuff (figuring out the publication dates and the SQL Anywhere versions), not enough on the easy task (capturing the authors).

What's the old saying?


Never enough time to do it right, always enough time to do it over?


Friday, May 11, 2012

From The Golden Age of Techwave

Do you remember 2007?

That's when outsiders were still invited to make presentations at Techwave.

And that's when 24 (twenty four!) separate technical presentations were made on the subject of SQL Anywhere ...actually, I think there were 25 but I can't find the second day of Glenn Paulley's SQL Anywhere Internals class.

Anyway, the 24 Techwave 2007 presentations that do exist on the web have been added to the list at Tales From The Doc Face.

Did you know?

Did you know that Tales From The Doc Face now lists 412 SQL Anywhere technical documents?

The "official list" of SQL Anywhere Technical Documents over at sybase.com only contains 312 entries, and that list includes all sorts of documents that have been left out of Tales From The Doc Face because they really aren't "technical documents": announcements, notices, bulletins, bug fixes, licensing details, certification reports, various listicles (components, requirements, platforms, links), and straight links to DCX topics.


Wednesday, May 9, 2012

Yet More New Old Docs: Techwave 2008

Eighteen SQL Anywhere presentations from Techwave 2008 have been added to the list at Tales From The Doc Face.

...and this time, they're all available online: none are missing, and none of them had to be copied so they can be indexed by Google.



Monday, May 7, 2012

Latest SQL Anywhere EBFs: Linux and Windows

The three asterisks "***" show what's new since the previous list.

  • Only EBFs for the latest fully-supported versions of SQL Anywhere are shown here: 11.0.1 and 12.0.1.

  • Note: Just because an older version isn't "fully supported" doesn't mean you can't download files (or ask questions, or get help); it just means there won't be any more new EBFs released. For a complete list of everything available for download, see Downloads - EBFs / Maintenance.

Current builds for the active platforms...

HP-UX Itanium    12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012

IBM AIX          12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012

Linux x86/x64    12.0.1.3723 EBF ***   04 May 2012 ***
                 11.0.1.2794 EBF       18 Apr 2012

Mac OS           12.0.1.3577 EBF       24 Feb 2012
                 11.0.1.2449 EBF       29 Jun 2010

Solaris SPARC    12.0.1.3574 EBF       23 Mar 2012
                 11.0.1.2794 EBF       18 Apr 2012

Solaris x64      12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012

Windows x86/x64  12.0.1.3726 EBF ***   04 May 2012 ***
                 11.0.1.2803 EBF ***   03 May 2012 ***
[Download EBFs here]