Monday, February 18, 2008

Check Out Those Residual Predicates!

A "residual predicate" is SQL Anywhere's version of the pretty girl at the high school dance: You really want to dance with her but you're too afraid to ask. SQL Anywhere really wants to use that residual predicate to speed things up but it doesn't, for some reason or another.

What SQL Anywhere does do is show you residual predicates when it displays the Graphical Plan With Statistics. Not up top where it says "Table Scan" versus "Index Scan", mind you, but way down at the bottom of the display, after all the other counters and statistics. It's up to you to scroll down and have a look, and see if it's worth your while to make some change so SQL Anywhere can put that residual predicate to good use.

Here's an example of the Graphical Plan for a MobiLink download_cursor SELECT that took almost 5 minutes to run:

Even though that query selected no rows, it involves a join between the two largest tables in the database, 7 million rows and 3.8G of space according to the Foxhound "Curiosities" display:

The Graphical Plan shows that it did a primary key "Index Scan" for the sbx_event_raw_data table. So far, so good, at least it's not a sequential table scan.

...But wait a minute! Not so fast! This table took over half the execution time, that's not so good!

Let's keep looking... scroll down to the bottom of the same display and you see this "Residual predicate":

What it's saying is that the SELECT contained this predicate in the WHERE clause:
sbx_event_raw_data.modification_date > 2008-02-10 14:15:26.218000
and that this predicate was expected to be true 6.2696% of the time according to "Statistics" stored in the database.

According to Foxhound, a huge (67 megabyte) index already exists on the modification_history column:

So why didn't SQL Anywhere use that index? The answer lies in the number 6.2696 percent which is the "selectivity" of a predicate. Lower is better than higher when it comes to using an index on a predicate, and 6 percent selectivity is too high for SQL Anywhere to use an index for this particular predicate... it'll be worse than a sequential table scan because it will probably do the same thing (read every single page) but in random order.

Welllll, it turns out that the 6 percent figure is completely wrong: In reality this predicate is true exactly zero percent of the time because there are no rows with modification_date greater than February 10. You can't get any better than zero percent selectivity, so SQL Anywhere should have used that index.

The problem is, the statistics are wrong. The solution is, fix the statistics:
CREATE STATISTICS sbx_event_raw_data ( modification_date );
The CREATE STATISTICS statement took about 3 minutes to run. After that, the MobiLink download_cursor went from 5 minutes down to less than ten seconds.

And that's a pretty good deal, because this SELECT runs many times a day but the CREATE STATISTICS only needs to be run once in a while (or maybe just once).
Here's a disclaimer for my good friends at iAnywhere Solutions: This article is specifically about residual predicates, not database statistics. In a perfect world, database statistics would always be kept perfectly up to date, by SQL Anywhere, automatically... and CREATE STATISTICS would never be required. In the real world, SQL Anywhere is getting better and better with each new release, but sometimes the statistics get out of whack. And in this example, the client is not using the latest version of SQL Anywhere so "sometimes out of whack" is not so surprising. But again, that's not the point here... residual predicates is the point. Whenever you're looking at a plan, check out the residual predicates.

No comments: