Friday, April 15, 2011

Performance Versus Perfection

Sometimes you have to accept a different result set to get a query to perform adequately... sometimes, it's not enough to just add indexes or rely on the query optimizer. Here is the story of one such query...



In Foxhound, a "sample set" consists of a single snapshot of SQL Anywhere statistical properties for a target database: one set of server- and database-level properties for that target, plus one set of connection-level properties for each connection to that target. The sample set is stored in three tables with rroad_sample_set being the parent table:

-- DBA.rroad_sample_set (table_id 759) in foxhound1 - Apr 14 2011 7:32:37AM - Print - Foxhound © 2011 RisingRoad

CREATE TABLE DBA.rroad_sample_set ( -- 3,709,758 rows, 425M total = 285M table + 0 ext + 140M index, 121 bytes per row
sampling_id /* X */ UNSIGNED INT NOT NULL,
sample_set_number /* PK */ UNSIGNED BIGINT NOT NULL DEFAULT autoincrement,
sample_lost VARCHAR ( 1 ) NOT NULL DEFAULT 'N',
connected_ok VARCHAR ( 1 ) NOT NULL DEFAULT 'Y',
exception_description VARCHAR ( 32767 ) NOT NULL DEFAULT '',
sample_elapsed_msec BIGINT NOT NULL COMPUTE ( ... ),
sample_started_at /* X */ TIMESTAMP NOT NULL DEFAULT '1900-01-01',
sample_finished_at TIMESTAMP NOT NULL DEFAULT '1900-01-01',
canarian_query_started_at TIMESTAMP NOT NULL DEFAULT '1900-01-01',
canarian_query_finished_at TIMESTAMP NOT NULL DEFAULT '1900-01-01',
datediff_msec_between_target_and_local BIGINT NOT NULL DEFAULT 0,
CONSTRAINT ASA254 PRIMARY KEY ( -- 52.8M
sample_set_number )
);

CREATE CLUSTERED INDEX ix_sample_started_at ON DBA.rroad_sample_set ( -- 45M
sample_started_at );

CREATE INDEX ix_sampling_id ON DBA.rroad_sample_set ( -- 42.6M
sampling_id );

For the purposes of this article, here are the important columns:
  • sampling_id identifies which target database the sample applies to,

  • sample_set_number is the autoincrementing primary key which uniquely identifies the sample set across all samples for all targets,

  • sample_started_at records the timestamp when Foxhound started the recording process, and

  • sample_finished_at records when Foxhound finished recording the sample.
The sample_set_number and sample_started_at columns are both candidates for CLUSTERED indexes, but previous query optimization efforts (involving different queries elsewhere in Foxhound) indicated that a CLUSTERED index on sample_started_at was the more useful of the two.

Sadly, many important queries use sample_finished_at because that's the natural order of samples from the application's point of view; most samples take very little time to record, but some take longer and it is the time the sample finished at that is important. That was the case with the following Query From Hell which over 16 seconds to return a single value in one particular test (and sometimes required over a minute).

Version 1: 16 seconds


This query is used by the Foxhound Monitor History display to scroll backwards by one week (sample_finished_at <= DATEADD(week,-1,CURRENT TIMESTAMP) in the samples gathered for one particular target database (sampling_id = 8 in this case):

select top 1 rroad_sample_set.sample_set_number
from rroad_sample_set
where rroad_sample_set.sampling_id = 8
and rroad_sample_set.sample_finished_at
<= DATEADD(week,-1,CURRENT TIMESTAMP)
order by rroad_sample_set.sample_set_number desc;

An earlier optimization had already changed the query from ORDER BY sample_finished_at (which was functionally correct) to ORDER BY sample_set_number because there's an index on sample_set_number but no index on sample_finished_at. That change sometimes (rarely?) resulted in a slightly different return value, but because it was so very slow the decision was made to live with that possibility... after all, scrolling up and down isn't quite as important as calculating a bank balance [or so went the rationalization, more of which comes later].

The graphical plan showed that yes indeed, the query optimizer chose the primary key index on sample_set_number:



Version 2: 15 seconds


The next attempt at optimization changed the WHERE clause to use sample_started_at instead of sample_finished_at because there's an index on the former but not the latter.

select top 1 rroad_sample_set.sample_set_number
from rroad_sample_set
where rroad_sample_set.sampling_id = 8
and rroad_sample_set.sample_started_at
<= DATEADD(week,-1,CURRENT TIMESTAMP)
order by rroad_sample_set.sample_set_number desc;

This turned out to be a disappointment, probably because SQL Anywhere only uses one index per table and it was already using the index on sample_set_number. The graphical plan showed that the existence of a CLUSTERED index on sample_started_at, plus the fact the WHERE clause was using a range query (sample_started_at <= DATEADD...), wasn't enough to get the query optimizer to switch; it still used the primary key index on sample_set_number:



Version 3: 59 seconds


The next attempt, forcing SQL Anywhere to use the CLUSTERED index on sample_started_at, turned out to be a disaster; instead of just being slow the query now moved with all the grace and speed of continental drift (four times slower):

select top 1 rroad_sample_set.sample_set_number
from rroad_sample_set

FORCE INDEX ( ix_sample_started_at )

where rroad_sample_set.sampling_id = 8
and rroad_sample_set.sample_started_at
<= DATEADD(week,-1,CURRENT TIMESTAMP)
order by rroad_sample_set.sample_set_number desc;

The graphical plan shows that SQL Anywhere did obey the FORCE INDEX clause but still took forever:



Version 4: 0.08 seconds


The final attempt was a test of the following proposition:
Perhaps the ORDER BY is more important than the range query in the WHERE clause.
The ORDER BY was changed to use sample_started_at instead of sample_set_number, and the FORCE INDEX was ripped out:

select top 1 rroad_sample_set.sample_set_number
from rroad_sample_set
where rroad_sample_set.sampling_id = 8
and rroad_sample_set.sample_started_at
<= DATEADD(week,-1,CURRENT TIMESTAMP)
order by rroad_sample_set.sample_started_at desc;

Success! The graphical plan showed the same index was used but this time the numbers were much lower; clearly it was more important to use the CLUSTERED index for the ORDER BY than for the range query:



Here's the moral of the story: Sometimes you have to consider trading off perfection for performance, especially if absolute correctness isn't required. In this case, using a query that is guaranteed to return the exact row that was created one week earlier isn't necessary, for two reasons:
  • the perfect result changes over time depending on when the query is run, and

  • in practice all versions of the query returned the same result when run at the same time.
In other words, the imperfect query wasn't just good enough, it actually returned the right result.

And the original perfect query was unusable... nobody's going to wait 16 seconds to scroll.

1 comment:

Unknown said...

Great article, Is a must for all those developers working with SQL.