Wednesday, January 15, 2014

Capturing and Storing Plans With GRAPHICAL_PLAN()

An earlier article about calling the GRAPHICAL_PLAN() function in SQL Anywhere showed an alternative to Database Tracing wherein you add custom code to your stored procedures to capture query plans. Both techniques (custom code and Database Tracing) share one very important advantage: capturing realistic production query plans in their natural habitat is better than struggling (and failing) to simulate production scenarios in ISQL.

When used together with the Procedure Profiling feature, the custom code approach lets you focus on queries you know are slow (because the Procedure Profiler says so) rather than wading through the massive amounts of output that Database Tracing produces.

Yes, you can tell the Database Tracing feature to filter out fast queries, but that reveals another advantage to the custom code approach: You can easily capture different plans for the same query; for example, one kind of plan when it's running slow-like-continental-drift, and another when it runs quickly. Some of the hardest problems to solve involve queries with changing behavior, and the custom code approach lets you capture both behaviors and compare them. With Database Tracing, unless you capture everything (all the fast plans as well as the slow), you'll only see the slow plans.

This article expands on the earlier technique by storing the plans returned by GRAPHICAL_PLAN() in a table rather than immediately writing them to *.saplan files for display in ISQL's Graphical Plan Viewer.

Here is a step-by-step demonstration using a stored procedure that was running very slowly: the code behind the new "Connection History" page being added to the next version of the Foxhound database monitor.



Step 1: Restart the target database.

This will ensure the performance tests will at least begin with a fresh (empty) RAM cache. It's good to take advantage of a large cache to improve performance, but sometimes you can't rely on data being in memory so testing with an empty cache is important.
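
Want proof the cache really is starting small? Here's a quick check you can run in ISQL; a minimal sketch, assuming the SQL Anywhere server properties CurrentCacheSize and PeakCacheSize (both reported in kilobytes):

-- Confirm the cache starts small after the restart
SELECT PROPERTY ( 'CurrentCacheSize' ) AS current_cache_kb,
       PROPERTY ( 'PeakCacheSize' )    AS peak_cache_kb;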

Step 2: Turn on the procedure profiler.

Here's how to launch the SQL Anywhere 16 procedure profiler in Sybase Central:
  • Connect to the target database.

  • Click on the Mode menu item, then

  • click on Application Profiling.

  • The Application Profiling frame will appear at the bottom of the window.

  • Check Stored procedure, function, trigger, or event execution time, then

  • click Finish.

  • The Profiling Stored Procedures, Functions, Triggers, and Events display will appear.
To see the profile results:
  • Run your tests,

  • click on View - Refresh All, then

  • look in the Profiling Results tab.
To stop profiling:
  • Look in the Profiling Stored Procedures, Functions, Triggers, and Events display, then

  • click on the Stop Collecting Profiling Information on Selected Databases button.
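
If you'd rather skip the pointing and clicking, the profiler can also be toggled from ISQL; here's a sketch using the sa_server_option system procedure and its ProcedureProfiling option (requires DBA authority):

CALL sa_server_option ( 'ProcedureProfiling', 'YES' );  -- start collecting
-- ... run the tests here ...
CALL sa_server_option ( 'ProcedureProfiling', 'NO' );   -- stop collecting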

Step 3: Test the application.

It helps to have a script to follow so you can do before-and-after comparisons to prove the changes you make are actually improvements. In this example, the script contained 19 distinct user commands via the GUI: open the page, scroll down one row, scroll down 20 rows, and so on.

Step 4: Examine the procedure profiling results.

There's no easy way to do this if you're running a lot of large stored procedures; you have to open each procedure in Sybase Central's Profiling Results tab, then scroll . . . slowly . . . all . . . the . . . way . . . through the SQL code while watching carefully for hot spots.
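
One way to cut down on the scrolling: query the profiler's numbers directly via the sa_procedure_profile system procedure. Here's a sketch; the column names (object_name, line_num, executions, millisecs) should be verified against your version's documentation:

-- List the most expensive lines first, across all procedures, triggers and events
SELECT object_name, line_num, executions, millisecs
  FROM sa_procedure_profile()
 ORDER BY millisecs DESC;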

Step 5: Identify which queries need attention.

Here's the slowest query in the test (shown with its eyecatcher comment in Step 6); according to the procedure profiler it was executed 19 times for a total of 40 seconds.



Step 6: Insert "eyecatcher" comments in the code.

Line numbers change, and so does performance; adding an identifying comment to each slow query lets you keep track of hot spots even if you don't get around to working on them right away.
--\\\\\ Test 1 line 220 (19 40472 30 220) -- Determine oldest and newest connection history samples.

SELECT TOP 1
       rroad_group_2_property_pivot.sample_set_number
  INTO @oldest_connection_history_sample_set_number
  FROM rroad_group_2_property_pivot
 WHERE rroad_group_2_property_pivot.connection_id_string = @connection_id_string
 ORDER BY rroad_group_2_property_pivot.sample_set_number ASC;

Step 7: Turn off the procedure profiler.

The next step will be to capture graphical plans and you don't want the extra overhead of the procedure profiler to affect the results (although this may be a moot point since the procedure profiler seems to be a lightweight process as far as performance is concerned).

Step 8: Create a table to store the plans.

CREATE TABLE captured_plan (
   plan_id                 INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   query_title             VARCHAR ( 200 ) NOT NULL,
   capture_started_at      TIMESTAMP NOT NULL,
   capture_duration_msec   BIGINT NOT NULL,
   plan                    LONG VARCHAR NOT NULL DEFAULT '' );

The plan_id column is an artificial primary key.

The query_title column uniquely identifies each SQL query being studied so multiple plans for the same query can be studied together.

The capture_started_at column keeps a record of when the test was run; this can be handy when you're purging old rows.
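
For example, here's a hypothetical housekeeping statement (the seven-day cutoff is arbitrary):

-- Purge plans captured more than a week ago
DELETE FROM captured_plan
 WHERE capture_started_at < DATEADD ( DAY, -7, CURRENT TIMESTAMP );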

The capture_duration_msec column gives a rough measure of how long the query ran. It isn't accurate because it includes the time required to build the graphical plan XML, but in many cases it's helpful when looking for fast versus slow queries; e.g., 100 msec is faster than 10,000 msec regardless of overhead.

The plan column holds the GRAPHICAL_PLAN() return values.
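
One optional extra, not part of the original design: the template in Step 10 counts rows by query_title before every capture, so if the instrumented code runs many times an index keeps that existence check cheap:

CREATE INDEX xquery_title ON captured_plan ( query_title );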

Step 9: Prepare to call GRAPHICAL_PLAN().

Each block of code (stored procedure, trigger, event, etc.) that's going to call GRAPHICAL_PLAN() needs these four local variables:
/* START TEMPORARY CODE */
DECLARE @plan_id               INTEGER;
DECLARE @query_title           VARCHAR ( 200 );
DECLARE @capture_started_at    TIMESTAMP;
DECLARE @plan                  LONG VARCHAR;
/* END TEMPORARY CODE */

Step 10: Add calls to GRAPHICAL_PLAN().

Here's a template to copy and paste (and then edit) ahead of each slow query:
/* START TEMPORARY CODE */
SET @query_title = '[TEXT TO UNIQUELY IDENTIFY THE SQL QUERY]';
IF ( SELECT COUNT(*) FROM captured_plan WHERE query_title = @query_title ) 
< [MAXIMUM NUMBER OF PLANS TO CAPTURE] 
THEN
   SET @plan_id = GET_IDENTITY ( 'captured_plan' );
   SET @capture_started_at = CURRENT TIMESTAMP;
   SET @plan = GRAPHICAL_PLAN (
      '
      [COPY AND PASTE THE SQL QUERY HERE]
      ',
      2, -- Detailed statistics including node statistics
      'asensitive',
      'READ-ONLY' );
   INSERT captured_plan VALUES ( 
      @plan_id,  
      @query_title,  
      @capture_started_at,  
      DATEDIFF ( MILLISECOND, @capture_started_at, CURRENT TIMESTAMP ), -- capture_duration_msec 
      @plan );
END IF;
/* END TEMPORARY CODE */
The template contains three placeholders for you to edit:
  • Line 2: Replace [TEXT TO UNIQUELY IDENTIFY THE SQL QUERY] with a string you will recognize when browsing through dozens of entries in the captured_plan table.

  • Line 4: Replace [MAXIMUM NUMBER OF PLANS TO CAPTURE] with an integer like 1 or 2 to put a limit on the number of plans captured in case the query is executed many times.

  • Line 10: Replace [COPY AND PASTE THE SQL QUERY HERE] with the actual query, then double up any embedded single quotes (a quote-doubling sketch appears below), and make any other necessary changes as discussed in Capture Plans With GRAPHICAL_PLAN().
The call to GRAPHICAL_PLAN() is coded ahead of the INSERT statement, rather than being included in the INSERT VALUES list, so a value can be calculated for the capture_duration_msec column.
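
Here's a tiny example of the quote-doubling mentioned above, using a hypothetical query and table; the original predicate WHERE status = 'active' becomes WHERE status = ''active'' inside the string:

-- Hypothetical query with an embedded string literal, quotes doubled
SET @plan = GRAPHICAL_PLAN (
   'SELECT COUNT(*) FROM some_table WHERE status = ''active''',
   2, -- Detailed statistics including node statistics
   'asensitive',
   'READ-ONLY' );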

Here's what the template looked like after editing:

/* START TEMPORARY CODE */
SET @query_title = 'Test 1 line 220';
IF ( SELECT COUNT(*) FROM captured_plan WHERE query_title = @query_title ) 
< 2  
THEN
   SET @plan_id = GET_IDENTITY ( 'captured_plan' );
   SET @capture_started_at = CURRENT TIMESTAMP;
   SET @plan = GRAPHICAL_PLAN (
      '
SELECT TOP 1
       rroad_group_2_property_pivot.sample_set_number
  FROM rroad_group_2_property_pivot
 WHERE rroad_group_2_property_pivot.connection_id_string = @connection_id_string
 ORDER BY rroad_group_2_property_pivot.sample_set_number ASC;
      ',
      2, -- Detailed statistics including node statistics
      'asensitive',
      'READ-ONLY' );
   INSERT captured_plan VALUES ( 
      @plan_id,  
      @query_title,  
      @capture_started_at,  
      DATEDIFF ( MILLISECOND, @capture_started_at, CURRENT TIMESTAMP ), -- capture_duration_msec 
      @plan );
END IF;
/* END TEMPORARY CODE */

Step 11: Restart the target database.

Step 12: Re-run the test.

Step 13: Study the graphical plans.

In this example, calls to GRAPHICAL_PLAN() were inserted ahead of seven slow queries, resulting in 12 plans being captured:
SELECT plan_id,
       LEFT ( query_title, 18 ) AS query_title,
       capture_started_at,
       capture_duration_msec
  FROM captured_plan 
 ORDER BY query_title, 
       plan_id;

    plan_id query_title        capture_started_at      capture_duration_msec 
----------- ------------------ ----------------------- --------------------- 
         11 Test 1 line 1286   2014-01-14 11:59:17.359                  6422 
          3 Test 1 line 1576   2014-01-14 11:56:39.953                  2093 
          7 Test 1 line 1576   2014-01-14 11:57:03.062                  1891 
          2 Test 1 line 1589   2014-01-14 11:56:35.781                  2656 
          6 Test 1 line 1589   2014-01-14 11:56:59.375                  2171 
          4 Test 1 line 1604   2014-01-14 11:56:43.562                    31 
          8 Test 1 line 1604   2014-01-14 11:57:06.484                    16 
          9 Test 1 line 1632   2014-01-14 11:57:54.578                    47 
         10 Test 1 line 1632   2014-01-14 11:58:05.953                    31 
         12 Test 1 line 1660   2014-01-14 11:59:36.406                  1906 
          1 Test 1 line 220    2014-01-14 11:55:58.000                 35125 
          5 Test 1 line 220    2014-01-14 11:56:57.812                   938 
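
Here's an optional convenience query that makes fast-versus-slow pairs like plan_id 1 and 5 jump out:

SELECT query_title,
       COUNT(*)                      AS plans,
       MIN ( capture_duration_msec ) AS fastest_msec,
       MAX ( capture_duration_msec ) AS slowest_msec
  FROM captured_plan
 GROUP BY query_title
 ORDER BY slowest_msec DESC;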
Here's the code that created *.saplan files for the two plans for the query at line 220; plan_id = 1 for the first (slow) execution, and plan_id = 5 for the second (fast) execution:
SET TEMPORARY OPTION allow_write_client_file = 'On';
SELECT WRITE_CLIENT_FILE (
          STRING (
             'c:\\temp\\plan for ',
             query_title,
             ', plan_id ',
             plan_id,
             ', ',
             capture_duration_msec,
             ' msec',
             '.saplan' ),
          plan )
  FROM captured_plan 
 WHERE plan_id IN ( 1, 5 );

The WRITE_CLIENT_FILE() function was used instead of xp_write_file() because the target database was running on a different computer, and the SET TEMPORARY OPTION allow_write_client_file = 'On' statement was used so WRITE_CLIENT_FILE() would actually work.
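
If ISQL had been running on the same computer as the database server, the xp_write_file() system procedure would have done the job without the option change; a sketch (the output path is arbitrary, and the files land on the server's file system rather than the client's):

SELECT xp_write_file (
          STRING ( 'c:\\temp\\plan_', plan_id, '.saplan' ),
          plan )
  FROM captured_plan 
 WHERE plan_id IN ( 1, 5 );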

Here are the two plans; they're identical except for the run times, which shows that the RAM cache affected performance but didn't change the plan.

Step 14: Propose improvements and perform experiments.

Every application has at least one "Table From Hell". Foxhound is no exception, so it came as no surprise that the slow query in this example involved that table:
CREATE TABLE rroad_group_2_property_pivot (
   sampling_id                       UNSIGNED INTEGER NOT NULL,
   sample_set_number                 UNSIGNED BIGINT NOT NULL,
   connection_number                 BIGINT NOT NULL,
   connection_id_string              VARCHAR ( 50 ) NOT NULL COMPUTE ( STRING (  
                                        sampling_id,
                                        '-',
                                        connection_number, 
                                        '-',
                                        DATEFORMAT ( LoginTime, 'YYYYMMDDHHNNSS-SSS' ) ) ),
   ...
   PRIMARY KEY ( sample_set_number, connection_number ) );

CREATE INDEX xconnection_id_string ON rroad_group_2_property_pivot ( connection_id_string );

This table often contains millions of rows and consumes gigabytes of disk space; in this example it "only" contained 397,858 rows in 270 MB of disk space. Indexes do exist on the columns referenced in both the WHERE clause and the ORDER BY...

SELECT TOP 1
       rroad_group_2_property_pivot.sample_set_number
  FROM rroad_group_2_property_pivot
 WHERE rroad_group_2_property_pivot.connection_id_string = '1-21-20140114132910-406' 
 ORDER BY rroad_group_2_property_pivot.sample_set_number ASC
...but alas, SQL Anywhere's decision to use the PRIMARY KEY index did not result in very good performance.

SQL Anywhere did consider both the PRIMARY KEY index and the secondary CREATE INDEX xconnection_id_string...

Index Scan
Scan rroad_group_2_property_pivot using index rroad_group_2_property_pivot (all rows)

Indexes considered
rroad_group_2_property_pivot
seq
xconnection_id_string - Selectivity 3.741826117%

...and while it's tempting to try a FORCE INDEX ( xconnection_id_string ) clause to make it use the other index, that should never be the first path chosen. Even if it did improve performance, today's excellent FORCE INDEX choice could create tomorrow's Query From Hell, and it's often (always?) better to let SQL Anywhere decide how to build your plans.
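
For the record, here's what that hint would have looked like; it's shown only so you know what's being advised against:

SELECT TOP 1
       sample_set_number
  FROM rroad_group_2_property_pivot FORCE INDEX ( xconnection_id_string )
 WHERE connection_id_string = '1-21-20140114132910-406' 
 ORDER BY sample_set_number ASC;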

Instead of making SQL Anywhere choose between two indexes on two different columns, one alternative is to create a single index involving both columns. In this case, connection_id_string was chosen as the first column in the index since it's the one performing a selection (connection_id_string = '1-21-20140114132910-406'), and sample_set_number was placed second in the index because it's used to ORDER BY the rows returned by the selection.

In this case, rather than create another index, the existing CREATE INDEX was changed from this:

CREATE INDEX xconnection_id_string ON rroad_group_2_property_pivot ( connection_id_string );

to this composite index:

CREATE INDEX xconnection_id_string ON rroad_group_2_property_pivot ( connection_id_string, sample_set_number );
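
On an existing database that change amounts to a drop-and-recreate; a sketch:

DROP INDEX rroad_group_2_property_pivot.xconnection_id_string;

CREATE INDEX xconnection_id_string 
   ON rroad_group_2_property_pivot ( connection_id_string, sample_set_number );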

When the query was retested in ISQL after restarting the target database, the result was startling; from 35 seconds down to (essentially) zero seconds:

Before...                            After...
Index Scan                           Index Only Retrieval Scan
index rroad_group_2_property_pivot   index xconnection_id_string
FirstRowRunTime 34.846               FirstRowRunTime 5.6441e-005

An "Index Only" scan is the Holy Grail of performance tuning; no data is read from the table, just the index, and the results here prove why it's popular.

Step 15: Make changes to optimize the queries.

In this case, "make changes" means putting the new CREATE INDEX into the Foxhound build.

Step 16: Comment out the calls to GRAPHICAL_PLAN().

In this example, the new index definition was likely to improve many of the slow queries, so it's worth performing "after" performance tests on everything. The calls to GRAPHICAL_PLAN() add a lot of overhead, so they have to be removed... but not completely removed since some of the queries may still be slow, and they may be involved in the next round of optimization (thus requiring the calls to GRAPHICAL_PLAN() to be added back).
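
One low-tech way to disable a block without losing it is to edit the two marker lines so the whole thing becomes a single comment; restoring the markers re-enables it. A sketch of the disabled form (the END marker loses its comment-opener so the outer comment closes at the end):

/* START TEMPORARY CODE -- disabled for the "after" tests
SET @query_title = 'Test 1 line 220';
   ... the rest of the Step 10 block, unchanged ...
END TEMPORARY CODE */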

Step 17: Restart the target database.

Step 18: Turn on the procedure profiler.

Step 19: Re-run the test.

Step 20: Examine the new procedure profiling results.

Here's the slow query from before, now located at line 253. As predicted by the ISQL test in Step 14, the procedure profiler now reports that a total of only 75 msec was required to run all 19 executions, rather than the 40,472 msec reported in Step 5.

Step 21: Verify improvements and look for new hot spots.

In this example, the new CREATE INDEX did indeed speed up several of the old, slow queries, but not all of them... so the process begins again at Step 5, and repeats until the Third Rule Of Program Optimization applies:
3. When it's fast enough, stop.
What are the first two rules of program optimization?
1. Don't do it.

2. (for experts only) Don't do it yet.
