Monday, May 2, 2011

Misunderstanding SQL Anywhere

Here are five of my favorite misunderstandings about SQL Anywhere...



1. Index scans are always faster than table scans.

Not true... index scans are often faster, but not "always", not even "usually". Here's what the Help says about table scans:
"Sequential table scans are used when it is likely that most of the table pages have a row that match the query's search condition or a suitable index is not defined. Although sequential table scans may read more pages than index scans, the disk I/O can be substantially less expensive because the pages are read in contiguous blocks from the disk (this performance improvement is best if the database file is not fragmented on the disk). Sequential I/O reduces disk head movement and rotational latency. For large tables, sequential table scans also read groups of several pages at a time. This further reduces the cost of sequential table scans relative to index scans."
Table access is not limited to "index scans versus table scans"; there are actually 8 different table access methods.



2. Performance is affected by your SQL coding style.

In a complex query there is an almost infinite variety of ways you can code the SQL to obtain exactly the same result set and exactly the same execution plan. Unlike most programming languages SQL syntax is very loosely coupled to the underlying semantics, making SQL clauses more like spreadsheet cells where the order and arrangement of the cells make little difference to the underlying calculations.

For an introduction to how SQL Anywhere transforms your query, see Query processing phases, in particular the 12 different semantic query transformations described here.



3. SQL Anywhere writes your data to the database when it does a commit.

A commit guarantees your changes are permanent, but it doesn't guarantee they've been written to the physical database file. In fact, no high-performance database management system makes that guarantee because for a busy database with many connections executing many commits it would crush performance by beating the disk drives to death.

Here are some snippets from Chapter 9: Protecting in SQL Anywhere Studio 9 Developer's Guide:
During execution, database data may be stored in seven different locations:
  • Cache in memory

  • Database files on disk, including other dbspace file

  • Transaction log file on disk

  • Optional mirror log file on disk

  • Checkpoint log in the database file

  • Rollback log in the cache and database file

  • Temporary file on disk
The cache contains table and index pages that have been read from the database files into memory. The cache is always up to date; to be specific, logical database changes made by SQL statements are immediately reflected in the table and index pages in the cache, including both committed and uncommitted changes.

...there is no direct correspondence between COMMIT operations and data in the physical database files; uncommitted changes may be written to the database files while committed changes may not be written immediately.

The database files are up to date after a checkpoint; to be specific, all table and index pages in the physical database files are guaranteed to be up to date with respect to data in the cache whenever a checkpoint is complete.

The transaction log file, also known as the redo log, contains a sequential record of logical changes made to the database since this log file was created. Both committed and uncommitted changes may be written to the physical transaction log file, and they may be written before a COMMIT is executed on the corresponding connection. COMMIT and CHECKPOINT operations are recorded in the transaction log as well as other changes.

The transaction log file is up to date after a COMMIT; to be specific, the changes made by a particular transaction are guaranteed to be written to the physical transaction log by the time control is returned to the connection after a COMMIT ending that transaction is successfully completed. In fact, all changes, committed or otherwise, are guaranteed to be written to the transaction log whenever a COMMIT is performed on any connection.



4. The transaction log is used for rollbacks.

Nope... the rollback log is used for rollbacks, the transaction log is used for forward recovery.

Here's another excerpt from SQL Anywhere Studio 9 Developer's Guide:
A rollback log, also known as an undo log or row level undo log, is maintained for each connection to hold a sequential record of the reverse operations that would undo the logical changes made to the database by this connection since the last COMMIT or ROLLBACK was executed on the connection.

The rollback log pages are stored in the SYSTEM dbspace database file, together with table and data pages, and they are written to the physical database file before and during a checkpoint. They are not necessarily written to the physical database file when the logical database changes are first made by SQL statements. Note that rollback log pages are included in the checkpoint log.

The rollback log is up to date after a checkpoint; to be specific, all current rollback log pages in the physical database file are guaranteed to be up to date whenever a checkpoint is completed.

The data in a rollback log may include changes made before and after a checkpoint; there is no correspondence between a checkpoint and the end of any particular transaction.

The data in a rollback log is used during normal operation to roll back the changes when a ROLLBACK is executed on the connection, and it is then discarded. It is also discarded when a COMMIT is executed on the connection.



5. SQL Anywhere saves the execution plan when it compiles your SQL query, and uses the same plan every time it executes the query.

Some other database products save execution plans in the database, but not SQL Anywhere, not ever. What SQL Anywhere does do, sometimes, is temporarily save or "cache" the plan at runtime.

Here's what the Help says about the subject:
Normally, the optimizer selects an execution plan for a query every time the query is executed. Optimizing at execution time allows the optimizer to choose a plan based on current system state, and the values of current selectivity estimates and estimates based on the values of host variables. For queries that are executed frequently, the cost of query optimization can outweigh the benefits of optimizing at execution time. To reduce the cost of optimizing these statements repeatedly, the SQL Anywhere server considers caching plans... [more]


1 comment:

Anonymous said...

If you want to understand the role of the different logs (transaction/rollback/checkpont log), Breck's article is really a great summary.

Breck, thanks for collecting these topics (which I do tend to forget about, too). Where's the upvote button?

Best regards
Volker