Friday, February 29, 2008

Great Moments In History: Housing Bubble

It's the last day of the first month in the history of this blog...

So far there have been 3 kinds of postings here:

Well, this particular Great Moments entry isn't quite so easy to categorize: Yes, it is part happy-fun-time, at least if you're shopping for a house in south-west Florida. But it's also part serious technical stuff because it talks about the new FIRST_VALUE() and LAST_VALUE() aggregate functions in SQL Anywhere version 10.0.1.

Here are the specs for a real-world query:
Has the housing bubble finished bursting in south-west Florida?
Let's put that in more concrete terms:
Show me the price changes for houses currently on the market in south-west Florida.
Now let's have a look at the data that's actually available for this particular example: every few days, information on houses for sale in a particular area is gathered from emails and stored in a table. About 4,300 rows now exist, going back to 2005, covering about 1,300 separate houses. Some houses have only one row; some have several rows showing how the asking price has changed over the years.

Here is a query showing the history for one particular house:
SELECT email_date, street_address, zip_code, price
FROM mls_entry
WHERE street_address = '4207 Pelican Blvd'
ORDER BY email_date;
The asking price has collapsed about 58% over the past 2 1/2 years... clearly the residential real estate bubble is bursting:



Here's what the base table looks like:
CREATE TABLE mls_entry ( -- 4,345 rows, 440k total
pkey INTEGER NOT NULL DEFAULT autoincrement,
email_date DATE NOT NULL,
entry_line_number BIGINT NOT NULL DEFAULT 0,
price DECIMAL ( 11, 2 ) NOT NULL DEFAULT 0,
mls_number BIGINT NOT NULL DEFAULT 0,
street_number LONG VARCHAR NOT NULL DEFAULT '',
street_address LONG VARCHAR NOT NULL DEFAULT '',
zip_code VARCHAR ( 5 ) NOT NULL DEFAULT '',
street_name LONG VARCHAR NOT NULL
COMPUTE ( SUBSTR(street_address,LENGTH(street_number)+1) ),
CONSTRAINT ASA74 PRIMARY KEY ( -- 56k
pkey ) );
So let's go back and re-state the specs in terms of the mls_entry table:
Show the overall price changes for houses with two or more email entries, where the latest entry was received this year, and sort it by percentage price change.
What we want, for each street address, is a single row showing the price change between the earliest and latest entries. Not the minimum and maximum prices (that would be too easy), but the prices that correspond to the minimum and maximum email_date values. For example, for the house at 4207 Pelican Blvd that means reducing 16 mls_entry rows down to one row in the result.

Until version 10.0.1 there was no easy way to compute that using SQL Anywhere. Here is a query that works in earlier versions; it uses a join between two different derived tables, each with a correlated subselect, to gather data from the first and last rows for each address:
WITH price_change AS
( SELECT DISTINCT
first_entry.street_name,
first_entry.street_number,
first_entry.zip_code,
STRING
( first_entry.street_address,
' ',
first_entry.zip_code ) AS address,
first_entry.email_date AS first_date,
last_entry.email_date AS last_date,
first_entry.price AS first_price,
last_entry.price AS last_price
FROM ( SELECT *
FROM mls_entry AS F
WHERE F.email_date
= ( SELECT MIN ( M.email_date )
FROM mls_entry AS M
WHERE M.street_address
= F.street_address
AND M.zip_code
= F.zip_code ) )
AS first_entry
INNER JOIN
( SELECT *
FROM mls_entry AS L
WHERE L.email_date
= ( SELECT MAX ( M.email_date )
FROM mls_entry AS M
WHERE M.street_address
= L.street_address
AND M.zip_code
= L.zip_code ) )
AS last_entry
ON first_entry.street_address
= last_entry.street_address
AND first_entry.zip_code
= last_entry.zip_code )
SELECT address AS street_and_zip,
DATEDIFF ( DAY, first_date, last_date ) AS days,
( last_price - first_price ) AS change,
CAST ( change / first_price
* 100.0 AS INTEGER ) AS "%",
CAST ( last_price AS INTEGER ) AS price,
last_date AS "date"
FROM price_change
WHERE last_date >= '2008-01-01'
AND last_date <> first_date
ORDER BY "%",
zip_code,
street_name,
CAST ( street_number AS BIGINT );
The WITH price_change AS ( SELECT ... clause has been added to extend the "divide and conquer" process discussed back in The Evolution of a SELECT. You can think of the WITH clause as a local CREATE VIEW price_change, and here it simply separates the complex logic of gathering first-and-last rows from the final SELECT that does the calculations... like I said, divide and conquer makes coding easier.

Here's the final result; you can see that a 58% price drop is not unique. In fact there are over 10 houses for which the asking price is half what it once was:



Of the 209 houses in the full result set, 200 show a decrease in asking price and only 2 show an increase, so the answer to our original question "has the bubble finished bursting?" is "probably not yet".

And now back to the new features: SQL Anywhere 9 introduced the OLAP WINDOW clause to let you create complex queries with less effort, and version 10.0.1 introduced two new aggregate functions to use with it: FIRST_VALUE() and LAST_VALUE(). Together these features completely eliminate the messy derived tables and correlated subselects shown earlier; here's how:

The following clause defines a window named date_window:
     WINDOW date_window AS (
        PARTITION BY street_address, zip_code
        ORDER BY email_date
        RANGE BETWEEN UNBOUNDED PRECEDING
                  AND UNBOUNDED FOLLOWING )
Inside the WINDOW clause, the PARTITION BY breaks up the rows into separate groups or partitions, one per house address (street_address plus zip_code).

The ORDER BY then sorts the rows inside each partition, and the RANGE ... UNBOUNDED clause is OLAP-speak for "make the window the same size as the partition when applying aggregate functions to the window".

In other words, the WINDOW clause is like a GROUP BY on steroids: it doesn't just group (partition) rows on specified columns, it also preserves all the rows and all the columns so you can treat each partition like a miniature result set.

Once you've defined an ordered window, you can calculate the first and last prices for each house with these expressions in the SELECT list:
   FIRST_VALUE ( price ) OVER date_window
LAST_VALUE ( price ) OVER date_window
Here's the whole query; it returns exactly the same results as shown above, and it's quite a bit simpler:
WITH price_change AS
( SELECT DISTINCT
street_name,
street_number,
zip_code,
STRING
( mls_entry.street_address,
' ',
mls_entry.zip_code ) AS address,
FIRST_VALUE ( email_date )
OVER date_window AS first_date,
LAST_VALUE ( email_date )
OVER date_window AS last_date,
FIRST_VALUE ( price )
OVER date_window AS first_price,
LAST_VALUE ( price )
OVER date_window AS last_price
FROM mls_entry
WINDOW date_window AS (
PARTITION BY street_address, zip_code
ORDER BY email_date
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING ) )
SELECT address AS street_and_zip,
DATEDIFF ( DAY, first_date, last_date ) AS days,
( last_price - first_price ) AS change,
CAST ( change / first_price
* 100.0 AS INTEGER ) AS "%",
CAST ( last_price AS INTEGER ) AS price,
last_date AS "date"
FROM price_change
WHERE last_date >= '2008-01-01'
AND last_date <> first_date
ORDER BY "%",
zip_code,
street_name,
CAST ( street_number AS BIGINT );
And guess what else? As cool as they are, FIRST_VALUE() and LAST_VALUE() are only two minor features among all the other OLAP features already in SQL Anywhere.



Thursday, February 28, 2008

Attend Techwave for Free

Present a session, attend the conference for free... the Techwave 2008 Call For Papers is now open, and it closes on Monday, March 24, 2008.

Tuesday, February 26, 2008

Great Moments In History: Y2K

Why does an elevator care what year it is?

Today, the polite answer is, "It doesn't."

The not-so-polite answer is "Are you nuts? That's like asking 'Why does a fish need a bicycle?'"

But back in the 1990s many people worried that not just elevators, but cars, planes, the electricity supply and the banking system were all going to stop working on January 1, 2000. The story went like this: Many old computer systems used two digits instead of four to hold year numbers, so "65" was interpreted as 1965. At the end of 1999, when "99" wrapped around to "00", the computers would think it was the year 1900 instead of 2000.

And stop working.

Yes, people really did expect that on New Year's Day 2000 apartment elevators were going to wake up and say, "Hey, it's 1900, I haven't been invented yet, I'd better shut down!"

The problem was called "Y2K" or "The Millennium Bug", and for many years famous consultants roamed the land spreading fear and dread ...

"Saturday, January 1, 2000: Suddenly, nothing works. Not your phones, not the cash machine, not even your fancy new VCR ... Science fiction? Almost all computer experts agree, it's very, very possible.

Will your home PC be useless? Will your bank open? Will your money be there?! What about basic services? Electricity, water, mail, food delivery? Will medical devices work? Will social security checks arrive?"


... politicians were exhorted to "Do Something!", companies spent billions rewriting code, COBOL programmers came out of retirement, and ordinary folks stockpiled flashlights and cans of beans as The Day approached ...

"Today, in early September 1999, it's still not too late for an individual or a family to acquire a modest stockpile of candles, batteries, tuna fish, and rice; there are still water filters and generators, but it's getting harder to find wood stoves and other supplies." - Ed Yourdon, "The Y2K End Game" from www.yourdon.com/articles/y2kendgame.html retrieved on November 9, 1999
Well, not quite everyone went nuts.

Turns out many small businesses did nothing, and neither did anyone at all in huge countries like Russia. And when January 1, 2000 arrived and nothing bad happened to anyone, anywhere, folks realized that all the Y2K talk was just that: talk.

And then, of course, everyone stopped talking about Y2K... it was just too embarrassing to think about all the money that had been wasted.

Or all those beans in the basement.

Doomsayers did learn quite a lot from Y2K, however. For example, they learned not to pick a Doomsday scenario with a fixed expiry date like 2000-01-01.

It's better to pick a catastrophe that won't happen for a long time, preferably not in our lifetime... but still pick one that needs to be dealt with immediately, and (like Y2K) it needs to be very very expensive to fix.

Carbon credits, anyone?


Monday, February 25, 2008

The Evolution of a SELECT

Q: I've been using the following query from your book. I know what it does (shows the MobiLink synchronization status of all the remote databases), but can you explain how it works?

SELECT ml_user.name,
ml_subscription.subscription_id,
ml_subscription.progress,
ml_subscription.publication_name,
ml_subscription.last_upload_time,
ml_subscription.last_download_time
FROM ml_user
JOIN ( SELECT a.user_id,
a.subscription_id,
a.progress,
a.publication_name,
a.last_upload_time,
a.last_download_time
FROM ml_subscription AS a
WHERE a.progress = ( SELECT MAX ( b.progress )
FROM ml_subscription AS b
WHERE b.user_id = a.user_id
AND b.publication_name = a.publication_name ) )
AS ml_subscription
ON ml_user.user_id = ml_subscription.user_id;
A: Funny you should ask about that query... in my dog-eared personal copy of the book I have written "too complex" next to it.

Here is an explanation of how it works, presented as a three step "evolution":
  • Simple but not quite correct
  • Correct but really too complex (the book version, above)
  • Simple and correct
Step 1: Simple But Not Correct

The following simple SELECT can be run against the MobiLink system tables in the consolidated database to show one row for each combination of remote database (ml_user.user_id) and publication (ml_subscription.publication_name):
SELECT ml_user.name,
ml_subscription.subscription_id,
ml_subscription.progress,
ml_subscription.publication_name,
ml_subscription.last_upload_time,
ml_subscription.last_download_time
FROM ml_user
JOIN ml_subscription
ON ml_user.user_id = ml_subscription.user_id;
That works just fine, at least initially. Here is the result for a single remote database that subscribes to a single publication, after the first synchronization round-trip has been run:



Here's the problem: When you DROP and re-CREATE the MobiLink subscription on the remote database, and re-synchronize, the MobiLink server inserts a new row in ml_subscription on the consolidated database without deleting the old row. Now the simple SELECT gives two rows when only one has any meaning:



Step 2: Correct But Complex

So, you want a SELECT that returns the second row but not the first. This is done by replacing "JOIN ml_subscription" with "JOIN ( big-fancy-nested-SELECTs ) AS ml_subscription"...
  JOIN ( SELECT a.user_id,
a.subscription_id,
a.progress,
a.publication_name,
a.last_upload_time,
a.last_download_time
FROM ml_subscription AS a
WHERE a.progress = ( SELECT MAX ( b.progress )
FROM ml_subscription AS b
WHERE b.user_id = a.user_id
AND b.publication_name = a.publication_name ) )
AS ml_subscription
What that JOIN ( SELECT... does is create a "derived table" with columns that look just like the base table ml_subscription, but it contains only the rows with the MAX ( progress ) value for each combination of user_id and publication_name.

In particular, the correlated subselect "WHERE a.progress = ( SELECT MAX ( b.progress )..." eliminates the progress = 413161 row, but keeps the progress = 423060 row.

Here's what the whole thing looks like (the book version, again):
SELECT ml_user.name,
ml_subscription.subscription_id,
ml_subscription.progress,
ml_subscription.publication_name,
ml_subscription.last_upload_time,
ml_subscription.last_download_time
FROM ml_user
JOIN ( SELECT a.user_id,
a.subscription_id,
a.progress,
a.publication_name,
a.last_upload_time,
a.last_download_time
FROM ml_subscription AS a
WHERE a.progress = ( SELECT MAX ( b.progress )
FROM ml_subscription AS b
WHERE b.user_id = a.user_id
AND b.publication_name = a.publication_name ) )
AS ml_subscription
ON ml_user.user_id = ml_subscription.user_id;
When that derived table participates in the outer "FROM ml_user JOIN ( SELECT... ON ml_user.user_id = ml_subscription.user_id", it only gets the progress = 423060 data:



Derived tables are wonderful things. They let you develop queries by a "divide and conquer" process, similar to the creation of views where you push complexity down into separate selects and then use the result sets as if they were tables. Unlike a view, however, a derived table doesn't require you to CREATE anything separate; you just code the SELECT right inside the FROM clause.

Another benefit to the divide and conquer process is that it sometimes lets you see further simplifications...

Step 3: Both Simple And Correct

The complex SELECT from Step 2 includes a column that is of absolutely no use: ml_subscription.subscription_id. MobiLink uses that column as an artificial primary key to identify rows that are otherwise duplicates: different rows that have the same combination of user_id and publication_name. For the purposes of administering MobiLink, only one of those rows is of interest (the latest one, with the maximum value of "progress").

So, if you don't display subscription_id, you can get rid of the whole complex derived table, and instead do a MAX ... GROUP BY to find the latest rows, thusly:
SELECT ml_user.name,
ml_subscription.publication_name,
MAX ( ml_subscription.progress ) AS progress,
MAX ( ml_subscription.last_upload_time ) AS last_upload_time,
MAX ( ml_subscription.last_download_time ) AS last_download_time
FROM ml_user
JOIN ml_subscription
ON ml_user.user_id = ml_subscription.user_id
GROUP BY ml_user.name,
ml_subscription.publication_name;



Sunday, February 24, 2008

Dogma Wars: A Call For Topics

Once upon a time, long long ago, before the publisher put a limit on the size of my book, I was going to include a chapter on Dogma Wars in database development. In keeping with the use of verbs-as-titles for all the other chapters, this chapter was going to be called "Debating":

Chapter 11 Debating

  • NULL versus NOT NULL

  • If It's Not Normalized, It's Not Right

  • Artificial Keys are OK! Artificial Keys are Evil!

  • Isolation Level 0 versus 1, 2, 3, Snapshot

  • No Access to Base Tables, Only "Base Views"

  • Optimistic versus Pessimistic Concurrency Control

  • SELECT * is OK! SELECT * is Evil!

  • Goofy Naming Conventions

  • CASE Tools Suck, Notepad Rules!

  • Schemas Via Ownership

  • Checking For Errors

Now it's years later and I'm wondering if anyone's interested in reading more, here, on this blog... about any of those topics, or any others. Post a comment here, or send me a private email, if you've got anything to say.


Saturday, February 23, 2008

Tip: Triggering an Audit Trail

Index to all the "Audit Trail" posts
February 23, 2008 Tip: Triggering an Audit Trail
February 28, 2009 Revisited: Triggering an Audit Trail
March 24, 2009 ALTER TABLE and the Audit Trail
March 18, 2013 Reporting and the Audit Trail

Let's say you want to keep a complete record of every INSERT, UPDATE and DELETE made to a particular table. Furthermore, let's say you want to store this audit trail in the database itself to make it easy to write queries.

You can use SQL Anywhere's CREATE TRIGGER feature to do all that, and with the help of several other features like INSERT WITH AUTO NAME the code becomes quite simple.

Here's some code-by-example... First, the base table t for which we want the audit trail:
CREATE TABLE t (
   pkey                    BIGINT NOT NULL DEFAULT AUTOINCREMENT,
   data1                   INTEGER NOT NULL DEFAULT 0,
   data2                   INTEGER NOT NULL DEFAULT 0,
   PRIMARY KEY ( pkey ) );
The first step is to create a "shadow table" for t containing each column from t plus several others:
CREATE TABLE t_log (
   log_id                  BIGINT NOT NULL DEFAULT AUTOINCREMENT,
   -- start of columns from table t...
   pkey                    BIGINT,
   data1                   INTEGER,
   data2                   INTEGER,
   -- ...end of columns from table t
   logged_action           VARCHAR ( 6 ) NOT NULL,
   logged_description      VARCHAR ( 1024 ) NOT NULL,
   logged_by               VARCHAR ( 20 ) NULL DEFAULT CURRENT USER,
   logged_at               TIMESTAMP NULL DEFAULT CURRENT TIMESTAMP,
   PRIMARY KEY ( log_id ) );
One row will be inserted into this table each and every time a row is inserted into, updated in or deleted from t. Each "inserted" and "updated" row in t_log will show the new values in the corresponding t row, and each "deleted" row in t_log will show the old values from the deleted t row.

It is possible that one row in t will generate several rows in t_log, as the same row in t is inserted, then updated, then deleted (and possibly re-inserted with the same primary key). For that reason, while the primary key value can be copied from t to t_log, it can't be used as the primary key of t_log. A different autoincrement column, "log_id", is added for that.

In fact, the t_log table doesn't need any of the constraints or default values that are defined in t. Some of them wouldn't even work; not just the original PRIMARY KEY clause (discussed above), but FOREIGN KEY constraints would also cause problems. For example, an "inserted" t_log row for a child table might cause a referential integrity error if a foreign key column pointed to a parent table row that was deleted.

Some other columns are added to the end of t_log; these are all optional except maybe the first one:

"logged_action" is filled in with 'INSERT', 'UPDATE' or 'DELETE'. This column is optional if you're only going to audit one kind of operation, e.g., just 'UPDATE', but it's probably necessary if you are going to audit more than one kind.

"logged_description" is filled in with a string passed from the application.

"logged_by" is automatically filled in with the user id of the connection making the change.

"logged_at" is automatically filled in with the date/time the change was made.

The optional logged_description column depends on the application being modified to execute a SQL Anywhere CREATE VARIABLE statement on the connection making the changes, and then executing SQL Anywhere SET statements to fill in different descriptions as the application makes different kinds of changes. This "connection-level variable" feature is unique to SQL Anywhere, and it lets you pass data to triggers as if they were procedure calls with parameters.
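For example, here's a minimal sketch of what the application side might look like; the description strings are made up, and the test script at the end of this post shows the same idea in more detail:

CREATE VARIABLE @logged_description VARCHAR ( 1024 );

SET @logged_description = 'Month-end adjustment run';
UPDATE t SET data1 = data1 + 1 WHERE pkey = 1;

SET @logged_description = 'Purge of obsolete rows';
DELETE t WHERE pkey = 99;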

Now comes the audit trail trigger:
CREATE TRIGGER log_t
   AFTER INSERT, UPDATE, DELETE ON t
   REFERENCING OLD AS old_t NEW AS new_t
   FOR EACH STATEMENT
BEGIN

   IF VAREXISTS ( '@logged_description' ) = 0 THEN 
      CREATE VARIABLE @logged_description VARCHAR ( 1024 );
      SET @logged_description = '';
   END IF;

   CASE

      WHEN INSERTING THEN
         INSERT t_log
         WITH AUTO NAME
         SELECT *,
               'INSERT'            AS logged_action,
               @logged_description AS logged_description
           FROM new_t;

      WHEN UPDATING THEN
         INSERT t_log
         WITH AUTO NAME
         SELECT *,
               'UPDATE'            AS logged_action,
               @logged_description AS logged_description
           FROM new_t;

      WHEN DELETING THEN
         INSERT t_log
         WITH AUTO NAME
         SELECT *,
               'DELETE'            AS logged_action,
               @logged_description AS logged_description
           FROM old_t;

   END CASE;

END; -- log_t
If you want to audit more than one table, you will need to create separate shadow "t_log" tables and separate "log_t" triggers, but it's not as hard as it sounds. In particular, only the table names "t" and "t_log" and trigger name "log_t" need to be changed to create different triggers. The unique SQL Anywhere feature INSERT WITH AUTO NAME makes it unnecessary to code any of the column names from the original table t.

In fact, it's easy to write code that will automatically generate the CREATE TRIGGER statements; that technique might be the subject of a future Tip :)

The code in the trigger exploits the following features:

1. AFTER UPDATE instead of BEFORE UPDATE so feature 2 can be used.

2. FOR EACH STATEMENT instead of FOR EACH ROW so the REFERENCING names can be referred to as tables in FROM new_t and FROM old_t clauses.

3. IF VAREXISTS to determine if CREATE VARIABLE @logged_description needs to be executed (in case the application didn't do it).

4. CASE WHEN INSERTING / UPDATING / DELETING to determine whether new_t or old_t should be used, and what the value of logged_action should be.

5. WITH AUTO NAME so the SELECT list items will be automatically matched with the INSERT table column names by name.

6. AS logged_action and AS logged_description alias names to make sure WITH AUTO NAME works for these additional column values.

7. DEFAULT values defined for t_log columns get used for columns not in the INSERT SELECT lists.

Here's a dbisql script to test everything:
CREATE VARIABLE @logged_description VARCHAR ( 1024 );

SET @logged_description = 'Test 1: INSERT 5 rows.';
INSERT t 
SELECT row_num, row_num, row_num 
  FROM RowGenerator 
 WHERE row_num <= 5; 
COMMIT;

SET @logged_description = 'Test 1: UPDATE 2 rows.';
UPDATE t
  SET data1 = data1 + 1
 WHERE pkey <= 2; 
COMMIT;

SET @logged_description = 'Test 1: DELETE 3 rows.';
DELETE t
 WHERE pkey >= 3; 
COMMIT;

SELECT * FROM t_log ORDER BY log_id;
Here's what the audit trail looks like after all those changes:




Friday, February 22, 2008

What To Look For In SQL Server 2008

Here's an excerpt from a Microsoft SQL Server Feature request...
Description: To enhance the user experience, while waiting for all the various extranneous and unnecessary dialogue boxes to open while trying to do things like start jobs, get properties on objects, etc, the request is to have a .wav file included in the next Service Pack which would play the theme from Jeopardy from the moment a button is clicked to the moment the resulting dialogue box is opened up.

This would greatly help users who need something to keep them from banging their heads on their desks while waiting... and waiting... and waiting... for something as simple as: Enabling a job. Viewing steps of a job... Viewing properties of a database - oh, the list is endless...



Comments:
Thanks for this feedback. I have converted this request to either a defect (something is broken) or a Design Change Request (something new to add to the product) and so I'll close this ticket out. We will consider your information for SQL Server 2008 or a subsequent release.

Thanks!

- Buck Woody, SQL Server Program Manager

Thursday, February 21, 2008

Today's Client Question: SQL Anywhere vs ASE

Q: One of my customers is insisting on changing the database from SQL Anywhere to ASE just because ASE has the word "Enterprise" in its name. Can you provide any ammunition to me why SQL Anywhere is as much Enterprise as ASE?

A: I've written the answer in the form of a FAQ...

Q: Is SQL Anywhere an enterprise database?

A: For 90% of the world's enterprises the answer is "most definitely yes!" SQL Anywhere is much easier to administer and develop for than other products, and advances in computer technology mean that performance is no longer the serious concern it used to be: maintenance and development costs are. You can buy a 3GHz computer with 2G of RAM and a 500G disk drive for less than US$1,000, and that's enough power to run an enterprise database in many companies.

For the other 10% of the world's enterprises, performance problems still require huge sums of money to be spent on equipment, software and especially support. New features in SQL Anywhere are pushing the envelope, however; for example, version 10 introduces intra-query parallelism and hot failover using database mirroring.

SQL Anywhere is now being marketed as an enterprise database solution, as well as a solution for mid-level and embedded applications. Millions of rows, thousands of busy users, hundreds of gigabytes, no problem; see SQL Anywhere Performance Benchmarks.

Q: Why don't more companies use SQL Anywhere for their enterprise databases?

A: One reason is that many companies think they are in the top 10% when they actually aren't.

Q: How big a database can SQL Anywhere handle?

A: There are many successful production SQL Anywhere databases that measure in the tens and hundreds of gigabytes in size.

Q: How many users can SQL Anywhere handle?

A: There are many successful production SQL Anywhere databases with hundreds and thousands of active connections.

Q: Why don't I hear about SQL Anywhere?

A: Many companies embed SQL Anywhere in their own products, and they take a "stealth" approach: they don't advertise the fact they are using SQL Anywhere to their own customers or the public at large.

Q: Why can't I find people with SQL Anywhere experience?

A: SQL Anywhere is so easy to learn and use that many companies take a "do it yourself" approach. Staff with SQL Anywhere skills often have other important duties; their DBA role is part-time.

Q: Why can't I find books about SQL Anywhere?

A: Because it's easier to use than other products, and the Help is so good, there isn't the same market for books. There is one recent book: SQL Anywhere Studio 9 Developer's Guide, ISBN 1-55622-506-7.

Q: Why should I pick SQL Anywhere over ASE?

A: For many reasons:

  • SQL Anywhere has a more powerful procedural SQL language. That means when appropriate, it is possible to push complex logic down into the database in an efficient manner: efficient for the developer to write, efficient for the engine to execute. SQL Anywhere has scheduled and triggered events, row-level triggers, local blob variables, global temporary tables, persistent local temporary tables, user-defined connection-level variables, user-defined SQL functions with no artificial semantic limitations... the list goes on and on.

  • SQL Anywhere has a more powerful SELECT statement, including local views via the WITH clause, the LIST aggregate function with ORDER BY (see the sketch after this list), the recursive union for hierarchical queries, modern OLAP query operators, ORDER BY clauses on derived tables... this list goes on and on as well.

  • SQL Anywhere has great tools: stored procedure and trigger debugger, capturing of expensive queries, the index consultant, the graphical plan display.

  • SQL Anywhere has simpler administration. You won't get panic calls "The transaction log is full!" Installation is easier, configuration is easier, performance tuning is easier (and mostly unnecessary), backup is easier, recovery is easier, migration from one platform to another is easier (even to different-endian setups: install the software, copy the database file, start the server).

  • SQL Anywhere has a lower price tag, not just for licensing but for all those DBAs you do not need to hire. Everything comes "in the box"... no need for third-party tool purchases, no need to buy another DBMS to do OLAP, or another product to do replication.

  • SQL Anywhere has a very bright future. The SQL Anywhere engineers definitely "get it", they understand what people need from a DBMS, what the default settings and default behavior should be, why the ANSI standards are important... they do things the way they should be done. SQL Anywhere has always had many important features before the other guy; examples include declarative referential integrity, row-level locking, scrollable cursors and the LEFT OUTER JOIN as an alternative to the unpredictable *= operator.
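For example, here's what the LIST aggregate with its own ORDER BY looks like; it's a minimal sketch, the table and column names are made up, and ASE has no single-statement equivalent:

SELECT department_id,
       LIST ( employee_name, ', ' ORDER BY employee_name ) AS employees
  FROM employee
 GROUP BY department_id;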


Wednesday, February 20, 2008

Defragmenting The Temporary File

All you Microsoft conspiracy theorists out there, get out your tinfoil hats! It turns out that Windows XP and 2003 do a really bad job of allocating contiguous disk space for new and growing files. It doesn't matter if you have just defragmented your hard drive, and here's the reason why:

... it turns out that Windows XP/2003 fragments files whether or not all the free space is defragmented ... "consolidating every piece of free space" is no guarantee that newly arriving files won't be fragmented by the file system. That was a maxim on FAT volumes in DOS, but certainly is not applicable to NTFS or FAT volumes under Windows XP/2003. - How File Fragmentation Occurs On Windows XP / Windows Server 2003

This has serious implications for SQL Anywhere's temporary file. This file is deleted when the database engine shuts down and recreated when it starts up, and it grows in size as needed to satisfy requirements as the server runs. Because of the way Windows works, this file becomes fragmented as soon as it is created, and it gets worse as the file grows.

Let's consider an empty SQL Anywhere 10 database that has just been started on a freshly defragmented hard drive. Foxhound shows that the database and transaction log files are both completely contiguous (only 1 fragment), but the temporary file already has 16 fragments:



Now let's run a couple of queries that use a lot of temporary space; the temporary file grows to 524M, with 523M of that becoming free space when the queries finish. The database and log files are still contiguous (no surprise there), but the temporary file now has 72 fragments:



For some databases this won't be a problem: If there aren't a lot of connections, or there isn't a lot of temp space usage, a heavily fragmented temporary file won't slow things down. Also, if a disk defragmenter is run periodically, and the server is never shut down, the temporary file will eventually get defragmented after it has grown to its "highwater mark".

But, if you are concerned, there is a way to ensure the temporary file is contiguous. The solution is to force the temporary file to grow to its maximum size each time the database starts, and then immediately defragment it. The ALTER DBSPACE statement is used to force it to grow, the contig.exe utility is executed via xp_cmdshell() to defragment the file as soon as it has grown, and all this code is placed inside a SQL Anywhere "DatabaseStart" event.

ALTER DBSPACE and CREATE EVENT are SQL Anywhere statements; the contig.exe utility is available from Microsoft Sysinternals.

Here's what the code looks like, including some diagnostic output to the database engine console log (MESSAGE ... TO CONSOLE) and a text file "C:\temp\contig_log.txt":

CREATE EVENT database_start TYPE DatabaseStart 
HANDLER BEGIN

   DECLARE @temp_file_spec LONG VARCHAR;
   DECLARE @return_code INTEGER;
   DECLARE @sqlcode INTEGER;
   DECLARE @sqlstate VARCHAR ( 5 );
   DECLARE @errormsg VARCHAR ( 32767 );

   MESSAGE STRING ( CURRENT TIMESTAMP,
      ' DatabaseStart event...' ) TO CONSOLE;

   ALTER DBSPACE TEMPORARY ADD 600MB;

   SET @temp_file_spec = DB_PROPERTY ( 'TempFileName' );

   MESSAGE STRING ( CURRENT TIMESTAMP,
      ' Running contig on ',
      @temp_file_spec ) TO CONSOLE;

   @return_code = CALL xp_cmdshell (
      STRING (
         'C:\\download\\sysinternals\\contig\\contig.exe -v ',
         @temp_file_spec,
         ' >>C:\\temp\\contig_log.txt' ),
      'no_output' );

   MESSAGE STRING ( CURRENT TIMESTAMP,
      ' xp_cmdshell @return_code = ',
      @return_code ) TO CONSOLE;

EXCEPTION WHEN OTHERS THEN
   SELECT SQLCODE, SQLSTATE, ERRORMSG()
         INTO @sqlcode, @sqlstate, @errormsg;
   MESSAGE STRING ( CURRENT TIMESTAMP,
         ' DatabaseStart exception...' ) TO CONSOLE;
   MESSAGE STRING ( 'SQLCODE = ', @sqlcode ) TO CONSOLE;
   MESSAGE STRING ( 'SQLSTATE = ', @sqlstate ) TO CONSOLE;
   MESSAGE STRING ( 'ERRORMSG() = ', @errormsg ) TO CONSOLE;

END;
Now all three files are contiguous after the database is stopped and restarted:



This extra processing comes at a price: an extra 2 minutes is added to the database startup time as shown by the console log:
2008-02-20 11:13:22.781 DatabaseStart event...
Database server started at Wed Feb 20 2008 11:13
2008-02-20 11:14:00.468 Running contig on
   C:\DOCUME~1\bcarter\LOCALS~1\Temp\asat0000.tmp
2008-02-20 11:15:42.328 xp_cmdshell @return_code = 0
Here's the contig_log.txt output from the defragmentation process; note that only 4 fragments existed immediately after the ALTER DBSPACE, rather than the 72 fragments that existed after the file grew during normal usage. This may be good enough for you; i.e., you may decide to only run the ALTER DBSPACE, not the xp_cmdshell - contig.exe step.
Contig v1.54 - Makes files contiguous
Copyright (C) 1998-2007 Mark Russinovich
Sysinternals - www.sysinternals.com

------------------------
Processing C:\DOCUME~1\bcarter\LOCALS~1\Temp\asat0000.tmp:
Scanning file...
Scanning disk...
File is 153617 physical clusters in length.
File is in 4 fragments.

Found a free disk block at 24084622 of length 244001 for entire file.
Moving 153617 clusters at file offset cluster 0 to disk cluster 24084622
File size: 629215232 bytes
Fragments before: 4
Fragments after : 1
------------------------
Summary:
     Number of files processed : 1
     Number of files defragmented: 1
     Average fragmentation before: 4 frags/file
     Average fragmentation after : 1 frags/file
And what about Vista? I don't know; the situation might be different, but I somehow doubt it. Here is an article that talks a bit about Vista:
The Truth about Defragmentation - Wes Miller, November 2007 issue of the Microsoft TechNet Magazine


Tuesday, February 19, 2008

Tip: The All-Encompassing Try-Catch

One of the least-used SQL statements in all of SQL Anywhere is the "try-catch" EXCEPTION handler. If you look it up in the Help, you'll see a discussion of how to use it with DECLARE ... EXCEPTION FOR SQLSTATE declarations, but the open-ended all-encompassing form is easier to code and often more useful.

Let's say you want to execute some statement, and you don't care if it works or not, you just want your code to robustly carry on with the next statement: for example, you want to DROP a table that may or may not exist. If the table doesn't exist, you don't care, and you certainly don't want to get an error message because the DROP failed.

One solution is to wrap the DROP statement in a BEGIN - END block, and include an EXCEPTION handler that does nothing. Here's an example where a local temporary table is being dropped; because there is no information about local temporary tables in the system catalog there's no easy way to check if the table exists, so the exception handler is a good alternative:

BEGIN
   DROP TABLE t;
EXCEPTION WHEN OTHERS THEN
END;

CREATE LOCAL TEMPORARY TABLE t (
   x INTEGER NOT NULL PRIMARY KEY );
The clause "EXCEPTION WHEN OTHERS THEN" means "trap any exception that occurs in the BEGIN block", and the fact that no code follows the THEN means "ignore the exception".

Now let's say you have some code that might cause problems, and you want to capture diagnostic information if an exception occurs and then carry on. You can use the same "trap any exception..." syntax, but this time you put some diagnostic code after the THEN. Here's a template for a stored procedure:
CREATE PROCEDURE ...
BEGIN

   DECLARE @sqlcode INTEGER;
   DECLARE @sqlstate VARCHAR ( 5 );
   DECLARE @errormsg VARCHAR ( 32767 );

   -- code outside the scope of the exception handler

   BEGIN

      -- code which will have exceptions handled

   EXCEPTION WHEN OTHERS THEN

      SELECT SQLCODE, SQLSTATE, ERRORMSG()
         INTO @sqlcode, @sqlstate, @errormsg;

      -- optional code: further processing, etc.

   END;

   -- code that can check if an exception occurred

END;
Here's a simple demonstration:
BEGIN
   DECLARE @sqlcode INTEGER;
   DECLARE @sqlstate VARCHAR ( 5 );
   DECLARE @errormsg VARCHAR ( 32767 );
   DECLARE @date DATE;
   BEGIN
      SET @date = 'xxx'; -- raise an exception
   EXCEPTION WHEN OTHERS THEN
      SELECT SQLCODE, SQLSTATE, ERRORMSG()
         INTO @sqlcode, @sqlstate, @errormsg;
   END;
   MESSAGE STRING ( 'SQLCODE = ', @sqlcode ) TO CLIENT;
   MESSAGE STRING ( 'SQLSTATE = ', @sqlstate ) TO CLIENT;
   MESSAGE STRING ( 'ERRORMSG() = ', @errormsg ) TO CLIENT;
END;

SQLCODE = -157
SQLSTATE = 53018
ERRORMSG() = Cannot convert 'xxx' to a timestamp

Monday, February 18, 2008

Check Out Those Residual Predicates!

A "residual predicate" is SQL Anywhere's version of the pretty girl at the high school dance: You really want to dance with her but you're too afraid to ask. SQL Anywhere really wants to use that residual predicate to speed things up but it doesn't, for some reason or another.

What SQL Anywhere does do is show you residual predicates when it displays the Graphical Plan With Statistics. Not up top where it says "Table Scan" versus "Index Scan", mind you, but way down at the bottom of the display, after all the other counters and statistics. It's up to you to scroll down and have a look, and see if it's worth your while to make some change so SQL Anywhere can put that residual predicate to good use.

Here's an example of the Graphical Plan for a MobiLink download_cursor SELECT that took almost 5 minutes to run:




Even though that query selected no rows, it involves a join between the two largest tables in the database, 7 million rows and 3.8G of space according to the Foxhound "Curiosities" display:



The Graphical Plan shows that it did a primary key "Index Scan" for the sbx_event_raw_data table. So far, so good, at least it's not a sequential table scan.

...But wait a minute! Not so fast! This table took over half the execution time, that's not so good!

Let's keep looking... scroll down to the bottom of the same display and you see this "Residual predicate":



What it's saying is that the SELECT contained this predicate in the WHERE clause:
sbx_event_raw_data.modification_date > 2008-02-10 14:15:26.218000
and that this predicate was expected to be true 6.2696% of the time according to "Statistics" stored in the database.

According to Foxhound, a huge (67 megabyte) index already exists on the modification_date column:



So why didn't SQL Anywhere use that index? The answer lies in the number 6.2696 percent which is the "selectivity" of a predicate. Lower is better than higher when it comes to using an index on a predicate, and 6 percent selectivity is too high for SQL Anywhere to use an index for this particular predicate... it'll be worse than a sequential table scan because it will probably do the same thing (read every single page) but in random order.

Welllll, it turns out that the 6 percent figure is completely wrong: In reality this predicate is true exactly zero percent of the time because there are no rows with modification_date greater than February 10. You can't get any better than zero percent selectivity, so SQL Anywhere should have used that index.

The problem is, the statistics are wrong. The solution is, fix the statistics:
CREATE STATISTICS sbx_event_raw_data ( modification_date );
The CREATE STATISTICS statement took about 3 minutes to run. After that, the MobiLink download_cursor went from 5 minutes down to less than ten seconds.

And that's a pretty good deal, because this SELECT runs many times a day but the CREATE STATISTICS only needs to be run once in a while (or maybe just once).
Here's a disclaimer for my good friends at iAnywhere Solutions: This article is specifically about residual predicates, not database statistics. In a perfect world, database statistics would always be kept perfectly up to date, by SQL Anywhere, automatically... and CREATE STATISTICS would never be required. In the real world, SQL Anywhere is getting better and better with each new release, but sometimes the statistics get out of whack. And in this example, the client is not using the latest version of SQL Anywhere, so "sometimes out of whack" is not so surprising. But again, that's not the point here... residual predicates are the point. Whenever you're looking at a plan, check out the residual predicates.

Sunday, February 17, 2008

Where's The Beef?

I'm no financial genius. Proof hangs on my office wall in the form of a framed Bre-X Minerals share certificate...



Maybe not being a genius is the reason I don't understand how Sun can possibly justify paying 1 billion dollars for MySQL AB, including $800 million in cash.

That's $2,000,000 cash per MySQL employee.

For a while after hearing about the purchase, I was really confused because I thought Oracle had already bought MySQL.

But no, what Oracle bought was the company called Innobase OY. They're the folks who make InnoDB, the actual transactional database brains behind MySQL:
InnoDB is the most popular transactional storage engine for MySQL. Unlike MyISAM, InnoDB is not only ACID compliant, but it also supports row-level locking and referential integrity.
So Sun is paying $1B for a non-transactional, non-relational file system that anyone can download for free? Couldn't Sun just put a link to download MySQL on their website and save the money?

Can anyone explain that to me?

Anyone?

Saturday, February 16, 2008

Great Moments In History: SilverStream

Back in the dark days at the end of the last Millennium, the SilverStream application server was the hottest of the hot application development products. All the best PowerBuilder and Lotus engineers had quit Sybase and IBM to go work on SilverStream, so it had to be good, right?

Well, not so much: hideously hard to use, a performance nightmare, a disaster in the marketplace, crushed by the dot-com collapse, assimilated by Novell in 2002, and finally put to rest in 2005.

But a Great Moment In History nonetheless: Here's the SilverStream stock chart at its peak:



And to give everyone a taste of what life was really like for programmers using SilverStream, here is a blog posting from back in the day:

The fRiDaY File,
for June 17, 1999

SilverStream By The Numbers

Communication among SilverStream programmers is more efficient when numbers are assigned to common conversational elements.

For example,
"How do I [do some thing] in SilverStream?"
may be answered simply:
"32! 33!"
The person asking the question might then express shock by
"Oh, man, 73! 1!"


Here's the list, SilverStream By The Numbers:...

1 - the Help sucks
2 - it's so slow
3 - Simple Actions are useless, click on Java
4 - you're at the 3GL level all the time except when you want to be
5 - that's really really really hard to do in SilverStream
6 - you can't use javadoc
14 - the Designer's locked up
15 - the Designer's using all the CPU

16 - the Designer's using all the memory
17 - use Task Manager to kill the Designer
18 - the Server's locked up
19 - the Server's using all the CPU
20 - the Server's using all the memory
21 - use Task Manager to kill the Server
22 - the Server won't start
23 - reboot NT
29 - everyone take a break, I'm going to resynch the database
30 - you need more RAM
31 - the day sure goes by fast when you 23 every few minutes
32 - that doesn't work in SilverStream
33 - I had to write my own
34 - nobody's asked that on the newsgroups
35 - nobody's answered that on the newsgroups
36 - I don't think anybody's tried that before
37 - I told you, 32, but you weren't listening
38 - Fulcrum sucks
39 - the debugger sucks
41 - you can't do that with a SilverStream view
42 - try creating a database view to fool SilverStream
43 - you can't put SQL in a SilverStream where clause
44 - you can't use real SQL any more
45 - the Expression Builder sucks
46 - Uncle SilverStream Knows Best
50 - hosed the server
51 - OK, who 50
52 - oops, I 50
68 - SilverStream isn't for programmers, it's for investment bankers
69 - how can a group that wrote such a cool program write such a bad one
72 - that would be easy with a DataWindow
73 - that would be easy in PowerBuilder
74 - that would be easy in PowerJ
75 - how much does Jaguar cost
78 - SilverStream doesn't work with [fill in product name: e.g., 79]
79 - Oracle
81 - I'm screwed
82 - you're screwed
83 - we're all screwed
84 - maybe that's a bug in the VM
85 - maybe that's a bug in SilverStream
97 - maybe that will be in the beta
98 - maybe that will be in the next release
99 - what a kludge

Friday, February 15, 2008

Don't Be Afraid!

"Learning By Doing" is a wonderful technique for many people. If the documentation isn't perfectly clear, or if the examples don't match your requirements, just writing some code and seeing what happens is often the best approach.

Fear, however, sometimes stands in the way of Learning By Doing:

  • Fear that your experimentation will disturb a critically-important development or production environment, and

  • Fear that setting up a separate test-bed will be difficult and time-consuming.
The first fear is often rational and justified. So is the second fear, if you're talking about Oracle or SQL Server or DB2, but not if you're using SQL Anywhere.

With SQL Anywhere, you can create and launch a new database on your workstation in a matter of seconds, and start your Learning By Doing adventure with no effect on anyone else. The software you use will be exactly the same as the version used in production, nothing "Lite" about it, so your experimental results will be meaningful.

Here is the Windows batch file that I regularly use to create and/or start an experimental database. The dbinit utility creates the database file ddd10.db, the dbspawn/dbeng10 step starts the database, and the last two commands start two dbisql sessions with "ddd10-1" and "ddd10-2" showing in the window title bars so they're easy to tell apart.



(Caution: The last 3 commands should each appear on a single line, but each one has been "wrapped" onto two lines to fit on the screen.)
REM dbinit...
REM -e simple encryption
REM -et enable table encryption
REM -s use checksums when writing pages

REM dbspawn...
REM -f do not check for a running server.
REM -q quiet mode — do not print messages.

REM dbeng10...
REM -c ... initial RAM cache size
REM -o ... where to put server console log text file
REM -os 10M when to rename/restart console log text file
REM -x none only allow "shared memory" client connections
REM -z diagnose communication links
REM -zl RememberLastStatement for LastStatement
REM -zp RememberLastPlan for LastPlanText
REM -zt RequestTiming for performance statistics
REM ddd10.db first database to start

REM dbisql...
REM -c ... database connection string

"%SQLANY10%\win32\dbinit.exe" -et -s ddd10.db

"%SQLANY10%\win32\dbspawn.exe" -f "%SQLANY10%\win32\dbeng10.exe"
-o dbeng10_log_ddd10.txt -os 10M -x tcpip -zl -zp -zt ddd10.db

"%SQLANY10%\win32\dbisql.exe"
-c "ENG=ddd10;DBN=ddd10;UID=dba;PWD=sql;CON=ddd10-1"

"%SQLANY10%\win32\dbisql.exe"
-c "ENG=ddd10;DBN=ddd10;UID=dba;PWD=sql;CON=ddd10-2"
You can run the batch file whether or not the ddd10.db file already exists, and whether or not the engine is already running. If the database file already exists, the dbinit step will just display an error message and move to the next step. If the engine is already running, dbspawn/dbeng10 will display an error and move on to the dbisql steps.

That means if all you need is to restart an existing database, or start a dbisql session on a running database, you can run this batch file; you don't need to do something different.

And the best part is this: If your experimentation runs completely awry and you want to start over, here's all you have to do:
  • Stop the engine,

  • Delete the files ddd10.db and ddd10.log, and

  • Run the batch file again.
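Here's one way to script those three steps; it's a minimal sketch, and both the dbstop connection string and the batch file name ddd10.bat are assumptions:

REM Stop the engine, wipe the experimental database, and rebuild it from scratch.
"%SQLANY10%\win32\dbstop.exe" -c "ENG=ddd10;UID=dba;PWD=sql"
DEL ddd10.db ddd10.log
CALL ddd10.bat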

Thursday, February 14, 2008

Valentine's Day

Wednesday, February 13, 2008

Today's Client Question: Converting From Oracle

Q: What should we look out for when converting from Oracle to SQL Anywhere? We're switching software for our remote databases so we can use MobiLink to synchronize with our central Oracle database and save a lot of money in maintenance fees.

A: For the most part, it's easier to go from Oracle to SQL Anywhere than the other way. Plus, you'll save more money than just the maintenance fees.

The first thing to look out for is the column data types. You can use SQL Anywhere's "Migrate Database" wizard to create a SQL Anywhere database from Oracle in a few minutes, but then you really should check each and every column to make sure you're happy with the new data type. Here are some of the wizard's choices that deserve your attention:

32-bit BINARY_FLOAT Oracle columns are migrated to the 64-bit DOUBLE data type in SQL Anywhere, whereas you may prefer the 32-bit REAL data type.

DATE columns become TIMESTAMP in SQL Anywhere, whereas you may prefer DATE.

NUMBER columns are migrated to use the 64-bit floating point DOUBLE data type, whereas you may only be using the columns to hold integer or fixed-point values, in which case data types like INTEGER, BIGINT or NUMERIC might be more appropriate.

NUMBER ( precision ) columns become NUMERIC ( precision ), whereas you may prefer one of the binary integer data types like INTEGER or BIGINT.
Also, Oracle CHAR columns are fixed-length whereas all SQL Anywhere string data types are varying in length, even if you code CHAR instead of VARCHAR. This doesn't often pose a problem since applications don't usually rely on automatic-padding-to-waste-space.

Another big difference to deal with is that Oracle uses the PL/SQL syntax for stored procedures and triggers, and it is very different from the WATCOM SQL and Transact SQL syntax that SQL Anywhere uses. So different, in fact, that converting procedures and triggers will be a complete rewrite. The good news is that WATCOM SQL's easier to code and debug; e.g., you don't have to deal with problems like Oracle's "mutating tables" limitation...



There are a lot of other minor differences between Oracle and SQL Anywhere; here are a few I've run into on conversion projects:

  • REMAINDER becomes MOD
  • DECLARE goes after the procedure BEGIN
  • TO_NUMBER(USERENV('SESSION_ID')) becomes @@spid
  • IN OUT becomes INOUT
  • RAISE_APPLICATION_ERROR becomes RAISERROR
  • variable := expression; becomes SET variable = expression;
  • ELSIF becomes ELSEIF
  • SYSDATE becomes CURRENT TIMESTAMP
  • TO_CHAR becomes STRING (which is way more powerful)
  • FROM DUAL becomes FROM dummy (or just omit the FROM clause)
  • WHERE (+) operators must be replaced with OUTER JOIN clauses
  • DECODE becomes a CASE or IF expression
  • SQL%ROWCOUNT becomes @@ROWCOUNT
  • CREATE SEQUENCE becomes DEFAULT AUTOINCREMENT or GLOBAL AUTOINCREMENT
  • ROUND requires 2 arguments
  • GREATEST becomes GREATER and only works with 2 arguments
  • Date arithmetic does not return a fractional time component
Also note that in Oracle the empty string literal '' becomes a NULL value whereas in SQL Anywhere it is treated as a zero-length string.
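To give a feel for a few of those differences, here's a minimal before-and-after sketch using a made-up lookup function; the original Oracle constructs appear in the comments:

-- Hypothetical Oracle function rewritten in SQL Anywhere's WATCOM SQL
CREATE FUNCTION status_label ( IN @status INTEGER )   -- was: p_status IN NUMBER
RETURNS VARCHAR ( 20 )                                 -- was: RETURN VARCHAR2
BEGIN
   DECLARE @label VARCHAR ( 20 );                      -- DECLARE moves after BEGIN
   SET @label = CASE @status                           -- was: v_label := DECODE ( p_status,
                   WHEN 1 THEN 'Open'                  --      1, 'Open', 2, 'Closed', 'Unknown' );
                   WHEN 2 THEN 'Closed'
                   ELSE 'Unknown'
                END;
   RETURN @label;
END;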

Sunday, February 10, 2008

Saturday, February 9, 2008

Techwave 2008 August 4 Las Vegas

I'm not the only one who wishes Techwave moved back to the Disneyworld Dolphin in May, rather than Vegas in August.

But, the Mandalay Bay ain't bad, and the Charlie Palmer Steakhouse next door is the best, so I'll see you there: August 4 to 8, 2008.

Here's the Techwave link (not much there yet, check back in the spring).


Friday, February 8, 2008

Today's Client Question: Temp File Size

Q: Why does DB_PROPERTY ( 'TempTablePages' ) say that only 16K of space is used when our SQL Anywhere Version 9 temporary file has grown to 9G in size? Even Foxhound doesn't get the number right, it says the "Temp File" is only 4M.

A: All the numbers are correct, but some are more correct than others; it depends on what you're looking for.

Some background: The temporary file is created when the server starts, is deleted when it shuts down, and in between it grows in size but never shrinks. In your case, at some point since the server started it needed 9G of space, and that might indicate a problem... perhaps one or more runaway queries. Be warned: the server will most likely crash if the temporary file grows to fill the disk.

The DB_PROPERTY ( 'TempTablePages' ) value tells you how much temporary file space is currently used for temporary tables. Note the phrase "currently used" instead of "maximum space required". Also, "temporary tables" is not the only use for the temporary file; a runaway query might use huge amounts of temporary file space even when no temporary tables exist at all.

You can calculate the SUM() of all CONNECTION_PROPERTY ( 'TempFilePages' ) values to see a bigger picture: the total amount of space currently used in the temporary file. Here is a runaway query that uses a lot of temporary file space, followed by a query that shows the difference between TempTablePages and TempFilePages:

SELECT COUNT(*)
FROM ( SELECT *
FROM SYSTABLE CROSS JOIN SYSCOLUMN
UNION
SELECT *
FROM SYSTABLE CROSS JOIN SYSCOLUMN ) AS fat;

SELECT CAST ( TempFilePages * PageSize / 1024.0 AS BIGINT )
AS TempFilePagesK,
CAST ( TempTablePages * PageSize / 1024.0 AS BIGINT )
AS TempTablePagesK
FROM ( SELECT SUM ( CAST ( CONNECTION_PROPERTY ( 'TempFilePages',
sa_conn_info.Number )
AS DECIMAL ( 15, 6 ) ) ) AS TempFilePages,
CAST ( DB_PROPERTY ( 'TempTablePages' )
AS DECIMAL ( 15, 6 ) ) AS TempTablePages,
CAST ( DB_PROPERTY ( 'PageSize' )
AS DECIMAL ( 15, 6 ) ) AS PageSize
FROM sa_conn_info() ) AS properties;
In your case, that is where the 4M figure is coming from; Foxhound computes the SUM() for you. It's a bigger number than the 16K, but it's still the "currently used" number, not the temporary file size.

In the case of the SELECT COUNT(*) above, Figure 1 shows how the SUM() of TempFilePages grows and shrinks:

Figure 1: Foxhound Temp File History

Figure 2 shows the connection-level detail when the peak Temp File value of 226M was reached: one connection was using all the space to do the SELECT COUNT(*).

Figure 2: Foxhound Connection Detail


Figure 3 shows that after SELECT COUNT(*) finished, most of the temporary file is free space.

Figure 3: Foxhound DBSPACE Detail


You can run your own query to show the actual temporary file size, which is the "high water mark" for temporary file usage:

SELECT CAST ( temp_file_pages * page_size / 1024.0 / 1024.0
AS BIGINT ) AS TempFileM
FROM ( SELECT CAST ( DB_EXTENDED_PROPERTY ( 'FileSize', 'temp' )
AS DECIMAL ( 15, 6 ) ) AS temp_file_pages,
CAST ( DB_PROPERTY ( 'PageSize' )
AS DECIMAL ( 15, 6 ) ) AS page_size ) AS sizes;
What Can Be Done About Temp File Usage?

SQL Anywhere Version 9 introduced the following database-level option to keep runaway queries from crashing the server:
SET OPTION PUBLIC.TEMP_SPACE_LIMIT_CHECK = 'ON';
Version 10 extended the idea (and here's another reason to upgrade to V10): You can specify different limits on the amount of temporary space allocated to different connections.

For example, if you have a lot of high-speed low-overhead OLTP connections, you can give them a low limit on temporary space to prevent fat queries from affecting response and throughput:
SET TEMPORARY OPTION MAX_TEMP_SPACE = '1M'; -- small
At the same time, if you have some connections doing important OLAP queries, you might want to give them a high limit so that they will be able to get their work done:
SET TEMPORARY OPTION MAX_TEMP_SPACE = '10G'; -- large
Some notes: The MAX_TEMP_SPACE option is available at the connection level even though the V10 help implies it is available only at the "PUBLIC" database level. Also, the MAX_TEMP_SPACE option only works if TEMP_SPACE_LIMIT_CHECK is also set to 'ON'.
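
Putting those together, the setup might look like this (just the statements shown above; the 'PUBLIC' option only needs to be set once, by someone with DBA authority):

SET OPTION PUBLIC.TEMP_SPACE_LIMIT_CHECK = 'ON'; -- once, at the database level
SET TEMPORARY OPTION MAX_TEMP_SPACE = '1M';      -- per connection: tight limit for OLTP work
-- ...or, on a reporting connection:
SET TEMPORARY OPTION MAX_TEMP_SPACE = '10G';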


Thursday, February 7, 2008

Today's Puzzle: Variable '@daffy' not found

Why does the following SQL give the error message "Variable '@daffy' not found"?

CREATE PROCEDURE mickey ( IN @minnie INTEGER )
BEGIN
    DECLARE @minnie INTEGER;
    DECLARE @daffy INTEGER;
    EXCEPTION
        WHEN OTHERS THEN
            SET @daffy = 1;
END;

CALL mickey ( 1 );


Wednesday, February 6, 2008

Makeover Madness

My colleague, Margaret Kammermayer, played hooky from work this morning to appear on this TV show to get this makeover...





Tuesday, February 5, 2008

Today's Client Question: Invalid Log

Q: If you don't mind, I have one unrelated question; if you are busy, please feel free to ignore it. In our setup we dump the transaction log every 10 minutes and apply it to the mirror site. Recently I have been seeing very strange behavior: I am unable to apply the transaction log to the mirror site. We apply logs in sequential order, and until one succeeds we never try applying the next log file. Also, the mirror site is standby only; the ASA server there is never up, it comes up only for a short while as a result of the log application command dbsrv9 dbfilename -a logfilename.

While trying to apply the log I get the following error: "Database cannot be started -- xxx.log is an invalid transaction log".

I am surprised why this happens. If you have experienced this in past, please let me know what could be wrong.

Thanks!

A: Hmmmmm... you could try using dbtran on that file to see if it "sort of looks like a log file" or if it is truly messed up.
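
For example, something along these lines (file names are placeholders) will attempt to translate the log into SQL statements; if dbtran rejects the file, it is almost certainly damaged:

dbtran xxx.log xxx.sql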

My guess is something Very Bad happened to the file.

I recommend two alternatives:

  1. Use "live log backup", which is a continuous process and requires less administrative effort.
  2. Migrate to SQL Anywhere Version 10.0.1 so you can use a mirrored database which is *really* cool, especially when switching over.

If you want more explanation about live log backup let me know. There is a little bit in my book on page 386, but it doesn't talk about the cool .BAT file with the endless GOTO loop that continuously tries to restart dbbackup on the secondary computer until the primary comes back up... that trick I learned from a client.
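
For the curious, here is a minimal sketch of that idea; the connection parameters, file names and the 10-second pause are all placeholders, not the client's actual script:

@echo off
rem Keep a live log backup running on the secondary computer; when the primary
rem server goes away, dbbackup exits, so wait a few seconds and try again.
:restart
dbbackup -l c:\standby\live.log -c "ENG=primary;DBN=mydb;UID=dba;PWD=sql;LINKS=tcpip(HOST=primary-host)"
ping -n 11 localhost > nul
goto restart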

That reminds me of this old joke:
Question: What is a consultant?

Answer: A consultant is someone who borrows your watch to tell you the time and then keeps your watch.
I do not mind questions AT ALL, for two reasons: This is the free "after sales service" that all my clients get, PLUS you have been kind enough to beta test Foxhound!

Breck

Monday, February 4, 2008

Family Tech Support

Did you know that you can build an entire database-driven website using only SQL Anywhere? No other programming language is required, not even a web server... just the database.

That's because SQL Anywhere has a built-in HTTP server, plus web services and all the necessities for processing HTTP requests and returning HTML on-the-fly to the browser. That's how Foxhound works, for example.
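
To give the flavor of the built-in HTTP server (this is not how Foxhound is actually written, just the smallest example I can think of, with a made-up service name and port):

-- Start the server with an HTTP listener, for example:
--    dbsrv10 -xs http(port=8080) mydb.db
-- Then create a service; browsing to http://localhost:8080/hello runs the SELECT:
CREATE SERVICE hello
   TYPE 'HTML' AUTHORIZATION OFF USER DBA
   AS SELECT 'Hello from SQL Anywhere!';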

Yabbut, yabbut... let's suppose you don't want to get that fancy. You don't want to learn about web services, and you don't want to (or can't) install a database server where your website sits... You just want to automate the creation of simple static HTML files for an ordinary web server like IIS.

You can still do that with SQL Anywhere... It's even simpler, and still no other programming language is required. Here's how it works for the RuralFinds web store:

  • Links to PayPal are used to implement the shopping cart and payment features. This allows the customer to use a credit card as well as PayPal while eliminating the expense and complexity of an in-house shopping cart and payment implementation.

  • Excel spreadsheets are used as an "EconoGUI" to enter and update the product information.

  • SQL Anywhere's proxy table feature is used to load the data from Excel into the item table (see the sketch after this list).

  • Item photos are stored separately as JPG files, as befits a simple "static HTML" website.

  • Stored procedures create the home page index.html file, the individual item pages and the category "table of contents" pages, as well as special files for Google: the Google search results template and the Google sitemap file. Calls to xp_write_file() create the actual files (also part of the sketch below).

  • Whenever the spreadsheets are changed, all the HTML files are regenerated from scratch and then copied to the web server. Currently, it takes about 5 seconds to create 200 files.
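
Here is a rough sketch of the proxy table and file-writing steps; the DSN, worksheet, table, column and file names are all made up for illustration, not the actual RuralFinds schema:

-- Remote data access: treat the Excel workbook as a "server" via an ODBC data
-- source, expose one worksheet as a proxy table, then copy it into the item table.
CREATE SERVER excel_items CLASS 'odbc'
   USING 'ruralfinds_excel';        -- an ODBC DSN pointing at the workbook

CREATE EXISTING TABLE excel_item
   AT 'excel_items;;;items$';       -- items$ = the worksheet name

INSERT INTO item
   SELECT * FROM excel_item;

-- Static page generation: build the HTML in a string, then write it to disk.
CREATE PROCEDURE write_item_page ( IN @item_id INTEGER )
BEGIN
    DECLARE @html LONG VARCHAR;
    SELECT STRING ( '<html><body><h1>', description, '</h1><p>$',
                    price, '</p></body></html>' )
      INTO @html
      FROM item
     WHERE item_id = @item_id;
    CALL xp_write_file ( STRING ( 'c:\ruralfinds\html\item', @item_id, '.html' ),
                         @html );
END;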

The decision to use PayPal isn't likely to change; it really does make everyone's life so much easier.

The Excel spreadsheets are another matter. Excel is a quick way to get a user interface up and running, but from a "Family Tech Support" point of view it's an administrative nightmare... I see a good old-fashioned client-server application in my future.


Oh No You Did Not



This image first appeared on Indexed. Permission to reproduce was graciously granted with a "Go for it!" by Jessica Hagy... Jessica, we love your work!


Why You Might Want a Local Database

Here's one reason not to put absolute trust in the internet: Ship’s Anchor Cuts Cables, Killing Internet in India, Mideast.

Image from NOAA.

...bad news for database applications that rely on continuous internet access to support remote users.

The alternative (of course!) is to use a local SQL Anywhere database at each remote client location, plus a central enterprise database (SQL Anywhere if they'll let you, Oracle or whatever if they won't) for aggregation, dissemination and communication, and then use MobiLink to get all the databases back in sync when the internet is working again.
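
On the remote side, the moving parts are small; a minimal sketch (table, publication, user and host names are hypothetical) looks something like this:

-- In each local (remote) SQL Anywhere database:
CREATE PUBLICATION p_sales ( TABLE sales_order );
CREATE SYNCHRONIZATION USER store_042;
CREATE SYNCHRONIZATION SUBSCRIPTION TO p_sales FOR store_042
   TYPE tcpip ADDRESS 'host=central.example.com';

A scheduled dbmlsync run at the remote then uploads and downloads changes whenever the link is up; in between, the application keeps working against the local database.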



Today's Tip: Case-Sensitive Searching

Q: How do I search for case-sensitive characters in a case-insensitive database without altering the database?

A: If you are not worried AT ALL about character-set, language or locale issues, then doing a binary string comparison might be what you want.

BEGIN
    DECLARE @x VARCHAR ( 100 );
    DECLARE @y VARCHAR ( 100 );
    SET @x = 'hello, world!';
    SET @y = 'HeLlO, wOrLd!';
    SELECT @x,
           @y,
           IF @x = @y
              THEN 'Equal' ELSE 'Not equal' ENDIF,
           IF CAST ( @x AS BINARY ) = CAST ( @y AS BINARY )
              THEN 'Equal' ELSE 'Not equal' ENDIF;
END;

@x,@y,insensitive,sensitive
'hello, world!','HeLlO, wOrLd!','Equal','Not equal'
From Volker Barth: "If you want more than just a binary comparison, have a look at the COMPARE() function. It can be used to do comparisons with different collations and the like."

For the full discussion see this thread in the SQL Anywhere "General" newsgroup.
