Monday, October 20, 2008

Database Monitor Email Alerts

The ability to send email alerts warning of Bad Things Happening to your SQL Anywhere database server is the release-defining feature of the new SQL Anywhere Server Monitor now available in beta. Here's one of those emails, this one warning about high CPU usage:



The monitor lets you pick and choose what you want to be warned about:



It's a two-part process to set up who gets the warnings. First, you set up the email server...



Then, you define the email addresses:



Email alerts work fine with ordinary SMTP servers like the one shown here, and the monitor does allow input of a user name and password for authentication purposes.

As far as I can tell, however, it chokes on a server like smtp.gmail.com, failing with the message "Email Server for Alerts not Configured". This may be because the Google Mail server requires transport layer security (TLS) and SQL Anywhere's email code doesn't support that.

I'm just guessing at the connection between the SQL Anywhere Monitor and SQL Anywhere's email support, but it's an educated guess: a standalone test calling xp_startsmtp from a SQL Anywhere 11 database returns -530 which (I think) means "Must issue a STARTTLS command first."
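
For the record, here's the flavor of that standalone test... a sketch only, with a placeholder sender address, assuming the usual xp_startsmtp parameter order ( sender, server, port ):
BEGIN
DECLARE @return_code INTEGER;
-- Placeholder sender; smtp.gmail.com on the default SMTP port 25.
SET @return_code = xp_startsmtp ( 'sender@example.com',
   'smtp.gmail.com',
   25 );
MESSAGE STRING ( 'xp_startsmtp returned ', @return_code ) TO CLIENT;
IF @return_code = 0 THEN
   CALL xp_stopsmtp();
END IF;
END;
A -530 coming back from that call is what points the finger at STARTTLS.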

But, that's a minor quibble. The SQL Anywhere Monitor's support for email alerts gives it a significant advantage over its only real competition, the Foxhound monitor.

Sunday, October 12, 2008

First EBF for SQL Anywhere 11 Ships: 11.0.0.1490

Note: This is a re-posting of an item first posted on October 1... the Windows versions of the EBF had a problem with the installer but they're back now.
To get the latest Express Bug Fix for SQL Anywhere 11, go here, click on SQL Anywhere, and scroll down to one of these choices:
Platform: Linux x86/Linux x64
SQL Anywhere - Express Bug Fix, Update of 11.0.0 to build 1490 (all components) - 11.0 - 01 Oct 2008 - EBF/Patch

Platform: Windows x64
SQL Anywhere - Express Bug Fix, Update of 11.0.0 to build 1490 (all components except SQL Anywhere for Windows CE) - 11.0 - 10 Oct 2008 - EBF/Patch

Platform: Windows x86
SQL Anywhere - Express Bug Fix, Update of 11.0.0 to build 1490 (all components except SQL Anywhere for Windows CE) - 11.0 - 10 Oct 2008 - EBF/Patch
Kudos on the fully-descriptive file names; e.g., SA11_Full-CE_x64.1100_1490_EBF.exe. Heads up on one part of the name, though: "Full-CE" doesn't mean "Full download for Windows CE", it means "Everything EXCEPT Windows CE".

Also shipping for the first time: SQL Anywhere 11.0 for Mac OS X.

Thursday, October 9, 2008

No More Excuses...

...for not making regular database backups.



I'm used to seeing price drops, but once in a while one comes along that makes me sit up and take notice: $155 for a 1 TB (well, 1000 GB actually) 7200 rpm USB 2.0 external drive that transfers data at up to 480 Mbits per second.

OK, ok, so you're not going to use it for TPC-C benchmarking, but for backing up your database? Perfect! Especially if you buy two or more for safety.

And now it's affordable:

$155 in the United States,

$170 in Canada.
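
For SQL Anywhere folks, a full backup to a drive like that is a one-line command using the dbbackup utility... here's a sketch with a made-up connection string and drive letter:
"%SQLANY11%\bin32\dbbackup.exe" -c "ENG=mydb;UID=dba;PWD=sql" e:\backup
Schedule that to run every night and the drive starts to look like very cheap insurance.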

Wednesday, October 8, 2008

"Sybase is donating its software"

Now here's something you don't see very often:

"Sybase is donating its software"
I hope the idea spreads!

Tuesday, October 7, 2008

Loading Empty Lines

From time to time, I need to load raw text data into a table, line by line, preserving the original text and the original order of lines. The LOAD TABLE command does an excellent job, as follows:

CREATE TABLE raw_text (
line_number BIGINT NOT NULL
DEFAULT AUTOINCREMENT
PRIMARY KEY CLUSTERED,
line_text LONG VARCHAR NOT NULL DEFAULT '' );

LOAD TABLE raw_text ( line_text )
FROM 'c:/temp/xxx.txt'
DEFAULTS ON
DELIMITED BY ''
ESCAPES OFF
HEXADECIMAL OFF
QUOTES OFF
STRIP OFF;

SELECT * FROM raw_text ORDER BY line_number;
When the input file looks like this...
first line

third line
...the SELECT displays this...



Now, I said that LOAD TABLE did an "excellent job" and it does... if the input file doesn't have any empty lines, or if you don't want to preserve empty lines, because empty lines are completely ignored.

If you do want to preserve empty input lines, there is a solution: include the ROW DELIMITED BY option like this:
LOAD TABLE raw_text ( line_text )
FROM 'c:/temp/xxx.txt'
DEFAULTS ON
DELIMITED BY ''
ESCAPES OFF
HEXADECIMAL OFF
QUOTES OFF
ROW DELIMITED BY '\x0d\x0a'
STRIP OFF;

SELECT * FROM raw_text ORDER BY line_number;
Now the SELECT shows all three lines, including the empty one:



The ROW DELIMITED BY option was added in SQL Anywhere version 10, another reason to upgrade... but if you're not using version 10 already, consider moving straight to version 11.

Monday, October 6, 2008

Divide and Conquer by Working Backwards

In previous postings here and here I have talked about solving large SQL query problems by splitting them into smaller problems that are easier to solve: divide and conquer.

Yeah, right... easier said than done... what if it isn't obvious how to split up the big problem into manageable pieces?

One way is to propose "a perfect world" in which the large problem has a simple solution, and work backwards from there. For a database query problem, that means starting with "a perfect schema" designed with one purpose in mind, to make it easy to solve the large problem, and then work backwards step-by-step to devise less-perfect schemas.

At each step of the way the less-perfect schema should meet two criteria: it should be relatively easy to transform the less-perfect schema to the previous more-perfect schema, and each less-perfect schema should be closer to reality... where reality is the ugly real-world schema that made the query hard to write in the first place. The problem is solved when you have a series of queries that transforms the real-world schema into the solution result set.

In other words, "divide and conquer" describes the end result whereas "working backwards" provides a possible path to that result. Here's an example:

Using a bank account database containing separate tables for opening balance, deposits, withdrawals and checks, display each entry together with a closing balance after the entry is applied. The entries are to be ordered by date, then within date by opening balance first followed by deposits, withdrawals and checks.
date        description                    amount  closing_balance
2008-02-01  FY009 Opening Balance         2329.54          2329.54
2008-02-19  DEP - insurance - adjustment   174.29          2503.83
2008-02-21  DEP - tax - refund             482.95          2986.78
2008-02-21  WD - cash - ATM               -100.00          2886.78
2008-02-21  CHK 301 - Township - dogtags   -40.00          2846.78
2008-02-21  CHK 300 - hydro - invoice     -283.53          2563.25
2008-08-15  WD - cash - ATM               -100.00          2463.25

Start With The Perfect Schema

Here's what "the perfect schema" might look like, one single table containing the opening balance and all the deposits, withdrawals and checks as individual rows:
CREATE TABLE solution ( 
transaction_number INTEGER NOT NULL,
"date" DATE NOT NULL,
description VARCHAR ( 200 ) NOT NULL,
amount NUMERIC ( 9, 2 ) NOT NULL,
closing_balance NUMERIC ( 9, 2 ) NOT NULL,
PRIMARY KEY ( transaction_number ) );
The transaction_number column specifies the order in which the closing_balance values are calculated as well as the sort order for the final query:
SELECT solution."date",
solution.description,
solution.amount,
solution.closing_balance
FROM solution
ORDER BY solution.transaction_number;

1st Step Back: Calculate Closing Balance

It should come as no surprise that the real-world schema looks nothing like the perfect solution table. In fact, there are four tables, not one, and there are no transaction_number or closing_balance columns:
CREATE TABLE opening_balance ( 
opening_balance_date DATE NOT NULL,
amount NUMERIC ( 9, 2 ) NOT NULL,
description VARCHAR ( 200 ) NOT NULL,
PRIMARY KEY ( opening_balance_date ) );

opening_balance_date amount  description
-------------------- ------- ---------------------
2008-02-01           2329.54 FY009 Opening Balance

CREATE TABLE deposit (
deposit_number INTEGER NOT NULL DEFAULT AUTOINCREMENT,
deposit_date DATE NOT NULL,
paid_by VARCHAR ( 50 ) NOT NULL,
amount NUMERIC ( 9, 2 ) NOT NULL,
description VARCHAR ( 200 ) NOT NULL,
PRIMARY KEY ( deposit_number ) );

deposit_number deposit_date paid_by   amount description
-------------- ------------ --------- ------ -----------
             1 2008-02-19   insurance 174.29 adjustment
             2 2008-02-21   tax       482.95 refund

CREATE TABLE withdrawal (
withdrawal_number INTEGER NOT NULL DEFAULT AUTOINCREMENT,
withdrawal_date DATE NOT NULL,
paid_to VARCHAR ( 50 ) NOT NULL,
amount NUMERIC ( 9, 2 ) NOT NULL,
description VARCHAR ( 200 ) NOT NULL,
PRIMARY KEY ( withdrawal_number ) );

withdrawal_number withdrawal_date paid_to amount description
----------------- --------------- ------- ------ -----------
                1 2008-02-21      cash    100.00 ATM
                2 2008-08-15      cash    100.00 ATM

CREATE TABLE "check" (
check_number INTEGER NOT NULL,
issue_date DATE NOT NULL,
pay_to VARCHAR ( 50 ) NOT NULL,
amount NUMERIC ( 9, 2 ) NOT NULL,
check_for VARCHAR ( 35 ) NOT NULL,
PRIMARY KEY ( check_number ) );

check_number issue_date pay_to   amount check_for
------------ ---------- -------- ------ ---------
         300 2008-02-21 hydro    283.53 invoice
         301 2008-02-21 Township  40.00 dogtags
But, we're working backwards... let's pretend there is a single table that holds all the entries, and it does have a transaction_number column but not closing_balance. Here's how it can be used to calculate closing_balance:
CREATE TABLE transaction ( 
transaction_number INTEGER NOT NULL,
"date" DATE NOT NULL,
description VARCHAR ( 200 ) NOT NULL,
amount NUMERIC ( 9, 2 ) NOT NULL,
PRIMARY KEY ( transaction_number ) );

CREATE VIEW solution AS
SELECT transaction.transaction_number,
transaction."date",
transaction.description,
transaction.amount,
SUM ( transaction.amount )
OVER balance_window AS closing_balance
FROM transaction
WINDOW balance_window AS (
ORDER BY transaction.transaction_number
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW );
The transaction table is the "less-perfect schema", one step closer to reality (it doesn't have the balance column). The solution table has been replaced by a view which performs the transformation to the perfect schema as follows:
  • An OLAP WINDOW clause is used to define a "balance_window" for each row in the result set.

  • The balance_window is cumulative: it includes the current row as well as all the preceding rows: BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

  • The transaction_number column controls the ordering of rows in the balance_window.

  • The aggregate expression SUM ( transaction.amount ) OVER balance_window calculates the closing_balance value for each row.
Tip for English Speakers: The phrase "UNBOUNDED PRECEDING" may be translated from Pointy-Haired-ANSI-Speak to English as follows: "first row". In other words, the BETWEEN clause really means BETWEEN FIRST ROW AND CURRENT ROW. Similarly, "UNBOUNDED FOLLOWING" really means "last row".

2nd Step Back: Calculate Transaction Number

The next step backwards is to propose an even-less-perfect table raw_transaction that doesn't have either closing_balance or transaction_number, but does include a sort_order column that differentiates between opening_balance, deposit, withdrawal and check as sort_order = '1', '2', '3' and '4':
CREATE TABLE raw_transaction ( 
"date" DATE NOT NULL,
sort_order VARCHAR ( 1 ) NOT NULL,
description VARCHAR ( 200 ) NOT NULL,
amount NUMERIC ( 9, 2 ) NOT NULL,
PRIMARY KEY ( "date",
sort_order,
description,
amount ) );
The raw_transaction table may be transformed into the transaction view (formerly the transaction table) by using the OLAP ROW_NUMBER() window function to calculate the transaction_number column:
CREATE VIEW transaction (
transaction_number,
"date",
description,
amount )
AS
SELECT ROW_NUMBER() OVER (
ORDER BY
raw_transaction."date",
raw_transaction.sort_order )
AS transaction_number,
raw_transaction."date",
raw_transaction.description,
raw_transaction.amount
FROM raw_transaction;
The OVER ( ORDER BY ... ) clause tells ROW_NUMBER() what row order to use when doing the numbering. Astute observers will note that date and sort_order aren't fully deterministic because more than one row can have the same values, but that doesn't matter because the original specs weren't fully deterministic either. In other words, if there are two withdrawals on the same day it doesn't matter which order they appear in, only that they appear together, after all the deposits and before all the checks on that day.
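
If a fully deterministic order ever did become part of the specs, one sketch of a fix is to add tiebreaker columns to the OVER ( ORDER BY ... )... the choice of description and amount here is arbitrary, not something from the original problem:
CREATE VIEW transaction (
transaction_number,
"date",
description,
amount )
AS
SELECT ROW_NUMBER() OVER (
ORDER BY
raw_transaction."date",
raw_transaction.sort_order,
raw_transaction.description,
raw_transaction.amount )
AS transaction_number,
raw_transaction."date",
raw_transaction.description,
raw_transaction.amount
FROM raw_transaction;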

3rd Step Back: Gather The Raw Transactions

The final step backwards replaces the raw_transaction table with a view that uses a UNION of four selects that pound the individual tables into one lowest-common-denominator representation of the data.

Here is that view, plus the other three queries that together form the fully set oriented "divide and conquer" solution:
Correction: The first two columns in the CREATE VIEW column list appeared in the wrong order when this article was first posted. The error might have been caught if the actual output had been included (it is now, down at the bottom) instead of just the original "specs".
CREATE VIEW raw_transaction (
"date",
sort_order,
description,
amount )
AS
SELECT opening_balance.opening_balance_date,
'1',
opening_balance.description,
opening_balance.amount
FROM opening_balance
UNION ALL
SELECT deposit.deposit_date,
'2',
STRING (
'DEP - ',
deposit.paid_by,
' - ',
deposit.description ),
deposit.amount
FROM deposit
UNION ALL
SELECT withdrawal.withdrawal_date,
'3',
STRING (
'WD - ',
withdrawal.paid_to,
' - ',
withdrawal.description ),
-withdrawal.amount
FROM withdrawal
UNION ALL
SELECT "check".issue_date,
'4',
STRING (
'CHK ',
"check".check_number,
' - ',
"check".pay_to,
' - ',
"check".check_for),
-"check".amount
FROM "check";

CREATE VIEW transaction (
transaction_number,
"date",
description,
amount )
AS
SELECT ROW_NUMBER() OVER (
ORDER BY
raw_transaction."date",
raw_transaction.sort_order )
AS transaction_number,
raw_transaction."date",
raw_transaction.description,
raw_transaction.amount
FROM raw_transaction;

CREATE VIEW solution AS
SELECT transaction.transaction_number,
transaction."date",
transaction.description,
transaction.amount,
SUM ( transaction.amount )
OVER balance_window AS closing_balance
FROM transaction
WINDOW balance_window AS (
ORDER BY transaction.transaction_number
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW );

SELECT solution."date",
solution.description,
solution.amount,
solution.closing_balance
FROM solution
ORDER BY solution.transaction_number;
The final output shows the two checks in a different order from the original example, but that's OK because the specs didn't say how they should be sorted:
date        description                    amount  closing_balance
2008-02-01  FY009 Opening Balance         2329.54          2329.54
2008-02-19  DEP - insurance - adjustment   174.29          2503.83
2008-02-21  DEP - tax - refund             482.95          2986.78
2008-02-21  WD - cash - ATM               -100.00          2886.78
2008-02-21  CHK 300 - hydro - invoice     -283.53          2603.25
2008-02-21  CHK 301 - Township - dogtags   -40.00          2563.25
2008-08-15  WD - cash - ATM               -100.00          2463.25

Saturday, October 4, 2008

SQL Anywhere Monitor Beta

The SQL Anywhere Monitor is an add-on to SQL Anywhere 11:

"... a browser-based administration tool that provides you with information about the health and availability of SQL Anywhere databases and MobiLink servers. The SQL Anywhere Monitor collects metrics and performance data from databases and MobiLink servers running on other computers, while a separate computer accesses the SQL Anywhere Monitor via a web browser."
Here's where you can download a beta copy:
http://www.sybase.com/panoramabeta

Monday, September 29, 2008

In Praise Of Knobs

Glenn Paulley is one of the politest people I've ever met... I've never known him to explode in anger, and this May 6, 2008 post is probably the closest we'll ever see to a Glenn Paulley Rant:

Additional knobs are NOT the answer!
Here's what Glenn said about manual settings and parameters affecting database performance, also known as knobs:
"Knobs" are expedient, and they can be used as "quick fixes" to workaround performance problems in shipping code. But avoiding the introduction of additional knobs is a worthwhile goal because "knobs" are not in the best interests of our customers.

In my view, adding additional performance knobs of the types described above is simply counterproductive: the answer in the longer term is self-managing database systems.
In summary, Glenn is clearly saying self-management is good and knobs are bad. It might be unfair, but it's easy to infer that Glenn is also claiming knobs are the antithesis of self-management: if your software is self-managing, you won't have any knobs... or at least if you increase your software's level of self-management, you'll have fewer knobs.

One month later, on June 4, 2008 Glenn returned to the topic in this post: What self-management is (and isn’t). This post doesn't mention knobs at all, but sticks to the point "self-management is good".

And he's absolutely right on that point: self-management is good. I also agree that increasing the level of self-management in SQL Anywhere is a worthwhile goal. And I'll go further: progress has been excellent, so far, in making SQL Anywhere self-managing.

But... here's where my rant starts... knobs aren't bad... at least, they're not necessarily bad. In fact, every new self-management feature or improvement added to SQL Anywhere requires at least one new knob: a switch or setting that turns the feature off and returns manual control. This is a historical fact, and it's going to continue to be true, otherwise there's going to be big trouble ahead.

Here's an example from the aviation world: The B-2 bomber is an excellent example of a self-managing system... if it wasn't for the automatic computer controls the thing wouldn't even fly, no human being could manage it.

Back on February 23, 2008, however, the pilots of this B-2 probably wished they had a few more knobs and a bit less "automatic computer control"... it was the computer system that sent it into a stall.

Knobs are not a panacea, but sometimes they are critical, especially for functions that are self-managed. In fact, each new self-management feature will probably require several new knobs, not just one to turn the feature off, but other new knobs to control the behavior of the new feature.

Here's an example from SQL Anywhere, where the story of database cache self-management can be told in the form of...
A Brief History of Knobs

Back in Versions 5.5 and 6 there was no cache self-management, and SQL Anywhere had only one knob affecting the cache:
   -c size - Set the cache size
Version 7 added Dynamic Cache Sizing. It also added three new knobs for controlling the cache size and one new knob to display it:
   -ca 0    - Disable dynamic cache sizing
   -ch size - Set the upper limit
   -cl size - Set the lower limit
   -cs      - Display cache usage in console window
Version 8 added Address Windowing Extensions and a new knob to turn it on and off:
   -cw - Enable use of AWE on Windows.
Version 9 added Cache Warming, plus another three new knobs:
   -cc {+|-} - Collect information for cache warming
   -cr {+|-} - Warm the cache with database pages
   -cv {+|-} - Control messages about cache warming
Version 10 brought the total number of knobs to ten:
   -cm size - Specifies the size of the AWE cache 
For many years, one of those ten knobs was critical to the performance of cache self-management: For versions 7, 8 and 9, if you wanted your database cache to automatically grow past 256M, you had to use the -ch knob to specify a larger value... otherwise, it didn't matter if you had 2G of RAM installed, your cache would stop growing at 256M.
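
For example, on a version 9 server running on a machine with 2G of RAM, the command line had to look something like this (the database name and sizes are illustrative, not from a real system):
"%ASANY9%\win32\dbsrv9.exe" -c 256M -ch 1800M mydatabase.db
Leave off the -ch knob and the cache stopped growing at 256M no matter how much RAM sat idle.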

So here's my point to Glenn: Don't go knocking those knobs! You'll be adding lots more as you increase the level of self-management, whether you like it or not. And they'll be valuable and helpful.

Sunday, September 28, 2008

IvanAnywhere On Space TV

Scroll to the one-minute-point in this video to see a short clip on IvanAnywhere:

Friday, September 26, 2008

V11 EBF Coming Soon

The first Express Bug Fix (EBF) for SQL Anywhere Version 11 is coming soon... not yet, but soon, according to Jason Hinsperger writing in the SQL Anywhere "General" newsgroup.

That might be good news for some people, but personally I'm not waiting for any particular fixes... in my opinion Version 11 has been the cleanest General Availability (GA) release in the history of SQL Anywhere. My theory about the reason? All the extra testing getting ready for the TPC-C benchmark.

I wish I could give you a direct link for downloading the latest SQL Anywhere EBF, but the best I can do is tell you to start here, click on SQL Anywhere, and start scrolling... but not yet... soon.

Thursday, September 25, 2008

"There can be only one!"

Here's what Google says about its Blog Search service:

"... when there are entire blogs that seem to be a good match for your query, these will appear in a short list just above the main search results."
Here's what it displays when you search on "SQL Anywhere":



OK, OK, enough ego-surfing... the real news is that there are over 256,000 hits on "SQL Anywhere". Maybe folks are discovering that SQL Anywhere scored 5th in the Top Ten TPC-C by Performance list.

To do your own blog searches, go to blogsearch.google.com.

Tuesday, September 9, 2008

URL Forwarding Via Reverse Proxy

For security reasons, many internet sites only allow access via certain ports, such as port 80 for ordinary HTTP and 443 for secure HTTPS.

For the official list of port numbers, see www.iana.org/assignments/port-numbers.
That causes a problem when you want to have two or more web servers on the same IP address. For example, suppose you want all HTTP traffic addressed to your domain name xyz.com to go to your main web server, EXCEPT for traffic addressed to xyz.com/special which should be sent to a different web server.

That was exactly my problem; here are the details:
  • All traffic addressed to risingroad.com, regardless of port number, is sent to the IP address 64.7.134.118.

  • A Linksys router listens to all the traffic on 64.7.134.118.

  • The router forwards all traffic on port 80 to the local IP address 192.168.1.51. In this context, the router is acting like a firewall, and that firewall has port 80 "opened up" for access to a computer running inside the firewall. Some other port numbers are open as well (email, etc) but there are no other open ports available for HTTP traffic.

  • A Microsoft IIS web server is running on a Windows 2003 server at local IP address 192.168.1.51. Its purpose is to process regular HTTP traffic sent to risingroad.com.

  • A SQL Anywhere web server is running on a Windows Vista 64-bit computer at local IP address 192.168.1.52. Its purpose is to process HTTP traffic sent to the specific address risingroad.com/enwiki_search.
The following simple approach comes to mind: Open up a different port, say port 81, and have the router send port 80 traffic to 192.168.1.51 and port 81 traffic to 192.168.1.52. Other companies might not be willing to open an extra port, but I am... so why won't that work?

It won't work because other companies don't just block incoming HTTP traffic on any port other than 80, they also block outbound traffic. That means anyone using a browser inside those other companies can't get out to sites that are running HTTP servers on port 81... or even 8080 which is listed as an "HTTP Alternate".

The solution is to use a "reverse proxy" program to forward or redirect incoming HTTP traffic to the appropriate destination. I chose the IQ Reverse Proxy from Fastream Technologies.
If you find the term "reverse proxy" confusing, you're not alone. The word "proxy" might apply since the program is "acting for" HTTP servers at other locations. However, the word "reverse" just serves to obfuscate, especially since the term "port forwarding" applies to traffic going in the same direction as reverse proxy traffic. It may be simpler to think of a "reverse proxy" as a "URL forwarder" or "IP redirector".
Here's how the URL forwarding works:

1. The Linksys router forwards port 80 to 192.168.1.51; ports 8080 and 81 are not open to the outside world:



2. The Fastream IQ reverse proxy program runs on 192.168.1.51 and listens to port 80:



3. The Microsoft IIS HTTP server also runs on 192.168.1.51, but it now listens to port 8080:



4. The SQL Anywhere HTTP server running on 192.168.1.52 is configured to listen to port 81:



5. The two target HTTP servers (IIS and SQL Anywhere) are defined to the Fastream IQ reverse proxy program in terms of their IP addresses and port numbers:
192.168.1.51   8080 - IIS
192.168.1.52   81   - SQL Anywhere


6. Special URL forwarding rules are defined to the Fastream IQ reverse proxy program as follows: one particular URL goes to SQL Anywhere and everything else goes to IIS:
*.//risingroad.com/enwiki_search  - goes to 192.168.1.52:81
*.//Default - goes to 192.168.1.51:8080


Tip: It may be possible to have Fastream handle URLs with GET-style parameters properly, but I couldn't figure it out; I had to use strict POST method processing for all web service parameters. That means no visible "http://x.y.com?a=...&b=..." parameters on the URLs, just the hidden POST parameters.

Wednesday, September 3, 2008

Loading Wikipedia

This article talks about loading 264 million lines of Wikipedia XML into a SQL Anywhere table, parsing and gathering this text into 7.3 million blobs in another table, and then building a full text index on two columns in that table: the title of the Wikipedia entry and the blob containing the article.

The final result can be seen using a simple Google-style interface based on a SQL Anywhere 11 web service.

Note: This demonstration is no longer available on the internet.


Don't expect Google-like speeds, however, or even the same results... SQL Anywhere's index-based full text search facility is not exactly the same thing as the Google search engine.

The purpose of this exercise is two-fold: To demonstrate the new full text search facility on a non-trivial data set, and to continue exploring ways to use SQL Anywhere to solve programming problems inside the database server. After all, the name of this blog is "SQL Anywhere" so you're not going to see articles about Java or C or Perl or PHP any time soon (unless, of course, it's about writing SQL Anywhere stored procedures in Perl or one of the other languages supported in Version 11 :)

Step 1: Download The Wikipedia File.

Go to this page and download this 3.9G compressed file: enwiki-latest-pages-articles.xml.bz2.

Time to download: Hours and hours, especially when the new Firefox 3 download manager unexpectedly paused the download when I wasn't looking... grrrr!

Step 2: Unzip The Wikipedia File.



Time to unzip: 20m

Here's what the text looks like:



The two interesting elements are the <title> nested within each <page>, plus the <text> element nested within the page <revision> element.

Step 3: Create The Database.

Here are the commands used to initialize the database and start the 64-bit version of dbsrv11.exe on a Core Quad processor with 4G of RAM and 4 hard drives running 64-bit Windows Vista (the lines have been wrapped to fit on screen):
"%SQLANY11%\bin64\dbinit.exe" -p 8192 -s 
-t e:\data\enwiki\enwiki.log
c:\data\enwiki\enwiki.db

"%SQLANY11%\bin64\dbspawn.exe"
-f "%SQLANY11%\bin64\dbsrv11.exe"
@$dbsrv11_enwiki_http_options.txt
To improve I/O performance all the busy files were placed on different physical drives in a non-RAID configuration:
Database:        c:\data\enwiki\enwiki.db
Transaction log: e:\data\enwiki\enwiki.log
Temporary file:  f:\data\enwiki\sqla*.tmp
Input text:      g:\data\enwiki\enwiki-latest-pages-articles.xml
The dbinit -t option is used to specify the transaction log location, and the dbsrv11 -dt option is used (see below) to specify where to put the sqla*.tmp temporary files.

Here are the dbsrv11 options contained in the configuration file $dbsrv11_enwiki_http_options.txt (the -xs line has been wrapped to fit on screen):
-c 3G 
-ch 3G
-dt f:\data\enwiki
-gn 10
-o dbsrv11_log_enwiki.txt
-os 10M
-x tcpip
-xs "http(log=dbsrv11_log_inbound_http_enwiki.txt;
lf=@T - @I - @U - @R - @L - @W - @E;
lsize=1M;dbn=enwiki;port=81;
maxsize=1M;to=600;kto=600)"
-ze
-zl
-zp
-zt
c:\data\enwiki\enwiki.db
The -c and -ch options were used to fix the database cache size at 3G.

The -gn 10 option was used temporarily to optimize server performance for a single busy connection: the dbisql session used to load the Wikipedia data. An even smaller number might have helped more, but there's a rule of thumb that says "Don't set gn lower than 10 if you know what's good for you!"

The -o and -os options specify where to write the database console log file, and when to rename and restart it. Except perhaps for embedded databases and handheld devices, every SQL Anywhere server in the world should specify these two parameters to help diagnose runtime problems.

The -x option speeds server startup by specifying that only tcpip is necessary.

The -xs option tells dbsrv11 to start the builtin HTTP server and how to configure it. The log parameter specifies where to write the HTTP log records, lf specifies what to include in the log, and lsize specifies when to rename and restart it. The dbn parameter must match the name of the database that contains the web services to be used, and the port parameter specifies what port to listen to for HTTP requests. The maxsize parameter specifies the largest HTTP request that will be accepted, and the to and kto parameters are used to increase the HTTP timeout and keep-alive durations.

The -ze, -zl, -zp and -zt options all serve to increase the level of diagnostic tracing performed by the server.

To further increase I/O performance by eliminating disk defragmentation in the database file, Diskeeper 2008 was run after the following command was run to increase the database file size:
ALTER DBSPACE ADD 50G;
Time to ALTER DBSPACE ADD 50G: 11m 17s.

Tip: If you have Diskeeper 2008 "Automatic Defragmentation" enabled, you might want to disable it while the Wikipedia loading processes are running. Diskeeper isn't going to cause any errors, but it will happily pound away on the same drives you're trying to use... even if it's working on a different drive, it still might be using the same controller or other resources. You can set a schedule inside Diskeeper to enable and disable automatic defragmentation:



Step 4: Load The Text.

The enwiki_text table stores each line of text as a LONG VARCHAR column, and the line_number column is automatically incremented to preserve the input order. This automatic ordering is achieved by using three features: the DEFAULT AUTOINCREMENT clause in the CREATE TABLE statement, and two clauses in the LOAD TABLE statement: First, the column list "( line_text )" is coded to specifically exclude the line_number column so SQL Anywhere won't expect any values to be present in the input file. Second, the DEFAULTS ON clause instructs LOAD TABLE to actually calculate the DEFAULT AUTOINCREMENT value for each new line_number; by default, LOAD TABLE doesn't apply defaults.
CREATE TABLE enwiki_text (
line_number BIGINT NOT NULL DEFAULT AUTOINCREMENT
PRIMARY KEY CLUSTERED,
line_text LONG VARCHAR NOT NULL DEFAULT '' );

LOAD TABLE enwiki_text ( line_text )
FROM 'c:/data/enwiki/enwiki-latest-pages-articles.xml'
DEFAULTS ON
DELIMITED BY ''
ESCAPES OFF
HEXADECIMAL OFF
QUOTES OFF
STRIP OFF;
The other clauses of the LOAD TABLE tell it to accept the input text as-is: don't look for any inter-field delimiters, don't expect any special escape characters or hexadecimal notation or quotes, and don't strip any blanks. This is a standard "what you see is what you get" LOAD TABLE command, very useful for loading raw text without any edits.

Time for LOAD TABLE enwiki_text: 1h 48m.

Tip: If you don't have four separate disk drives for the four busy files mentioned in Step 3 above, you can put the input text file on the same drive as the database file. However, try to put the temporary file on a separate drive from both the input text file and the database file. The LOAD TABLE process will perform heavy I/O on the temporary file and input text file simultaneously during its first phase of processing, and during its second phase it will work on the temporary file and database file simultaneously. Here's a snapshot of the Windows Vista Resource Monitor during the second phase of LOAD TABLE processing:



Step 5: Find The Pages.

The Wikipedia text consists of a <page> ... </page> element for each one of the 7.3 million pages, preceded and followed by a few other tags that we're not interested in.

Parsing this XML text is not a complex job for SQL Anywhere's OPENXML procedure, but it's a BIG job... too big to do in one statement.

So, we're going to divide and conquer: First, find the first and last lines of each <page> element, and later, pass each <page> element through the OPENXML parser. The enwiki_text_xref table is used to store the first and last line numbers:
CREATE TABLE enwiki_text_xref ( 
page_number BIGINT NOT NULL DEFAULT AUTOINCREMENT
PRIMARY KEY CLUSTERED,
from_line_number BIGINT NOT NULL,
to_line_number BIGINT NULL,
CONSTRAINT fk_from
NOT NULL FOREIGN KEY ( from_line_number )
REFERENCES enwiki_text ( line_number ),
CONSTRAINT fk_to
FOREIGN KEY ( to_line_number )
REFERENCES enwiki_text ( line_number ) );
Here's how the from_line_number column is filled in for each page:
INSERT enwiki_text_xref ( from_line_number )
SELECT enwiki_text.line_number
FROM enwiki_text
WHERE enwiki_text.line_text LIKE '%<page>%'
ORDER BY enwiki_text.line_number;
Yes, it's a full-table scan using LIKE '%<page>%', but it's just done once, and it's fast:



Time for INSERT enwiki_text_xref: 38m 28.9s.
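That scan amounts to a single ordered pass over the rows, keeping the line numbers of lines that contain the &lt;page&gt; tag. A rough Python equivalent of the SELECT, assuming the rows are (line_number, line_text) pairs:

```python
def find_from_line_numbers(rows):
    """Mimic: SELECT line_number FROM enwiki_text
       WHERE line_text LIKE '%<page>%' ORDER BY line_number.
       rows must be (line_number, line_text) pairs in line order."""
    return [number for number, text in rows if "<page>" in text]

rows = [(1, "<mediawiki>"), (2, "  <page>"), (3, "  </page>"),
        (4, "  <page>"), (5, "  </page>")]
# find_from_line_numbers(rows) == [2, 4]
```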

The second column is a bit trickier: Each row's to_line_number is equal to the next row's from_line_number, minus one. How do you code that in SQL, without an old-fashioned FETCH loop that looks at every single row?

The answer is to use the LAST_VALUE() function introduced in SQL Anywhere 10.0.1:
CREATE VIEW to_line_numbering AS
SELECT page_number,
( LAST_VALUE ( from_line_number )
OVER line_number_window ) - 1
AS to_line_number
FROM enwiki_text_xref
WINDOW line_number_window AS (
ORDER BY page_number
RANGE BETWEEN CURRENT ROW
AND 1 FOLLOWING );

UPDATE enwiki_text_xref
INNER JOIN to_line_numbering
ON to_line_numbering.page_number
= enwiki_text_xref.page_number
SET enwiki_text_xref.to_line_number
= to_line_numbering.to_line_number;
The magic is in the CREATE VIEW, not the UPDATE; the UPDATE simply trusts the view to properly calculate to_line_number.

The SELECT in the CREATE VIEW works as follows: The WINDOW clause names and defines a moving set of rows (a window) for each row in the candidate result set. For each row in enwiki_text_xref, the window contains the CURRENT ROW and the next row (1 FOLLOWING). The ORDER BY clause determines how the rows are ordered for this purpose, and the RANGE clause specifies how the first and last rows in the window are identified.

The LAST_VALUE() aggregate function asks for the last value of the from_line_number column, taken from the WINDOW named in the OVER clause. In English, this is the from_line_number column from the next row... that's what the WINDOW contains, this row and the next one, so the "LAST" value from the WINDOW comes from the next row.

The result is reduced by 1 to give the to_line_number of the current row, and that's what the view returns to the UPDATE.
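The same next-row-minus-one calculation can be sketched in Python; note that the final row gets no value, matching the view's behavior:

```python
def compute_to_line_numbers(from_line_numbers):
    """Each row's to_line_number is the next row's from_line_number
    minus one -- the value LAST_VALUE() pulls from the two-row window
    (CURRENT ROW and 1 FOLLOWING). The final row has no next row,
    so it gets None (NULL), to be fixed up separately."""
    result = []
    for i in range(len(from_line_numbers)):
        if i + 1 < len(from_line_numbers):
            result.append(from_line_numbers[i + 1] - 1)
        else:
            result.append(None)
    return result

# Three pages starting at lines 2, 40 and 95:
# compute_to_line_numbers([2, 40, 95]) == [39, 94, None]
```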

Time for UPDATE #1 enwiki_text_xref: 15m 35.4s.

You can read about OLAP windows in the Help, and in Glenn Paulley's white paper. You can also see another example of LAST_VALUE() here.

Astute readers will notice that the view doesn't return a proper value of to_line_number for the last row in enwiki_text_xref... it's the last row, so there is no "next row" to use to calculate LAST_VALUE(). Here's the code to finish things up:
BEGIN
DECLARE @last_page_line_number BIGINT;

SELECT TOP 1 enwiki_text.line_number
INTO @last_page_line_number
FROM enwiki_text
WHERE enwiki_text.line_text LIKE '%</page>%'
ORDER BY enwiki_text.line_number DESC;

UPDATE TOP 1
enwiki_text_xref
SET enwiki_text_xref.to_line_number
= @last_page_line_number
ORDER BY page_number DESC;

END;
Time for UPDATE #2 enwiki_text_xref: 0s (yes, zero, even with the LIKE '%</page>%' predicate).

Not only is the additional processing time for filling the enwiki_text_xref table small, but so is the table itself: 390M for enwiki_text_xref versus 22G for enwiki_text.

Step 6: Parse The Pages.

Here's what the final Wikipedia table looks like, with the important columns being page_title and page_text:
CREATE TABLE enwiki_entry ( 
page_number BIGINT NOT NULL
PRIMARY KEY CLUSTERED,
from_line_number BIGINT NOT NULL,
to_line_number BIGINT NOT NULL,
page_title VARCHAR ( 1000 ) NOT NULL,
page_text LONG VARCHAR NOT NULL,
CONSTRAINT fk_page
NOT NULL FOREIGN KEY ( page_number )
REFERENCES enwiki_text_xref ( page_number ),
CONSTRAINT fk_from
NOT NULL FOREIGN KEY ( from_line_number )
REFERENCES enwiki_text ( line_number ),
CONSTRAINT fk_to
NOT NULL FOREIGN KEY ( to_line_number )
REFERENCES enwiki_text ( line_number ) );
Once again, here's what the input text looks like:



Data for the enwiki_entry.page_title column comes from the <title> element nested within each <page>, and the page_text column is filled from the <text> element nested within each page's <revision> element.

Here's the code that fills enwiki_entry; it works as follows:
  • A cursor FOR loop is used to process one page at a time, rather than all 7.3 million pages at once.

  • The cursor SELECT uses the enwiki_text_xref table to find the @from and @to rows in enwiki_text for each <page> element.

  • The SELECT LIST() statement gathers the XML text for each page into the LONG VARCHAR variable @page_xml.

  • The INSERT enwiki_entry statement calls OPENXML() to fill the page_title and page_text columns; for a detailed discussion of parsing XML with OPENXML() see OPENXML() Rocks!

  • A COMMIT is issued for every 10,000 pages processed.
BEGIN
DECLARE @page_counter BIGINT;
DECLARE @page_xml LONG VARCHAR;

SET @page_counter = 0;

FOR f_fetch AS c_fetch NO SCROLL CURSOR FOR
SELECT enwiki_text_xref.page_number AS @page,
enwiki_text_xref.from_line_number AS @from,
enwiki_text_xref.to_line_number AS @to
FROM enwiki_text_xref
ORDER BY enwiki_text_xref.page_number
FOR READ ONLY
DO
SELECT LIST (
enwiki_text.line_text,
'\x0d\x0a'
ORDER BY enwiki_text.line_number ) AS @page_xml
INTO @page_xml
FROM enwiki_text
WHERE enwiki_text.line_number
BETWEEN @from
AND @to;

SET @page_counter = @page_counter + 1;

INSERT enwiki_entry (
page_number,
from_line_number,
to_line_number,
page_title,
page_text )
SELECT @page,
@from,
@to,
TRIM ( COALESCE ( page_title, '???' ) ),
TRIM ( COALESCE ( page_text, '???' ) )
FROM OPENXML (
@page_xml,
'/page' )
WITH (
page_title VARCHAR ( 1000 ) 'title',
page_text LONG VARCHAR 'revision/text' );

IF MOD ( @page_counter, 10000 ) = 0 THEN
COMMIT;
END IF;
END FOR;
COMMIT;
END;
Time to INSERT enwiki_entry: 3h 55m 6s
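The OPENXML() extraction for a single page can be mimicked with Python's standard xml.etree.ElementTree, using the same two paths, title and revision/text (a sketch of the logic, not SQL Anywhere's parser):

```python
import xml.etree.ElementTree as ET

def parse_page(page_xml):
    """Pull the same two values the OPENXML() call extracts:
    /page/title and /page/revision/text, with '???' standing in
    for missing values, like the COALESCE calls in the INSERT."""
    page = ET.fromstring(page_xml)
    title = (page.findtext("title") or "???").strip()
    text = (page.findtext("revision/text") or "???").strip()
    return title, text

page_xml = ("<page><title> Wayne Gretzky </title>"
            "<revision><text>Canadian centre...</text></revision></page>")
# parse_page(page_xml) == ("Wayne Gretzky", "Canadian centre...")
```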

Tip: If you don't have four separate disk drives for the four busy files mentioned in Step 3 earlier, at least try to put the transaction log on a separate drive from the database file. Here's a snapshot of the Windows Vista Resource Monitor showing how both files are heavily used during the INSERT enwiki_entry process:



Step 7: Index The Pages.

For big, fairly static data sets like snapshot copies of Wikipedia, it's probably most convenient to take full control over the creation of full text indexes. One way to do this is to initially create an empty index using the MANUAL REFRESH clause of CREATE TEXT INDEX, and then later build the index via REFRESH TEXT INDEX.

Here's the code for a full text index on enwiki_entry:
CREATE TEXT INDEX tx_page_text 
ON enwiki_entry ( page_title, page_text )
MANUAL REFRESH;

REFRESH TEXT INDEX tx_page_text
ON enwiki_entry
WITH EXCLUSIVE MODE
FORCE BUILD;
Time to REFRESH TEXT INDEX tx_page_text: 2h 14m 47s
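Conceptually, what REFRESH TEXT INDEX builds is an inverted index: a map from each term to the rows containing it. Here's a toy Python sketch of the idea (SQL Anywhere's actual on-disk structure is far more sophisticated):

```python
from collections import defaultdict

def build_text_index(rows):
    """Toy inverted index over (page_number, page_title, page_text)
    rows: term -> set of page_numbers containing that term."""
    index = defaultdict(set)
    for page_number, page_title, page_text in rows:
        for term in (page_title + " " + page_text).lower().split():
            index[term].add(page_number)
    return index

index = build_text_index([
    (1, "Wayne Gretzky", "hockey player"),
    (2, "Hockey", "national sport"),
])
# index["hockey"] == {1, 2}; index["wayne"] == {1}
```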

Tip: When designing a full text index, don't forget to include critical columns, such as "title" or "description" columns that might accompany large document columns in the same table. You can't ALTER a text index to include columns you've forgotten, but you can exclude columns you don't need when doing later searches.

Step 8: Test The Index.

Here's a sample query that uses the full text index to find Wikipedia pages containing both 'Wayne' and 'Gretzky'. The result set is sorted in decreasing order by the full text score, and the second group of 100 rows is displayed:
SELECT TOP 100
START AT 101
ROW_NUMBER()
OVER ( ORDER BY score DESC ) AS entry_number,
ROUND ( score, 1 ) AS score,
enwiki_entry.page_title AS title,
LEFT ( enwiki_entry.page_text, 500 ) AS excerpt
FROM enwiki_entry
CONTAINS ( enwiki_entry.page_title,
enwiki_entry.page_text,
'Wayne Gretzky' )
WHERE enwiki_entry.page_text NOT LIKE '#REDIRECT%'
ORDER BY entry_number;

The WHERE clause eliminates all content-free Wikipedia "redirection" pages. The new ROW_NUMBER() window function is used instead of the old NUMBER(*) function so that clauses like TOP 100 START AT 101 will show entry_number values 101, 102, 103, ... instead of 1, 2, 3, ...
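The TOP 100 START AT 101 paging over ROW_NUMBER() can be sketched in Python (with hypothetical scored rows):

```python
def page_of_results(scored_rows, page_size=100, start_at=101):
    """scored_rows: (score, title) pairs. Sort by score descending,
    number the rows like ROW_NUMBER() OVER (ORDER BY score DESC),
    then return the slice TOP page_size START AT start_at, so entry
    numbers run 101, 102, ... instead of restarting at 1."""
    ranked = sorted(scored_rows, key=lambda r: r[0], reverse=True)
    numbered = [(n, score, title)
                for n, (score, title) in enumerate(ranked, start=1)]
    return numbered[start_at - 1 : start_at - 1 + page_size]

rows = [(float(s), "title %d" % s) for s in range(1, 301)]
page = page_of_results(rows)
# page[0] == (101, 200.0, "title 200"): entry numbers start at 101
```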

Friday, August 29, 2008

What Does "Enterprise Database" Mean?

Back in February I answered this question in "SQL Anywhere vs ASE":

Q: Is SQL Anywhere an enterprise database?

A: For 90% of the world's enterprises the answer is "most definitely yes!" SQL Anywhere is much easier to administer and develop for than other products, and advances in computer technology mean that performance is no longer the serious concern it used to be: maintenance and development costs are. You can buy a 3 GHz computer with 2 GB of RAM and a 500 GB disk drive for less than US$1,000, and that's enough power to run an enterprise database in many companies.

For the other 10% of the world's enterprises, performance problems still require huge sums of money to be spent on equipment, software and especially support. New features in SQL Anywhere are pushing the envelope, however; for example, version 10 introduces intra-query parallelism and hot failover using database mirroring.

SQL Anywhere is now being marketed as an enterprise database solution, as well as a solution for mid-level and embedded applications. Millions of rows, thousands of busy users, hundreds of gigabytes, no problem.
Here's the real question:
Exactly what is an "enterprise-level" database?
For OLTP systems, the answer might be a simple measurement:
The size of the workload that can be successfully handled by the database, expressed as the number of simultaneous busy users, where databases handling workloads above that threshold can be called "enterprise" and those below it are called something else, like department-level.
What do you think that threshold is? Here's my answer:
The threshold for "enterprise database" is a number in the range of 100 to 10,000 simultaneous busy users. Anything larger than 10,000 and you're excluding most of the world's enterprises. Anything smaller than 100 and you're not acknowledging that large systems really are qualitatively different.
Let's consider some government statistics... in particular, let's look at the US Census Bureau figures for 2005 showing the number of companies with...
fewer than 20     20-99   100-499      500+
    employees employees employees employees
------------- --------- --------- ---------
    5,357,887   520,897    87,285    17,477
Let's leave out all the small companies, and only look at those with 100 or more employees:
83% of companies with 100 or more employees have fewer than 500 employees.
Let's have another look at those big companies, the ones that have 500 or more employees: they actually employ over 57 million people, but the average is only 3,300 employees per company.
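The arithmetic behind those figures is simple enough to check:

```python
# Companies by employee count (US Census Bureau 2005 figures quoted above)
companies_100_to_499 = 87_285
companies_500_plus = 17_477

share_under_500 = companies_100_to_499 / (companies_100_to_499
                                          + companies_500_plus)
# round(share_under_500 * 100) == 83  -> the "83%" figure

employees_500_plus = 57_000_000   # "over 57 million people"
average = employees_500_plus / companies_500_plus
# rounds to roughly 3,300 employees per company
```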

Here's another list: Fortune 500 2006 Top Companies by Employees shows that the top 50 employee counts range from 1.9 million for first place (Walmart) down to 122,200 for 50th place (Northrop Grumman).

Now, "number of busy users" is not the same as "number of employees". Sometimes, a company's database will have more busy users than they have employees; e.g., an airline reservation system has many busy users outside the organization. On the other hand, many internal enterprise databases are used by a small minority of the company's employees, and an even smaller number are busy at the same time.

But the real problem is this: You're not going to find any public ranking of enterprises by numbers of database users.

So you're going to have to take a guess at the threshold for "enterprise database", and I'm going with the range of 100 to 10,000 busy users. For many systems, that corresponds to a range of 1,000 to 100,000 actual users since not all users are busy all the time.

Friday, August 8, 2008

Techwave 2008 Notes

It's the last day of Techwave 2008; here are some random notes...

The free SQL Anywhere 11 Web Edition has a new FAQ here.

So far, Eric Farrar's presentation "Offline Wikipedia with SQL Anywhere" has been my favorite, not just because of the application but because of the many problems he solved and the tips and techniques he included. Like shrinking a database by copying tables and swapping dbspaces instead of running unload/reload.

I learned that using -gn 1 or 2 IS a good idea if you want to dedicate a SQL Anywhere database engine to running a single process... like building a full text index on a huge table.

Lotsa new buzzwords floating around: zero touch, bare metal, management stack, granular control, tasked with, enable credibility, platform innovator, snap in, conduit DMZ...

I want to write an article "Top 10 Reasons Full Text Search Is Cool". The number one reason is probably going to be "It's built right in to the SQL Anywhere query engine"... followed by "It's not using Verity" in case anyone didn't get the first point.

IvanAnywhere didn't make it to the conference. His videos were playing, though... a kind of remote telepresence for a remote telepresence.

More to come, eventually...

Thursday, August 7, 2008

Comparing Database Schemas

[ An improved version of this article may be found here. It shows how to use this technique on all database versions 5.5 through 11. ]



Question: How do I compare the schemas of two SQL Anywhere databases?

Answer, from Bruce Hay at Sybase iAnywhere: Check out the new "-no" option for dbunload in SQL Anywhere 11 (unload objects in name order, no data). It's intended for this purpose.



Here's how it works:
  • Step 1: Run dbunload against the first SQL Anywhere database to create a text file containing CREATE TABLE and other schema definitions. The new -no option tells dbunload to unload just the schema, not the data, and do it in name order so it will be easier to compare with another text file sorted in the same order.

  • Step 2: Run dbunload again, on the second SQL Anywhere database, to produce another text file.

    These dbunload steps only work with the SQL Anywhere Version 11 copy of dbunload.exe, and they only work on Version 10 and 11 databases running on the Version 11 database engine.

  • Step 3: Run a text comparison utility like Compare It! to show the differences between the two text files.
Here's an example of Windows commands that compare the Version 10 and 11 demo databases:
"%SQLANY11%\Bin32\dbunload.exe" 
-c "ENG=demo10;DBN=demo;UID=dba;PWD=sql"
-no -r c:\temp\demo10.sql

"%SQLANY11%\Bin32\dbunload.exe"
-c "ENG=demo11;DBN=demo;UID=dba;PWD=sql"
-no -r c:\temp\demo11.sql

"C:\Program Files\Compare It!\wincmp3.exe"
c:\temp\demo10.sql c:\temp\demo11.sql
Here's what Compare It! shows for one difference:



Compare It! can also produce a report in HTML format:



Here's why using dbunload is a good approach:
  • The dbunload utility doesn't miss anything; it unloads every single schema object in the database... tables, views, procedures, triggers, everything.

  • The dbunload utility is always going to be more up-to-date than any other software except the SQL Anywhere engine itself. In other words, even PowerDesigner falls behind dbunload in terms of supporting SQL Anywhere syntax for new features.

  • Running dbunload on actual databases guarantees you will be comparing the current database schemas, not out-of-date information stored in separate model files.

  • The text file comparison view presents differences in terms of actual database syntax, not some funky abstract logical view used by data modelling tools.
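If you don't have a GUI comparison tool like Compare It! handy, any text diff will do; for example, Python's standard difflib can produce a unified diff of the two dbunload -no output files:

```python
import difflib

def schema_diff(file_a, file_b):
    """Unified diff of two dbunload -no output files; a rough,
    plain-text stand-in for a GUI comparison tool."""
    with open(file_a) as a, open(file_b) as b:
        return "".join(difflib.unified_diff(
            a.readlines(), b.readlines(),
            fromfile=file_a, tofile=file_b))

# e.g. print(schema_diff("c:/temp/demo10.sql", "c:/temp/demo11.sql"))
```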

Wednesday, August 6, 2008

SQL Anywhere Is Free!

The free "Web Edition" of SQL Anywhere 11 is a pretty good deal:

  • no charges

  • no limits on server functionality

  • no limits on the number of users

  • MobiLink is included, and so is UltraLite
It's called the "Web Edition" for a reason, however: all end user interactions must originate from a web browser. In other words, it's not a "free client server edition". And it's not a "free embedded OEM edition" either.

You can download the Web Edition from here.

When you install it, you will see not one, but two license agreements, the second one containing all the fine print specific to the Web Edition. Section 3 contains the interesting bits, and I've pasted it below so you can see what you're getting into.
3. Use of Program.

3.1 Web Server Database. Each instance of this SQL Anywhere Web Edition permits you to access the SQL Anywhere Server component from an unlimited number of Seats, and may include internal users within the Customer's organization and external users outside the Customer's organization, provided that (a) except as specifically authorized below, all access, direct and indirect, to the SQL Anywhere Server originates from a Web Browser; (b) the primary purpose of the SQL Anywhere Server is to support interactive Web Browser applications; and (c) the SQL Anywhere Server is installed on only one Web Server.

3.2 Synchronization. Each instance of this SQL Anywhere Web Edition permits you to install and use the MobiLink Synchronization Server component, provided that (a) the MobiLink Synchronization Server consolidated database is the SQL Anywhere Server described in Section 3.1, a SQL Anywhere Server with a valid purchased License, or a MySQL database server; (b) all synchronization requests to the MobiLink Synchronization Server are from SQL Anywhere UltraLite components described in Section 3.3; and (c) the MobiLink Synchronization Server is installed on only one computer.

3.3 Local Web Database. Each instance of this SQL Anywhere Web Edition permits you to distribute and install the SQL Anywhere UltraLite component on an unlimited number of Seats, and may include internal users within the Customer's organization and external users outside the Customer's organization, provided that (a) all access, direct and indirect, to the SQL Anywhere UltraLite component is from a Web Browser; (b) the sole purpose of the SQL Anywhere UltraLite component is to support interactive Web Browser applications; and (c) data stored in the SQL Anywhere UltraLite component is synchronized using the MobiLink Synchronization Server described in Section 3.2.

3.4 Administration, Maintenance, and Data Import/Export. You may install and use the SQL Anywhere client software and tools on additional Seats solely for administration and maintenance of the SQL Anywhere Server described in Section 3.1 and the MobiLink Synchronization Server described in Section 3.2. Administration and maintenance may include, but is not limited to, importing or exporting of data in the SQL Anywhere Server, or running non-interactive reporting applications.

3.5 Diagnostic Tracing. When using the Application Profiling feature in SQL Anywhere, you have the option of storing tracing data within the database being profiled or in a separate tracing database stored on a separate machine. You may install the SQL Anywhere Server components on a separate Server solely for use with the tracing database used in conjunction with the Application Profiling feature. No other components may be installed on the separate Server, and the SQL Anywhere Server may not be used to process other applications, without obtaining an additional license.

3.6 High Availability Option: SQL Anywhere has a High Availability Option where an additional SQL Anywhere server runs on a secondary computer for Use in the event of a failure of the primary server. A SQL Anywhere High Availability Option deployment is only permitted if applicable licenses are purchased.

3.7 MobiLink Server Farm. A MobiLink Server Farm is an environment where more than one MobiLink server is synchronizing the same set of remote databases with one application or database management system. A MobiLink Server Farm deployment requires MobiLink to run in shared state mode and is only permitted if applicable licenses are purchased.

3.8 Security Option. RSA Encryption is included in the SQL Anywhere Web Edition. Software for Encryption using ECC or FIPS-certified technology, which may be included with the Program, may not be installed or used without the purchase of a separate license at an additional charge. If the needed license has been purchased, the Encryption may only be used in conjunction with Sybase SQL Anywhere. You are advised that these Encryption Programs and related Documentation are cryptographic in nature and therefore are highly regulated. Obtaining any necessary export or import approval for any of your applications or the Security Options or Documentation is your responsibility.

3.9 Sample Code. Sample Code is provided AS IS, without warranty or liability of any kind. You may use, reproduce, modify and distribute the Sample Code that is provided without limitation, on the condition that you retain the copyright notice and disclaimer as to the original iAnywhere code.

Tuesday, August 5, 2008

SQL Anywhere 11 Makes Top Ten TPC-C

For the first time Sybase iAnywhere has published a TPC-C benchmark test using SQL Anywhere, and that benchmark test made it on to the Top Ten TPC-C list.

TPC stands for Transaction Processing Performance Council:

"The TPC is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry."
TPC-C is the name of one particular test, The Order-Entry Benchmark:
"As an OLTP system benchmark, TPC-C simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses."
Test results are ranked two different ways, one absurd and the other meaningful.

First the absurd: If you have $17,111,788 to spend, and you've got room for 11,000 disk drives, you too can own the top ranked system in the Top Ten TPC-C by Performance list. With that system you'll be able to process 6,085,166 transactions per minute submitted by your 5,184,000 users.

In other words, cost is no object... like I said, absurd.

The meaningful list includes cost in the ranking, and this is the list where SQL Anywhere 11 appears: Top Ten TPC-C by Price/Performance...



With a figure of 85 cents per transaction per minute, SQL Anywhere doesn't just break into the Top Ten, it debuts at Number Five!

This is huge news, for many reasons:
  • The test handled 16,500 users. That's more than most real-world enterprises. SQL Anywhere isn't just a departmental database any more.

  • By going public with this benchmark, Sybase iAnywhere shows that it is serious about SQL Anywhere performance and that it is willing to compete with Oracle, SQL Server and DB2.

  • The initial release of SQL Anywhere 11 was used, not some version that's been tweaked and tuned for running TPC-C.

  • The total system cost for the hardware, software and support was $17,540.16. That means SQL Anywhere ranks Number One in total system cost by a very wide margin: $20,000 cheaper than the next cheapest system in the Top 10 price/performance list.

  • Economies of scale apply to larger, more expensive hardware configurations, not the affordable setup used for SQL Anywhere 11. The other 9 systems in the Top 10 price/performance list cost an average of $166,895.11 versus $17,540.16 for SQL Anywhere, and that makes a 5th place showing even more impressive.

  • The test achieved a rate of 20,705 transactions per minute. That's a great number, especially when viewed from a historical perspective: In 1997 the best cost-is-no-object performance was only 30,390 transactions per minute, and best price/performance was $42.53 per transaction per minute... fifty times higher than SQL Anywhere's figure of $0.85 per transaction per minute.
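The arithmetic behind the 85-cent figure and the fifty-fold improvement over 1997 is straightforward:

```python
# Price/performance is total system cost divided by the tpmC rate
total_system_cost = 17_540.16   # hardware, software, and support (US$)
tpmC = 20_705                   # TPC-C transactions per minute

price_performance = total_system_cost / tpmC   # about $0.85 per tpmC

# The 1997 best price/performance of $42.53 per tpmC really is
# about fifty times higher:
ratio_vs_1997 = 42.53 / price_performance
```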
Here's another view of the Top Ten list. The left side ranks price/performance with SQL Anywhere's $0.85 per transaction per minute shown in red. The right side shows how SQL Anywhere's total system cost of $17,540.16 compares so very favorably with the other systems, especially the half-million dollar systems that didn't fare too well when it came to economies of scale:

Top 10 Cool New Features in SQL Anywhere 11

Read it here.

Friday, July 25, 2008

Top 10 In 11

Here's why there haven't been any postings here for a few days: I've been working on Top 10 Cool New Features In SQL Anywhere 11... over 30 distinct features covered, 6500 words, plus a day job to deal with.

From the overview...

SQL Anywhere 11 has a surprisingly large number of new features, especially when you consider the huge number of features introduced only two years ago in SQL Anywhere 10. This article presents a Top 10 list of new features in SQL Anywhere 11 that developers are likely to find most interesting. Some of the features that didn't make the Top 10 list are described as well, including features that might be important to management but aren't necessarily cool as far as developers are concerned. Also included is a short list of features that are more weird than wonderful.
Want to read more? Come to Techwave.