Saturday, February 28, 2009

Revisited: Triggering an Audit Trail

I'm really glad Jeff Atwood's given me permission to reinvent the wheel because the code in Tip: Triggering an Audit Trail needed work.



It's pretty much a rewrite, not a refactoring:

  • the audit log of table t is now called logged_t instead of t_log so all the logged_* tables appear separately in an alphabetic list instead of mixed in with the base tables.

  • The log_id column is now UNSIGNED because you never know when you might need another bazillion rows :)

  • The values allowed in logged_action are now defined in a CHECK constraint. Feel free to drop that constraint, it's more for documentation than anything else since the trigger is the only piece of code that fills in a value... on the other hand, there's no safety like safety.

  • The values in logged_action have been expanded for clarity, and a new one added: 'before UPDATE'.

  • The before- and after-images of updated rows are stored instead of just the after-images. In this case the value of logged_at is set explicitly so both before- and after-images have exactly the same value.

  • The logged_action column is defined as VARCHAR ( 50 ) instead of the minimum required VARCHAR ( 13 ) so folks don't have accidents when translating the code to different languages. There's no difference in storage efficiency or performance between the two definitions; that depends on the amount of data stored, not the declaration of maximum length.

  • The logged_description and logged_by columns have been dropped because this application doesn't need them. If you do, see the original Tip: Triggering an Audit Trail.

  • The three audit-related columns are all located at the front of the logged_t row, for efficiency. This no longer gives Foxhound a reason to produce "long columns (max length 100 or more) not all at end of row" messages in the Table-level Curiosities section. If the base table t does have all its long columns at the end of the row, they are also at the end of row in logged_t.
Here's the revised code, presented as an end-to-end example from the real world:
CREATE TABLE exchange_rate (
from_currency_to_currency VARCHAR ( 50 ) NOT NULL PRIMARY KEY,
exchange_rate DECIMAL ( 15, 5 ) NOT NULL,
updated_at TIMESTAMP NOT NULL DEFAULT TIMESTAMP );

CREATE TABLE logged_exchange_rate (
log_id UNSIGNED BIGINT NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
logged_action VARCHAR ( 50 ) NOT NULL CHECK ( logged_action IN (
'after INSERT', 'before UPDATE', 'after UPDATE', 'before DELETE' ) ),
logged_at TIMESTAMP NOT NULL DEFAULT TIMESTAMP,
-- Columns from table to be logged...
from_currency_to_currency VARCHAR ( 50 ),
exchange_rate DECIMAL ( 15, 5 ),
updated_at TIMESTAMP );

CREATE TRIGGER log_exchange_rate
AFTER INSERT, UPDATE, DELETE ON exchange_rate
REFERENCING OLD AS old_exchange_rate
NEW AS new_exchange_rate
FOR EACH STATEMENT
BEGIN
DECLARE @current_timestamp TIMESTAMP;

CASE

WHEN INSERTING THEN

INSERT logged_exchange_rate
WITH AUTO NAME
SELECT 'after INSERT' AS logged_action,
*
FROM new_exchange_rate;

WHEN UPDATING THEN

SET @current_timestamp = CURRENT TIMESTAMP;

INSERT logged_exchange_rate
WITH AUTO NAME
SELECT 'before UPDATE' AS logged_action,
@current_timestamp AS logged_at,
*
FROM old_exchange_rate;

INSERT logged_exchange_rate
WITH AUTO NAME
SELECT 'after UPDATE' AS logged_action,
@current_timestamp AS logged_at,
*
FROM new_exchange_rate;

WHEN DELETING THEN

INSERT logged_exchange_rate
WITH AUTO NAME
SELECT 'before DELETE' AS logged_action,
*
FROM old_exchange_rate;

END CASE;

END;

INSERT exchange_rate VALUES ( 'USD/CAD', 1.25210, DEFAULT );

UPDATE exchange_rate SET exchange_rate = 1.25211;

DELETE exchange_rate;

INSERT exchange_rate VALUES ( 'USD/CAD', 1.25212, DEFAULT );

COMMIT;

SET TEMPORARY OPTION TIMESTAMP_FORMAT = 'YYYY-MM-DD HH:NN:SS.SSSSSS';

BEGIN
SELECT * FROM exchange_rate;
SELECT * FROM logged_exchange_rate ORDER BY log_id;
END;



One Row Only

Captain Ramius: Re-verify our range to target... one ping only.

Capt. Vasili Borodin: Captain, I - I - I just...

Captain Ramius: Give me a ping, Vasili. One ping only, please.

Capt. Vasili Borodin: Aye, Captain.
Here's a snippet of code from the SQL Anywhere database setup script for a small application. The settings table is a place to store global configuration parameters separate from the application code:
CREATE TABLE settings (
one_row_only INTEGER NOT NULL DEFAULT 1
CHECK ( one_row_only = 1 )
PRIMARY KEY,
no_picture_yet_image_folder LONG VARCHAR NOT NULL,
no_picture_yet_image_file_name LONG VARCHAR NOT NULL );

CREATE TRIGGER one_row_only BEFORE DELETE ON settings
FOR EACH ROW
BEGIN
ROLLBACK TRIGGER WITH RAISERROR 99999
'Do not attempt to delete from settings';
END;

INSERT settings VALUES (
DEFAULT,
'C:\\Inetpub\\images',
'no_picture_yet.jpg' );
COMMIT;
Once this code's been run, the settings table is available for SELECT and UPDATE but the following operations are impossible:
  • inserting another row (the CHECK and PRIMARY KEY constraints),

  • changing the primary key value (the CHECK constraint), or

  • deleting the row (the TRIGGER).
INSERT settings VALUES ( 1, 'Hello', 'World' ); -- primary key violation
INSERT settings VALUES ( 2, 'Hello', 'World' ); -- check condition is false
UPDATE settings SET one_row_only = 3; -- check condition is false
DELETE settings; -- trigger catches it

SELECT * FROM settings;

one_row_only,no_picture_yet_image_folder,no_picture_yet_image_file_name
1,'C:\\Inetpub\\images','no_picture_yet.jpg'

UPDATE settings SET no_picture_yet_image_file_name = 'future.jpg';

one_row_only,no_picture_yet_image_folder,no_picture_yet_image_file_name
1,'C:\\Inetpub\\images','future.jpg'
In other words, there will always be exactly one row in this table, and that fact is guaranteed by the the database itself... perfect for a global configuration settings table.

Marketecture vs Tarchitecture

Today's Dilbert introduced me to the term "marketecture"... which I'd never heard before:

Dilbert.com

Not unexpectedly, the Wikipedia definition presents the wisdom of crowds with its definition of marketecture as complete bullsiht. That rang a bell with me... I've sat through enough incomprehensible product architecture presentations by marketing folks to last a lifetime.

But then... fourth down in my Google search on "marketecture" came this fascinating title: The Difference between Marketecture and Tarchitecture.

Tarchitecture ?!?

You gotta be kidding! That has to be a joke! Maybe even fodder for Center Console (my other blog).

Turns out, no, it makes for very interesting reading. Not so much about marketecture per se, but about why the marketing point of view is so very important to a product's success, just as important as the technical point of view.

I have always said that even though I absolutely do not understand how folks in sales and marketing think or work, I firmly believe that what they do is critically important.

And marketing done badly spells disaster for the product.

Tuesday, February 24, 2009

TechWave Goes Global

TechWave Goes Global as Sybase Adds Multiple Locations to Its Annual User Conference

Tue. February 24, 2009; Posted: 09:00 AM

DUBLIN, Calif., Feb 24, 2009 (BUSINESS WIRE) -- SY | Quote | Chart | News | PowerRating -- Sybase, Inc. (NYSE:SY), the largest enterprise software and services company exclusively focused on managing and mobilizing information, today announced that it will host a series of regional TechWave user conferences worldwide in the second half of 2009.
Read more here, and see http://www.sybase.com/techwave.

Today's Database Deadpool Entries

No, this blog is NOT going to morph into DatabaseDeadpool.com... you're welcome to register that domain if you want. Or this one.

But, when another data-based organization goes off the air because of problems with the data, it rings a bell. The point is not whether the organization is big or small, commercial or volunteer, traditional or Web 2.0, the point is the data is important to someone.

And the question is, if you are the one managing that data, are you protecting it?

The answer in some cases is "apparently not".

Today, BikeReg.com ...


Here's the email about BikeReg.com from the president of Pioneer Registration Services:
From: BikeReg.com Support
Date: Mon, Feb 23, 2009 at 8:17 PM
Subject: BikeReg.com Status

On February 21, BikeReg.com suffered a severe hardware failure. Due to this there was some data loss. There was not an issue of security, just loss of data.

We have recovered a large amount of the data at this point. In order to verify our data recovery, we are asking for your help. If you have registered for an event in February please forward us a copy of your confirmation receipt. This will allow us to double check that you are properly registered for your event. If we need additional information for confirming your registration we will be in contact.

If you do not have a confirmation email available, please send along:

Event Name
Category Registered
Your Name
Your email used during registration

Please do not reply with sensitive payment information.

Thanks for your understanding and patience.

-Steve Roszko-
President / CEO

Pioneer Registration Services, LLC
Sport Specific Online Event Registration Services
BikeReg.com | SkiReg.com | RunReg.com
AthleteReg.com | PioneerReg.com
Sounds a bit like Ma.gnolia.com, doesn't it? Not quite as severe (Ma.gnolia's recovery attempt failed), but as with Ma.gnolia the end users of BikeReg.com are being asked to help.
(Whaddaya think? Would that approach work with YOUR customers?)
It looks like the parent company is still on the air, but all four of the sites it manages are not:



BikeReg.com has been off the air long enough to pollute Google's cache:



So, let me ask the two big questions: (1) does Pioneer use a database? and (2) do they back it up? The evidence seems to indicate "apparently not".

How about you?

... tomorrow, Wikipedia?


In this article Are We Safeguarding Social Data? the author talked to Ed Chi, a senior research scientist at the Palo Alto Research Center (PARC):
Chi also suggested that Wikipedia's data is not getting backed up as often as it should be. Losing data from Ma.gnolia is a personal loss for many, but Wikipedia is one of the great achievements of the digital age, and losing even a day's worth of edits would be significant. We can't afford to lose sight of the basic upkeep that will allow us to hold on to what we create.

Losing a large volume of socially created data would be a cultural loss. Wikipedia, Flickr, Twitter, WordPress, etc. store troves of data that will be valuable to future generations. I hope the data is being treated accordingly.

Monday, February 23, 2009

Why C?

Here's why...



- from Chriss' Blog - Über Gott und die Welt

Sunday, February 22, 2009

Get MAC Addresses Via SQL

The Media Access Control (MAC) address of a computer's network adapter is sometimes used as a unique identifier for that computer. This article presents code that lets you query the MAC address(es) via SQL using SQL Anywhere on Windows.

The MAC address is not perfect as a unique identifier; here are the main reasons:

  • the MAC address changes when you change network adapters,

  • a computer can have more than one network adapter, and thus more than one MAC address (up to 16), and

  • it is possible to change (hack) the MAC address.
But, as Engineers often say, for all practical purposes the MAC address works as a unique id.

The code shown here has been tested using SQL Anywhere 11 running on XP SP3 and Vista 64-bit; the 32-bit version of SQL Anywhere was used on Vista 64 simply because the code shown here hasn't been compiled for 64 bits yet.

Here's the SQL code your application needs to retrieve all the MAC addresses in use by the computer that's running SQL Anywhere:
SELECT * 
FROM mac_address()
ORDER BY mac_address;
Here's the output from tests on three different computers; in each case, the result set from the SELECT is shown, followed by the output from the Windows ipconfig /all command to show that the result sets match "Physical Address" values:
-------------------------------------------------------------
Machine 1 - Windows XP SP3 - one network adapter

mac_address
0x001636193239

C:\>ipconfig /all
...
Ethernet adapter Local Area Connection:
...
Physical Address. . . . . . . . . : 00-16-36-19-32-39

-------------------------------------------------------------
Machine 2 - Windows XP SP3 - two network adapters

mac_address
0x0011d82bccd7
0x00179a3b5162

C:\>ipconfig /all
...
Ethernet adapter Local Area Connection:
...
Physical Address. . . . . . . . . : 00-17-9A-3B-51-62
...
Ethernet adapter Local Area Connection 2:
...
Physical Address. . . . . . . . . : 00-11-D8-2B-CC-D7

-------------------------------------------------------------
Machine 3 - Vista 64 - one network adapter

mac_address
0x001cc04d09b5

C:\>ipconfig /all
...
Ethernet adapter Local Area Connection:
...
Physical Address. . . . . . . . . : 00-1C-C0-4D-09-B5
Here is the SQL procedure called by the SELECT above; mac_address() calls a C procedure called getmac, breaks up the VARBINARY ( 96 ) return value into 6-byte substrings and returns them as a result set:
CREATE PROCEDURE mac_address()
RESULT ( mac_address VARBINARY ( 6 ) )
BEGIN

DECLARE @mac_addresses VARBINARY ( 96 );
DECLARE @return_code INTEGER;
DECLARE @diagnostic_code INTEGER;
DECLARE @diagnostic_string VARCHAR ( 255 );
DECLARE @pos INTEGER;

DECLARE LOCAL TEMPORARY TABLE mac_address (
mac_address VARBINARY ( 6 ) NOT NULL PRIMARY KEY )
NOT TRANSACTIONAL;

CALL getmac (
@mac_addresses,
@return_code,
@diagnostic_code,
@diagnostic_string );

IF @return_code = 0 THEN

SET @pos = 1;
WHILE @pos < LENGTH ( @mac_addresses ) LOOP
INSERT mac_address VALUES (
SUBSTR ( @mac_addresses, @pos, 6 ) );
SET @pos = @pos + 6;
END LOOP;

ELSE
MESSAGE STRING ( 'DIAG ', CURRENT TIMESTAMP,
' getmac error: @return_code = ', @return_code,
', @diagnostic_code = ', @diagnostic_code,
', @diagnostic_string = "', @diagnostic_string,
'"' ) TO CONSOLE DEBUG ONLY;
END IF;

SELECT mac_address
FROM mac_address;

END;
Here is the SQL declaration of the getmac EXTERNAL procedure, pointing to the getmac entry point in the getmac.dll (IMO there's no good reason for using different names for something that's really the same, it just makes life unnecessarily complex):
CREATE PROCEDURE getmac (
OUT mac_addresses VARCHAR ( 96 ),
OUT return_code INTEGER, -- OK if = 0
OUT diagnostic_code INTEGER,
OUT diagnostic_string VARCHAR ( 255 ) )
EXTERNAL NAME 'getmac@C:\\temp\\getmac.dll';
Here is the getmac.cpp code, followed by the getmac.def file; the code is based on "Method Three: Use GetAdaptersInfo" from Three ways to get your MAC address by Khalid Shaikh.
// Required files:
// getmac.cpp -- provided
// getmac.def -- provided
// getmac.sln
// getmac.vcproj
//
// To open the DLL project in Visual C++ 2005:
// Doubleclick on getmac.vcproj
//
// To specify the location of SQL Anywhere *.h
// include files in Visual C++ 2005:
// Solution Explorer
// - select getmac
// - right mouse
// - Properties
// - expand Configuration Properties
// - select C/C++
// - select General
// - see Additional Include Directories
// For SQL Anywhere 10:
// "C:\Program Files\SQL Anywhere 10\h"
// For SQL Anywhere 11:
// "C:\Program Files\SQL Anywhere 11\SDK\Include"
//
// To specify which DLL functions can be named
// in CREATE PROCEDURE ... EXTERNAL statements:
// Edit the getmac.def file.
//
// To build the DLL:
// Build menu item
// - Build getmac, or Rebuild getmac
//
// To find and copy the resulting DLL...
// C:\...\getmac\Release\getmac.dll
//
// Attribution for SQL Anywhere C API sample code...
// *******************************************************
// Copyright 1994-2007 iAnywhere Solutions, Inc.
// All rights reserved.
// This sample code is provided AS IS, without warranty
// or liability of any kind.
// You may use, reproduce, modify and distribute this
// sample code without limitation, on the condition that
// you retain the foregoing copyright notice and
// disclaimer as to the original iAnywhere code.
// *******************************************************
//
// Attribution and comment header from
// www.codeguru.com/cpp/i-n/network/networkinformation
// /article.php/c5451
// *******************************************************
// GetMACAdapters.cpp : Defines the entry point for the
// console application.
// Author: Khalid Shaikh [Shake@ShakeNet.com]
// Date: April 5th, 2002
// This program fetches the MAC address of the localhost
// by fetching the information through GetAdapatersInfo.
// It does not rely on the NETBIOS protocol and the
// ethernet adapter need not be connect to a network.
// Supported in Windows NT/2000/XP
// Supported in Windows 95/98/Me
// Supports multiple NIC cards on a PC.
// *******************************************************

#include
#include
#include
#include
#include

#if defined( _MSC_VER )
#include
#endif

#include "extfnapi.h"
#include "Shlobj.h"
#include
#pragma comment(lib, "iphlpapi.lib")

#if !defined( _unused )
#define _unused( i ) ( (i) = (i) )
#endif

#if !defined( TRUE )
#define TRUE 1
#endif

#if !defined( FALSE )
#define FALSE 0
#endif

#if defined( WIN32 )
#define _UINT32_ENTRY unsigned int FAR __stdcall
#define _VOID_ENTRY void FAR __stdcall
#else
#define _UINT32_ENTRY unsigned int
#define _VOID_ENTRY void
#endif

#define int64 __int64
#define uint64 unsigned __int64

#if defined( WIN32 )
int __stdcall LibMain(
HANDLE inst,
ULONG reasoncalled,
LPVOID reserved )
/*********************/
{
_unused( inst );
_unused( reasoncalled );
_unused( reserved );
return( 1 );
}
#endif

__declspec (dllexport ) _UINT32_ENTRY
extfn_use_new_api ( void )
{
return ( EXTFN_API_VERSION );
}

//------------------------------------------
// getmac
//------------------------------------------

__declspec(dllexport) _VOID_ENTRY getmac (
an_extfn_api *api, void *arg_handle ) {

an_extfn_value api_mac_addresses;
an_extfn_value api_return_code;
an_extfn_value api_diagnostic_code;
an_extfn_value api_diagnostic_string;
char * mac_addresses;
DWORD mac_addresses_size;
DWORD return_code;
DWORD diagnostic_code;
char * diagnostic_string;
IP_ADAPTER_INFO AdapterInfo [ 16 ]; // up to 16 NICs
DWORD dwBufLen;
PIP_ADAPTER_INFO pAdapterInfo;

mac_addresses = ( char * ) malloc ( 96 );
strcpy_s ( mac_addresses, 96, "" );
mac_addresses_size = 0;

return_code = 0;
diagnostic_code = 0;
diagnostic_string = ( char * ) malloc ( 255 );
strcpy_s ( diagnostic_string, 255, "" );

dwBufLen = sizeof ( AdapterInfo );

return_code = GetAdaptersInfo (
AdapterInfo, // [out] buffer to receive data
&dwBufLen); // [in] size of receive data buffer

if ( return_code != ERROR_SUCCESS ) {
strcpy_s ( mac_addresses, 96, "" );
mac_addresses_size = 0;
diagnostic_code = 1;
strcpy_s ( diagnostic_string, 255,
"GetAdaptersInfo call failed" );
}

if ( return_code == ERROR_SUCCESS ) {

pAdapterInfo = AdapterInfo;

return_code = memcpy_s (
&mac_addresses [ 0 ],
6,
pAdapterInfo -> Address,
6 );

if ( return_code == ERROR_SUCCESS ) {
mac_addresses_size = 6;
pAdapterInfo = pAdapterInfo -> Next; // linked list
}
else {
strcpy_s ( mac_addresses, 96, "" );
mac_addresses_size = 0;
diagnostic_code = 2;
strcpy_s ( diagnostic_string, 255,
"memcpy_s call failed" );
}

while ( ( return_code == ERROR_SUCCESS )
&& ( pAdapterInfo )
&& ( mac_addresses_size < 96 ) ) {

return_code = memcpy_s (
&mac_addresses [ mac_addresses_size ],
6,
pAdapterInfo -> Address,
6 );

if ( return_code == ERROR_SUCCESS ) {
mac_addresses_size = mac_addresses_size + 6;
pAdapterInfo = pAdapterInfo -> Next; // in list
}
else {
strcpy_s ( mac_addresses, 96, "" );
mac_addresses_size = 0;
diagnostic_code = 3;
strcpy_s ( diagnostic_string, 255,
"memcpy_s call failed" );
}

} // while

} // if ( return_code == ERROR_SUCCESS )

api_mac_addresses.type = DT_VARCHAR;
api_mac_addresses.data = mac_addresses;
api_mac_addresses.piece_len = mac_addresses_size;
api_mac_addresses.len.total_len = mac_addresses_size;
api -> set_value (
arg_handle, 1, &api_mac_addresses, 0 );

api_return_code.type = DT_INT;
api_return_code.data = &return_code;
api -> set_value (
arg_handle, 2, &api_return_code, FALSE );

api_diagnostic_code.type = DT_INT;
api_diagnostic_code.data = &diagnostic_code;
api -> set_value (
arg_handle, 3, &api_diagnostic_code, FALSE );

api_diagnostic_string.type = DT_VARCHAR;
api_diagnostic_string.data = diagnostic_string;
api_diagnostic_string.piece_len
= ( a_sql_uint32 )( strlen ( diagnostic_string ) );
api_diagnostic_string.len.total_len
= ( a_sql_uint32 )( strlen ( diagnostic_string ) );
api -> set_value (
arg_handle, 4, &api_diagnostic_string, 0 );

free ( mac_addresses );
free ( diagnostic_string );

} // getmac
Here is the getmac.def file:
EXPORTS extfn_use_new_api
EXPORTS getmac

Saturday, February 21, 2009

Time for a Web 2.0 Database Dead Pool?

First it was Ma.gnolia, now it's Reciva: "Could not connect to the database server."



Reciva's Corporate business card is still online, and it sure looks like it's more than just a one-person operation, but who knows these days?



For the record, I first noticed Reciva was down around 10 AM EST on Saturday February 21, 2009, and ranted, er, blogged about it here.

...and it's still down more than two hours later.

Thursday, February 19, 2009

Displaying PDFs Via HTTP

Q: What additional software do I need in order to display PDFs via SQL Anywhere's built-in HTTP server?

A: Nothing, you've got it all: SQL Anywhere on the server side and Adobe Reader on the client side.



Here's the code for a web service written in "Hello World" style... it really is this simple:

CREATE SERVICE root 
TYPE 'RAW' AUTHORIZATION OFF USER DBA
AS CALL p();

CREATE PROCEDURE p()
RESULT ( html_string LONG VARCHAR )
BEGIN

CALL dbo.sa_set_http_header( 'Content-Type', 'application/pdf' );

SELECT xp_read_file ( 'This is Robot Chicken.pdf' );

END;
Here's how to start the SQL Anywhere engine with the HTTP server enabled:
   "%SQLANY11%\bin32\dbsrv11.exe" -xs http ddd11.db
Here's the URL:
   http://localhost/
Some notes:
  • The key is specifying the 'Content-Type' as 'application/pdf'.

  • Other content-types are possible, such as 'image/jpg'; see MIME Reference (which, curiously, does not mention 'image/png').

  • The point about 'image/png' may be moot; see HTTP content-type and browser support ...which might help explain Safari's (lack of) popularity :)

  • The service name "root" is special; it means you don't have to code it in the URL, although http://localhost/root still works.

  • In the real world, you might not read the file from disk via xp_read_file(), you might store the PDF in a LONG BINARY database column and select that. That way ALL your data is protected by SQL Anywhere.

Tuesday, February 17, 2009

Ma.gnolia Postmortem

I wrote about the "data loss and corruption" at Ma.gnolia back on February 2, and now the final word is in...

Tuesday, February 17, 2009, 10:50 AM PST:

Unfortunately, database file recovery has been unsuccessful and I won't be able to recover members' bookmarks from the Ma.gnolia database.
The good news is, founder Larry Halff promises that Ma.gnolia.com will be back in a month or so.

Hear that, and much more, about what went wrong and what the future holds, in this "Whither Ma.gnolia?" video...

Sunday, February 15, 2009

Find The Mistake Contest

In case you missed it, there's a contest going on: Find the simple mistake with the subtle effect in SELECT FROM Excel Spreadsheets.

The rules are explained in the comments on that post.

Two clues have been given, so the maximum possible score is now 80 out of 100. I reserve the right to deduct an arbitrary number of points for each clue; right now it's 10.

You can post your contest entries as comments there, or here, or send me an email at breck dot carter at gmail dot com. Entries will be scored in the order I read them.

Private entries are ENCOURAGED... send me an email, I won't post it if you don't want me to. Unless you win, of course, then I have to post SOMETHING... like, "Hey, anonymous won!"

Friday, February 13, 2009

Revisiting REGEXP_SUBSTR

In a past life (way back before Christmas) I said some harsh things about SQL Anywhere's support for regular expressions:

It's going to be a very long time before I learn what "positive lookbehind zero-width assertion" means, or code something like '(?<=new\\s)york' in an otherwise readable SELECT. If you already know about regular expressions, you might be happy, but maybe not if you grew up worrying about maintainability.
Top 10 Cool New Features in SQL Anywhere 11 November 3, 2008
I'm not going to say anything nice about regular expressions (they are abominable!)
JPEG Width and Height December 23, 2008

Stand back! I do regular expressions!




I can't say I know regular expressions yet, but I've been doing them lately.

And getting caught by a certain (ahem) behavior change introduced between two successive builds of SQL Anywhere 11.

Now, those familiar with SQL Anywhere's release cycle will know that behavior changes between major point releases are very rare, say after 11.0.0 but before 11.0.1. One reason for the rarity that the documentation (Help) is only shipped with major point releases, and it's the documentation that contains the "Behavior Changes" sections.

So, imagine my surprise when SQL code published in this blog, not once but twice, stopped working properly after I upgraded from the Generally Availability (GA) version 11.0.0.1264 to a later Express Bug Fix (EBF) version 11.0.0.1549.

Suddenly, the wonderful REGEXP_SUBSTR function went from being case-insensitive, just like every other normal string operation in SQL Anywhere, to being case sensitive.

Now, it's not as if I didn't know something was coming. I did see the early warnings, like this one that shipped with the GA software:
As a result of feedback received during the 11.0.0 beta period, the behavior of REGEXP and REGEXP_SUBSTR will be changed in an early 11.0.0 EBF. The exact extent of the change is still being considered, but the intent is to make the regular expression behavior closer to that which is available in other languages (such as Perl, Java, .NET and other products) and to allow for the possibility of additional flexibility in determining how regular expressions are matched.

Also, SIMILAR TO, REGEXP, and REGEXP_SUBSTR will be changed in an early 11.0.0 EBF so that [[:lower:]] and [[:upper:]] are case sensitive regardless of database case sensitivity.
SQL Anywhere 11.0.0 Release Notes

So here's what I thought: If I ever (snork!) use things with names like [[:lower:]] and [[:upper:]], I'll be sure to be careful!

In the meantime, I relied on the Help: "Regular expression matching is case insensitive on case-insensitive databases."

Little did I know that what I should have paid attention to is this comment on the same page: "SQL Anywhere syntax for regular expressions matches closely the Perl 5 syntax".

Here's how the build 1549 read-me described the change:
   ================(Build #1431  - Engineering Case #537171)================

The search condition REGEXP and system function REGEXP_SUBSTR() were using
the database's collation to determine if a literal, or character class range,
in the pattern matched the string. For example, if the database was case
insensitive and accent insensitive, matches were case insensitive and accent
insensitive as well. Ranges were evaluated using the collation sort order.
This resulted in different behavior than other tools such as Perl, Java,
.NET, etc. This has been fixed so that REGEXP and REGEXP_SUBSTR() only match
a literal in a pattern if it is the exact same character. Ranges in character
classes (for example '[A-F]') only match characters which have character
set encoding greater than or equal to the encoding of the first character
in the range (A in '[A-F]') and less than or equal to the encoding of the
second character in the range (F in '[A-F]').

Note, this change does not affect the SIMILAR TO search expression, which
continues to use the collation to determine character equivalence and evaluate
character class ranges.
That's why the Help is so important: It is written by native language speakers rather than professional Locale Lawyers.

Let's translate, into English:
   ================(Build #1431  - Engineering Case #537171)================

REGEXP and REGEXP_SUBSTR() are now case sensitive; i.e., 'A' is not equal to 'a'.
That's probably a good thing. Folks who know regular expressions (not me, not yet) say so.

But I still got caught. I have to fix two blog posts, PLUS I got royally embarrassed when I copied my own blog code into a client application and watched it... fail!

Stand back! I'm fixing code!

For the record, here's the bad code:
SET @padded_marker = REGEXP_SUBSTR (
@hex_string,
'(FF)+([0-9A-F][0-9A-E]|[0-9A-E][0-9A-F])',
@current_pos );
and here's the fix... see all the leetle a's and e's and f's? That's what case sensitivity does for you:
SET @padded_marker = REGEXP_SUBSTR (
@hex_string,
'((FF)|(ff))+([0-9A-Fa-f][0-9A-Ea-e]|[0-9A-Ea-e][0-9A-Fa-f])',
@current_pos );
Just gimme a minute, I'm gonna put "fix old blog posts" on my to-do list...
-- Fix JPEG Width and Height

-- Fix Loading Folders
...I'll get around to it soon, trust me!

Wednesday, February 11, 2009

Don't be a Ferengi Programmer!

Finally, someone said it. At least, someone said it within earshot of me:

Oh and Scott Ambler blows too.
- Brian in comment on The Ferengi Programmer
I want to rend my garments, cast away all possessions and devote my life to the teachings of Joel Spolsky and Jeff Atwood after reading The Ferengi Programmer.

Because the monstrous perversions that are frameworks, architectures and principles that guide our business have been dragging us down for decades, and Jeff and Joel speak the truth!

Sunday, February 8, 2009

So ya wanna code for the iPhone?

The good news is, maybe, you'll get to use a real relational database management system; sign up for more news here.

The bad news is, maybe, writing that killer iPhone app won't make you rich.

Apple might not let anyone download your application. And if they do, now, they might decide to remove your application at some later date... not just from the iPhone store, but from every iPhone out there.

"... imagine if Microsoft had adjusted Windows to act the way the iPhone and Facebook apps platforms do. WordPerfect would owe Microsoft 30% on sales of every copy of its word processor — if it sold any, since Word could be featured by Microsoft to its users much more readily, or rejected entirely as duplicative of Word.

... Of course, Microsoft could change that percentage owed at any time — or make it a flat fee. The makers of, say, Quicken, could find that they owe 70% or 80% on every app, take it or leave it. If they leave it, Quicken would stop working on every PC on which it had previously been installed."
- The end of the generative internet, Jonathan Zittrain, Communications of the ACM, January 2009
Ok, so it's true, I can't see a silver lining without immediately zoning in on the black cloud surrounding it.

I also can't see Apple's policy being used for evil... at least, not for much longer ( there's that black cloud again :).

Saturday, February 7, 2009

The Answer to Grandiose Help

Q: What have you used to create help files for your applications?

A: It's not exactly a model of perfection, but for Foxhound I am heading toward a static website simply because Google can be used for searching; e.g.:

heartbeat site:risingroad.com/foxhound
Right away you can see that individual page titles need work... but at least the content's there.

I also ship the *.HTML files with the product so they can be browsed offline if absolutely necessary. I am not going to attempt custom offline searching.

I used to be a fan of *.CHM but it is so difficult to develop, and the searching is so 1980s.

If you want to see what does NOT work, see Microsoft Office Help. I'm sure it requires wetware-centuries of effort to develop and maintain, but it is absolutely impenetrable... BECAUSE of the so-called ease-of-use. They made so many great improvements in Office 2007, I don't understand why they keep going in that direction for the Help. When I have an Office question I use Google:
"get external data" "excel 2007" site:microsoft.com
Foxhound does attempt a kind of context-sensitive Help using frames; click on one of the several tiny "?" images in the working frame to scroll the Help frame to the appropriate section.



If Foxhound was a "normal" GUI (end-user data entry, etc) then I would not bother with context-sensitive at all. If someone needs to be told that the Date of Birth field "contains the employee's date of birth" they probably can't figure out context-sensitive Help either, and it's time for some classroom instruction.

However, Foxhound is a funky app where highly condensed data is blasted onto the screen... at least in the performance monitor section. So, I needed some context sensitivity to tell people where the data is coming from and what exactly it means... and does not mean. And nobody's going to offer a Foxhound class.

The last thing I want to do is let the Help authoring process itself affect the project in the following negative ways:
  • slow down the addition of new, valuable Help content, and

  • slow down application development by requiring simultaneous Help-related changes to the GUI.
The answer to grandiose Help is not "build a complex framework to assist the authoring and presentation processes" but "stop doing it".

Note that Foxhound is a web app itself... but I would probably take the same approach for a desktop executable. Exactly the same original HTML content is served up three ways:
  • the website Help for Foxhound,

  • local HTML files delivered via start - All Programs - Foxhound1 - Help which is a simple shortcut to foxhound_contents.html, and

  • Help frame content served up from the local SQL Anywhere database via web services as part of the Foxhound GUI.

Tuesday, February 3, 2009

The Tao Of Backup

This ancient website might look cheesy at first, but I don't care who you are, if you own or use a computer, it will give you something to think about.



There's more to this website than the top-level chapters:

  • There are "Click for more information on..." links at the bottom of each page,

  • plus "Sanctuary" links that take you to the actual home page.
For the folks struggling at Ma.gnolia, it's probably too late.

Monday, February 2, 2009

Web 2.0 Meets Database 101

As of this moment, the popular social bookmarking website Ma.gnolia has been off the air for almost three days because of "data corruption and loss". When (if?) Ma.gnolia comes back the data may be gone forever.

This is a life-critical event for an enterprise defined by the data it stores. Founder Larry Halff's apology even seems to stutter ("to to"):

"I can't provide a certain timeline or prognosis as to to when or to what degree Ma.gnolia or your bookmarks will return"


Initially, the key words are to what degree. As time marches on when will become more important, perhaps replaced by who cares? as users flock to delicious.com (formerly known as del.icio.us, get it?).

According to Wired, "Ma.gnolia posted a short note on its website shortly after 9 a.m. Pacific time (January 30), saying it was down temporarily due to a database failure."

Huh?

A "database failure"? OK, it's time for a FAQ:

Q: Why should I use a database?

A: Because a database server automatically recovers after a system crash. Because a database backup is an internally consistent copy of all your data.

Q: Why should I take a backup of my database?

A: Because computer files sometimes get lost or corrupted. Because disk drives sometimes stop working. Because sometimes, whole computer systems are destroyed. Along with the buildings they are in.

Q: How do I back up my database?

A: If you're talking about SQL Anywhere, shut down the database server and copy the .DB file to another location. On another disk drive, on another computer, maybe even in another building.

Q: How do I restore my database?

A: Copy the backup .DB file back to its original location and start the SQL Anywhere database server.

Q: Why haven't you mentioned the relational model, SQL, ACID, online backup or database mirroring?

A: Because this is Database 101. Databases like SQL Anywhere have all that other stuff, but the first things first: ya gotta have a plan.

Q: What was that about online backup?

A: The dbbackup utility lets you back up a running database to a disk drive anywhere in the world. It can also do a "live backup" by continuously shipping transaction log records to that other disk drive.

Q: What is database mirroring?

A: It's really called "High Availability" where you have two complete database servers with two matching copies of the database in two different locations. When one server fails all the users are switched over to the other one, and all their data is safe.

But does anyone care?

It may be a moot point now, but Gnolia Systems (Ma.gnolia, get it?) had previously announced an ambitious project to redevelop the Ma.gnolia software. The Ma.gnolia 2 Charter is most interesting for what's missing, not what it contains.

For example, it mentions technical terms like Ruby on Rails, OPENID, OAUTH and a "new API method: user_find", but none of the following words appear anywhere in the 12 pages:
  • database
  • datastore
  • file
  • backup
  • recovery
  • restore
  • integrity
  • availability
  • strategy
  • infrastructure
These words do appear, but not in the context of basic data storage and protection:
  • available
  • data
  • safety
  • protect, protection
  • risk

Can you hear me now?

Yeah, sure, I'm piling on, kicking Ma.gnolia when it's down, but that's not the point.

The point is, WHO ELSE OUT THERE DOESN'T GET IT?

Larry Halff and his team are paying for their complete lack of foresight. They will never forget this past weekend for the rest of their lives. But Ma.gnolia doesn't matter, what matters is YOUR DATA.

Gosh, do ya think?

So I'm gonna pile on some more, with this excerpt from the "Why M2?" section of the Ma.gnolia 2 Charter:
"A major re-design is required to truly take advantage of lessons learned over 3 years. These issues range across identity, reputation, spam, privacy and contact management, cross-service presence, operational costs and the personal and organizational goals that customers bring to a social bookmarking service."
Gosh, do ya think it's time to add something to the "lessons learned"?

Sunday, February 1, 2009

The Blog is the Book

Today is the first anniversary of this blog (see the first article here) and I am celebrating by answering this question:

Q: When are you going to write a book about SQL Anywhere Version 11? I really like the Version 9 book.

A: This blog is that book.
It was a wonderful experience, writing the V9 dead tree edition, where "wonderful" means "I survived"... somewhat similar to basic training... painful while proceeding but prideful when complete.
(Not that I have actually attended boot camp, of course.)
Like basic training, or a car accident, writing an actual book not an experience I want to repeat. Especially not the financial aspect... first-time technical authors make very little money, where "very little money" means "5 dollars a day".

On the other hand, a blog doesn't require my full attention for six months at a time, where "full attention" means "unemployment".

The good news, a blog is easier to write. The bad news, the current product is not nearly as good as a book. Maybe I should say...

This Blog Will Be That Book

Maybe it will be, with some decent V11 content, and some kind of Table of Contents, and ... ?

No promises, but it's a new year, wish me luck!
PS: In a previous life I started publishing daily "Tips" (see list here) back in 1996 (see the first Tip here). Not exactly a true "web log" but sometimes neither is this blog.