Wednesday, December 24, 2008

Loading Folders

Question: How do I load all the JPEG files from a folder into a table?

Answer: With a proxy table based on CLASS 'DIRECTORY' remote server definition.

Included in this article is the code from yesterday's posting, JPEG Width and Height, so that each image's height and width may be saved in the table:

CREATE TABLE jpeg (
file_name LONG VARCHAR NOT NULL PRIMARY KEY,
file_size BIGINT NOT NULL,
file_date_time TIMESTAMP NOT NULL,
file_contents LONG BINARY NOT NULL,
image_height UNSIGNED INTEGER NOT NULL DEFAULT 0,
image_width UNSIGNED INTEGER NOT NULL DEFAULT 0 );
Here's the code that loads all the C:\temp\*.jpg files into the jpeg table:
BEGIN
DECLARE @folder_name LONG VARCHAR;
DECLARE @image_height UNSIGNED INTEGER;
DECLARE @image_width UNSIGNED INTEGER;

SET @folder_name = 'C:\\temp';

EXECUTE IMMEDIATE STRING (
'CREATE SERVER folder CLASS ''DIRECTORY'' ',
'USING ''ROOT=',
@folder_name,
';SUBDIRS=0''' );

CREATE EXTERNLOGIN DBA TO folder;

CREATE EXISTING TABLE folder AT 'folder;;;.';

FOR f_fetch AS c_fetch NO SCROLL CURSOR FOR
SELECT folder.file_name AS @file_name,
folder.size AS @file_size,
folder.modified_date_time AS @file_date_time,
folder.contents AS @file_contents
FROM folder
WHERE folder.file_name LIKE '%.JPG'
ORDER BY folder.file_name
FOR READ ONLY
DO

CALL get_jpeg_dimensions (
@file_contents, -- IN
@image_height, -- OUT
@image_width ); -- OUT

INSERT jpeg VALUES (
@file_name,
@file_size,
@file_date_time,
@file_contents,
@image_height,
@image_width );

COMMIT;

END FOR;

END;
Here are some notes on the code:
  • The EXECUTE IMMEDIATE statement is necessary because the CREATE SERVER statement doesn't allow expressions in the USING clause (at least, I don't think it does, I can't remember trying :)

  • If you hard-coded the CREATE SERVER instead of using the EXECUTE IMMEDIATE, it would look like this:
    CREATE SERVER folder CLASS 'DIRECTORY' 
    USING 'ROOT=C:\\temp;SUBDIRS=0';

  • The CREATE SERVER ... CLASS 'DIRECTORY' syntax is described here.

  • The CREATE EXTERNLOGIN statement is required for CLASS 'DIRECTORY' remote servers... I'm not sure why, but it's the law.

  • The CREATE EXISTING TABLE statement uses 'folder;;;.' as the "AT" location. The first parameter refers back to the CREATE SERVER name "folder". The second and third parameters are empty; no database or owner names are required. The fourth parameter, the "object name" parameter, is a single dot meaning "this folder", referring back to the earlier ROOT parameter.

  • The CREATE EXISTING TABLE statement gives you a table that looks like this, as shown by Foxhound:
    CREATE EXISTING TABLE DBA.folder -- proxy
    -- (
    -- permissions VARCHAR ( 10 ) NOT NULL,
    -- size BIGINT NOT NULL,
    -- access_date_time TIMESTAMP NOT NULL,
    -- modified_date_time TIMESTAMP NOT NULL,
    -- create_date_time TIMESTAMP NOT NULL,
    -- owner VARCHAR ( 20 ) NOT NULL,
    -- file_name VARCHAR ( 260 ) NOT NULL,
    -- contents LONG BINARY NOT NULL
    -- )
    AT 'folder;;;.';

  • The FOR loop fetches 4 column values from the folder table.

  • The CALL to get_jpeg_dimensions (see JPEG Width and Height) fills in two more values.

  • ...and the INSERT puts another row into the jpeg table.
The time spent to execute this code depends on a lot of factors, not just the state of the database cache. That's my way of saying "I can't get consistent timing figures no matter what I do"... which is another way of saying "I don't know what's going on."

So here goes: 36 JPEG files took anywhere from 1.5 to 3.5 seconds to load into a SQL Anywhere 11 table on the same reasonably fast laptop used yesterday.

Here's the result in dbisql:
SELECT jpeg.file_name,
jpeg.file_size AS "size",
jpeg.file_date_time AS "date_time",
jpeg.file_contents,
jpeg.image_height AS "height",
jpeg.image_width AS "width"
FROM jpeg
ORDER BY jpeg.file_name;



And here's a new feature I didn't know about: SQL Anywhere 11's version of dbisql is smart enough to know that jpeg.file_contents is an "(IMAGE)", and when you click on the funky little ellipsis button "..." you get these marvellous displays:





So... dbisql understands JPEGs too!



For completeness, here is the supporting code from yesterday's article JPEG Width and Height:
CREATE PROCEDURE get_jpeg_dimensions (
IN @file_contents LONG BINARY,
OUT @image_height UNSIGNED INTEGER,
OUT @image_width UNSIGNED INTEGER )
BEGIN

DECLARE @hex_string LONG VARCHAR;
DECLARE @current_pos BIGINT;
DECLARE @jpeg_marker VARCHAR ( 4 );
DECLARE @block_length UNSIGNED INTEGER;

SET @image_height = 0; -- not found / error
SET @image_width = 0;

UNLOAD
SELECT @file_contents
INTO VARIABLE @hex_string
HEXADECIMAL ON;

SET @current_pos = 3; -- skip the leading 0x

-- Process the leading FFD8 SOI Start OF Image header.

CALL scan_jpeg_marker (
@hex_string, @current_pos, @jpeg_marker );

IF @jpeg_marker <> 'FFD8' THEN
RETURN; -- error
END IF;

-- Scan the second marker.

CALL scan_jpeg_marker (
@hex_string, @current_pos, @jpeg_marker );

IF @jpeg_marker = '' THEN
RETURN; -- error
END IF;

-- Look for the first FFCx SOFx Start Of Frame x,
-- stopping if FFD9 EOI End Of Image or
-- FFDA SOS Start Of Scanned data is found.

WHILE 1 = 1 LOOP

IF @jpeg_marker IN (
'FFC0', 'FFC1', 'FFC2', 'FFC3',
'FFC5', 'FFC6', 'FFC7',
'FFC9', 'FFCA', 'FFCB',
'FFCD', 'FFCE', 'FFCF' ) THEN

-- Start of Frame header
-- (note exclusions FFC4, FFC8, FFCC)
-- FFCx marker
-- xxxx block length
-- xx data precision - bits per sample
-- xxxx image height
-- xxxx image width
-- xx number of color components

SET @image_height = HEXTOINT ( SUBSTR (
@hex_string, @current_pos + 6, 4 ) );

SET @image_width = HEXTOINT ( SUBSTR (
@hex_string, @current_pos + 10, 4 ) );

RETURN; -- success

ELSEIF @jpeg_marker IN ( 'FFD9', 'FFDA' ) THEN

-- EOI, SOS

RETURN; -- not found

ELSE

-- Skip over this header to the next one.

SET @block_length = HEXTOINT ( SUBSTR (
@hex_string, @current_pos, 4 ) );

SET @current_pos
= @current_pos + ( @block_length * 2 );

CALL scan_jpeg_marker (
@hex_string, @current_pos, @jpeg_marker );

IF @jpeg_marker = '' THEN
RETURN; -- error
END IF;

END IF;

END LOOP; -- WHILE 1 = 1 LOOP

END; -- get_jpeg_dimensions

CREATE PROCEDURE scan_jpeg_marker (
IN @hex_string LONG VARCHAR,
INOUT @current_pos BIGINT,
OUT @jpeg_marker VARCHAR ( 4 ) )
BEGIN

DECLARE @padded_marker LONG VARCHAR;

-- Match one or more FFs followed by any hex
-- pair other than FF.

SET @padded_marker = REGEXP_SUBSTR (
@hex_string,
'(FF)+([0-9A-F][0-9A-E]|[0-9A-E][0-9A-F])',
@current_pos );

IF @padded_marker IS NOT NULL THEN
SET @jpeg_marker = STRING (
LEFT ( @padded_marker, 2 ),
RIGHT ( @padded_marker, 2 ) );
SET @current_pos
= @current_pos + LENGTH ( @padded_marker );
RETURN; -- success
ELSE
SET @jpeg_marker = '';
RETURN; -- error
END IF;

END; -- scan_jpeg_marker

Tuesday, December 23, 2008

JPEG Width and Height

Question: How do I get the dimensions of a JPEG image?

I want to fill in the HEIGHT and WIDTH attributes of HTML IMG tags generated by SQL Anywhere web services from images stored in the database.

HTML pages display faster if the browser knows ahead of time how big each image is, and the HEIGHT and WIDTH attributes are one way of doing that:

<IMG SRC="images/blue enamel cup.JPG" 
WIDTH="400" HEIGHT="435">
CSS STYLE height and width attributes, the preferred technique for this blog, are another way of doing the same thing:
<img style=" ... width: 283px; height: 308px;" 
src="..." ... />


Answer: The dimensions of a JPEG image are embedded in the image itself.

This article presents a pure SQL solution for finding those dimensions, after first converting the binary image to a character string as described in yesterday's post Hexadecimal Strings.
Reference: There are many explanations of the JPEG file format on the web, mostly shallow, useless, incorrect or hugely obscure. In situations like that I prefer to read actual code, so I went with the logic in in rdjpgcom.c, Copyright (C) 1994-1997 by Thomas G. Lane, from the Independent JPEG Group's software.

This ancient C program delves into the basic structure of JPEG files, just far enough down into the details to find what I was looking for, without going all the way down into the massive complexity of image compression.

Plus, it's got comments!

You can find the rdjpgcom.c file at various places on the web including here.

1: The JPEG Table

For the purposes of this article, the first two columns of the following table have already been filled in, and it's the job of the code in the next few sections to fill in the image_height and image_width columns:
CREATE TABLE jpeg (
file_name LONG VARCHAR NOT NULL PRIMARY KEY,
file_contents LONG BINARY NOT NULL,
image_height UNSIGNED INTEGER NOT NULL DEFAULT 0,
image_width UNSIGNED INTEGER NOT NULL DEFAULT 0 );

2: CREATE PROCEDURE get_jpeg_dimensions

The get_jpeg_dimensions procedure receives a JPEG image as the first parameter, and returns the image height and width as parameters 2 and 3.

Here are some notes on the format of a JPEG file:
  • A JPEG file consists of a series of blocks, sometimes called headers.

  • Each block begins with a marker.

  • Each marker consists of one hexadecimal FF byte, zero or more FF bytes called padding, and one non-FF byte. The scan_jpeg_marker procedure at the end of this article is dedicated to skipping over the FF padding and returning a 4-character 'FFxx' marker string.

  • The first block consists only of its marker, FFD8.

  • For second and later blocks, the marker is followed by a two-byte length field.

  • The value of the length field includes the length of the length field itself plus the length of the varying-length, varying-format data which follows it.

  • The only data of interest is the image width and height fields. This article does not go into any detail on the other very complex data stored inside JPEG images.

  • The image width and height fields are contained in the first block that has one of these markers: 'FFC0', 'FFC1', 'FFC2', 'FFC3', 'FFC5', 'FFC6', 'FFC7', 'FFC9', 'FFCA', 'FFCB', 'FFCD', 'FFCE' or 'FFCF' (yes, there missing values in that sequence :)

  • The only way to find the image width and height fields is to scan all the markers and length fields up to the 'FFCx' block containing those image width and height fields. For example, a simple search for an FFCx marker won't work because JPEG images sometimes contain embedded thumbnails, and a simple search may return the width and height from an inner thumbnail FFCx block instead of the real values.

  • Scanning should not proceed past the first 'FFD9' or 'FFDA' marker... Here Be Dragons!
Here's the actual code, with some notes following:
  1 CREATE PROCEDURE get_jpeg_dimensions (
2 IN @file_contents LONG BINARY,
3 OUT @image_height UNSIGNED INTEGER,
4 OUT @image_width UNSIGNED INTEGER )
5 BEGIN
6
7 DECLARE @hex_string LONG VARCHAR;
8 DECLARE @current_pos BIGINT;
9 DECLARE @jpeg_marker VARCHAR ( 4 );
10 DECLARE @block_length UNSIGNED INTEGER;
11
12 SET @image_height = 0; -- not found / error
13 SET @image_width = 0;
14
15 UNLOAD
16 SELECT @file_contents
17 INTO VARIABLE @hex_string
18 HEXADECIMAL ON;
19
20 SET @current_pos = 3; -- skip the leading 0x
21
22 -- Process the leading FFD8 SOI Start OF Image header.
23
24 CALL scan_jpeg_marker (
25 @hex_string, @current_pos, @jpeg_marker );
26
27 IF @jpeg_marker <> 'FFD8' THEN
28 RETURN; -- error
29 END IF;
30
31 -- Scan the second marker.
32
33 CALL scan_jpeg_marker (
34 @hex_string, @current_pos, @jpeg_marker );
35
36 IF @jpeg_marker = '' THEN
37 RETURN; -- error
38 END IF;
39
40 -- Look for the first FFCx SOFx Start Of Frame x,
41 -- stopping if FFD9 EOI End Of Image or
42 -- FFDA SOS Start Of Scanned data is found.
43
44 WHILE 1 = 1 LOOP
45
46 IF @jpeg_marker IN (
47 'FFC0', 'FFC1', 'FFC2', 'FFC3',
48 'FFC5', 'FFC6', 'FFC7',
49 'FFC9', 'FFCA', 'FFCB',
50 'FFCD', 'FFCE', 'FFCF' ) THEN
51
52 -- Start of Frame header
53 -- (note exclusions FFC4, FFC8, FFCC)
54 -- FFCx marker
55 -- xxxx block length
56 -- xx data precision - bits per sample
57 -- xxxx image height
58 -- xxxx image width
59 -- xx number of color components
60
61 SET @image_height = HEXTOINT ( SUBSTR (
62 @hex_string, @current_pos + 6, 4 ) );
63
64 SET @image_width = HEXTOINT ( SUBSTR (
65 @hex_string, @current_pos + 10, 4 ) );
66
67 RETURN; -- success
68
69 ELSEIF @jpeg_marker IN ( 'FFD9', 'FFDA' ) THEN
70
71 -- EOI, SOS
72
73 RETURN; -- not found
74
75 ELSE
76
77 -- Skip over this header to the next one.
78
79 SET @block_length = HEXTOINT ( SUBSTR (
80 @hex_string, @current_pos, 4 ) );
81
82 SET @current_pos
83 = @current_pos + ( @block_length * 2 );
84
85 CALL scan_jpeg_marker (
86 @hex_string, @current_pos, @jpeg_marker );
87
88 IF @jpeg_marker = '' THEN
89 RETURN; -- error
90 END IF;
91
92 END IF;
93
94 END LOOP; -- WHILE 1 = 1 LOOP
95
96 END; -- get_jpeg_dimensions
Here are some notes on the code:
  • The UNLOAD at line 15 converts an n-byte LONG BINARY string to a 2n-byte LONG VARCHAR string as described in Hexadecimal Strings.

  • The SET at line 20 initializes @current_pos to point to the first significant character in the hexadecimal string. This variable is incremented throughout the rest of the code, as scanning proceeds.

  • The scan_jpeg_marker procedure is called from several locations, starting with line 24. This procedure starts with the leading FF of a JPEG marker and scans forward to find the subsequent non-FF substring; the code appears in the next section.

  • The loop at line 44 continues until a RETURN is executed.

  • The IF at line 46 tests for the sought-after "Start of Frame" block, containing the image height and width values; the code starting at line 61 returns those values.

  • The ELSEIF at line 69 tests for a premature end-of-JPEG situation; scanning can't proceed, and the image height and width are returned as zero.

  • The ELSE at line 75 skips over the current block and scans the subsequent marker.
Note the expression "@block_length * 2" at line 83: forgetting to multiply by 2 is a characteristic error when dealing with hexadecimal strings. You often have to multiply by 2 because a hexadecimal string contains 2 characters for each single byte in the original LONG BINARY string.

3: CREATE PROCEDURE scan_jpeg_marker

The scan_jpeg_marker procedure looks forward for one single FF, followed by one single non-FF, with zero or more FFs in between.
CREATE PROCEDURE scan_jpeg_marker (
IN @hex_string LONG VARCHAR,
INOUT @current_pos BIGINT,
OUT @jpeg_marker VARCHAR ( 4 ) )
BEGIN

DECLARE @padded_marker LONG VARCHAR;

-- Match one or more FFs followed by any hex
-- pair other than FF.

SET @padded_marker = REGEXP_SUBSTR (
@hex_string,
'(FF)+([0-9A-F][0-9A-E]|[0-9A-E][0-9A-F])',
@current_pos );

IF @padded_marker IS NOT NULL THEN
SET @jpeg_marker = STRING (
LEFT ( @padded_marker, 2 ),
RIGHT ( @padded_marker, 2 ) );
SET @current_pos
= @current_pos + LENGTH ( @padded_marker );
RETURN; -- success
ELSE
SET @jpeg_marker = '';
RETURN; -- error
END IF;

END; -- scan_jpeg_marker
I'm not going to say anything nice about regular expressions (they are abominable!) but the REGEXP_SUBSTR function certainly makes life easy once you figure out how to code the regular expression itself. Here's why:
  • REGEXP_SUBSTR returns the actual substring that matches the regular expression, instead of a string position,

  • you can pass it a starting point, and that starting point is numbered from 1, not zero, and

  • you can pass it an "occurrence count"; e.g., you can tell it to match the third occurrence of the regular expression.
For more about REGEXP_SUBSTR see here.

Here's a breakdown of the way-too-complex expression '(FF)+([0-9A-F][0-9A-E]|[0-9A-E][0-9A-F])' used above:
  • (FF) is a group of two successive F characters

  • (FF)+ means match the group 1 or more times

  • [0-9A-F] means match one character in the range 0 through 9 or A through F

  • [0-9A-E] leaves out F

  • [0-9A-F][0-9A-E] means match any hex pair that doesn't end with F

  • [0-9A-E][0-9A-F] means match any hex pair that doesn't start with F

  • The | in the (...) group ([0-9A-F][0-9A-E]|[0-9A-E][0-9A-F]) means "or": match any hex pair that doesn't end with F or doesn't start with F
In other words, match any hex pair in the range 00 through FE...
I'm sure regexp experts (regexperts?) will have many simplifying suggestions, but please be gentle with your comments; it's my first time! (Someday, I may qualify to wear this, based on this :)

4: Calling get_jpeg_dimensions

Finally, here's a FOR loop that calls get_jpeg_dimensions for each row in the jpeg table, and fills in the image height and width columns:
BEGIN

DECLARE @image_height UNSIGNED INTEGER;
DECLARE @image_width UNSIGNED INTEGER;

FOR f_fetch AS c_fetch NO SCROLL CURSOR FOR
SELECT jpeg.file_name AS @file_name,
jpeg.file_contents AS @file_contents
FROM jpeg
FOR UPDATE
DO

CALL get_jpeg_dimensions (
@file_contents, -- IN
@image_height, -- OUT
@image_width ); -- OUT

UPDATE jpeg
SET jpeg.image_height = @image_height,
jpeg.image_width = @image_width
WHERE CURRENT OF c_fetch;

END FOR;

COMMIT;

END;
It's not the fastest code in the world, BUT... it's not the slowest: With a cold cache (call sa_flush_cache) it took about 1.5 seconds to process 36 images on a reasonably-fast laptop. Depending on your application, you might do the calculation on the fly as needed, or do it once ahead of time as shown here.

Here's the result in dbisql:
SELECT jpeg.file_name,
jpeg.image_width,
jpeg.image_height
FROM jpeg
ORDER BY jpeg.file_name;


Here's what Windows shows for one of the images:

Monday, December 22, 2008

Hexadecimal Strings

A hexadecimal string is an ordinary character string that obeys the following rules:

  • each single character in the string is one of the following: '0' through '9', or 'A' through 'F', and

  • the string contains pairs of characters; i.e., the string length is an even number.
In other words, a hexadecimal string is made up of pairs of characters '00' through 'FF'.

Hexadecimal strings are handy in SQL Anywhere because there are a lot of functions for handling strings and not much in the way of features for representing complex data structures; e.g., the contents of a JPEG file.

...and any arbitrary string of data, including that JPEG file, can be converted to a hexadecimal string by converting each successive byte value 0 through 255 to the corresponding hexadecimal string '00' through 'FF'.

Once upon a time, converting a LONG BINARY string to a hexadecimal LONG VARCHAR meant coding a funky (and slow) loop involving RIGHT, INTTOHEX, ASCII and SUBSTR function calls. Now, SQL Anywhere 11 lets you perform the conversion in one single, very fast, UNLOAD statement using the INTO VARIABLE and HEXADECIMAL ON clauses:
BEGIN
DECLARE @hex_string LONG VARCHAR;
UNLOAD
SELECT xp_read_file ( 'boat.jpg' )
INTO VARIABLE @hex_string
HEXADECIMAL ON;
SELECT SUBSTR ( @hex_string, 3 );
END;
Here's how it works:
  • The xp_read_file function returns the entire contents of the boat.jpg file as a LONG BINARY result.

  • The INTO VARIABLE clause tells the UNLOAD statement to put the output in the local LONG VARCHAR variable @hex_string instead of a file.

  • The HEXADECIMAL ON clause (which is actually the default) tells UNLOAD to convert binary columns to strings '0xnnnn...' where '0x' is a fixed prefix and each nn is a pair of hexadecimal characters.
    (Doesn't that sound familiar?
    ...it's the definition of "hexadecimal string" given earlier :)

  • The final SELECT SUBSTR strips the leading '0x' and displays the result.
Here's what the first 1000 characters look like, representing the first 500 bytes of boat.jpg:
         1         2         3         4         5
12345678901234567890123456789012345678901234567890
--------------------------------------------------
ffd8ffe000104a46494600010200006400640000ffe10d7168
7474703a2f2f6e732e61646f62652e636f6d2f7861702f312e
302f003c3f787061636b657420626567696e3d22efbbbf2220
69643d2257354d304d7043656869487a7265537a4e54637a6b
633964223f3e0a3c783a786d706d65746120786d6c6e733a78
3d2261646f62653a6e733a6d6574612f2220783a786d70746b
3d22332e312e322d313133223e0a203c7264663a5244462078
6d6c6e733a7264663d22687474703a2f2f7777772e77332e6f
72672f313939392f30322f32322d7264662d73796e7461782d
6e7323223e0a20203c7264663a4465736372697074696f6e20
7264663a61626f75743d22220a20202020786d6c6e733a7469
66663d22687474703a2f2f6e732e61646f62652e636f6d2f74
6966662f312e302f220a20202020786d6c6e733a70686f746f
73686f703d22687474703a2f2f6e732e61646f62652e636f6d
2f70686f746f73686f702f312e302f220a20202020786d6c6e
733a7861703d22687474703a2f2f6e732e61646f62652e636f
6d2f7861702f312e302f220a20202020786d6c6e733a657869
663d22687474703a2f2f6e732e61646f62652e636f6d2f6578
69662f312e302f220a202020746966663a496d616765576964
74683d22343437220a202020746966663a496d6167654c656e
Oh, yeah... here's what it really looks like:

Tuesday, December 16, 2008

Refactoring Foxhound: Logging Exceptions

My previous post Refactoring Foxhound: Logging talked about two kinds of diagnostic logging,

  • program trace data written to the SQL Anywhere console log text file during development, and

  • errors and exceptions written to a permanent table by both development and delivery builds,
but it only presented code for the first kind.

Here's an example of an exception displayed by the Foxhound "Display Diagnostics" button:



Here is the full text of the message, wrapped to fit on screen. To the end user, the interesting part to the end user is the ERRORMSG "...cannot access the file". However, for the purposes of this article the interesting part is the substring "610a5c":
1 - 2008-10-27 14:54:48.281 
Debug Build 3324a 14 610a5c(610eh1)
SQLCODE = -602, SQLSTATE = WL002, ERRORMSG()
= Cannot access file 'C:\projects\foxhound\dbping_p001.tmp'
-- The process cannot access the file because it is being
used by another process.
From Foxhound's point of view, this exception is a minor glitch. Foxhound executed the dbping.exe utility to gather some information about the connection to the target database, and it could not display that information; i.e., this section of Foxhound's "Facts & Figures" display does not appear:



From the client's point of view, it may be more interesting; in some cases the culprit is a firewall getting in the way between Foxhound and the target database.

But... enough about the actual exception; we're here to see how it is trapped and how it is handled.

The following excerpt comes from the stored procedure that detected the exception; only the code specific to exception handling is shown, NOT the actual code that raised the exception:
  1 CREATE PROCEDURE rroad_dbping (
2 IN @p_proxy_owner VARCHAR ( 128 ),
3 IN @p_connection_string LONG VARCHAR,
4 OUT @p_connect_using_message LONG VARCHAR,
5 OUT @p_connected_to_message LONG VARCHAR,
6 OUT @p_dbping_output LONG VARCHAR )
7 BEGIN -- rroad_dbping
8
9 ...
10
11 DECLARE @sqlcode INTEGER;
12 DECLARE @sqlstate VARCHAR ( 5 );
13 DECLARE @errormsg VARCHAR ( 32767 );
14 DECLARE @diagnostic_string VARCHAR ( 255 );
15
16 ...
17
18 SET @diagnostic_location = '610a5b';
19
20 ...
21
22 SET @diagnostic_location = '610a5c';
23
24 ... [[[[[ the code that raised an EXCEPTION ]]]]]
25
26 SET @diagnostic_location = '610a6';
27
28 ...
29
30 EXCEPTION
31
32 WHEN OTHERS THEN
33
34 SELECT SQLCODE, SQLSTATE, ERRORMSG()
35 INTO @sqlcode, @sqlstate, @errormsg;
36
37 SET @p_connect_using_message = '';
38 SET @p_connected_to_message = '';
39 SET @p_dbping_output = '';
40
41 CALL rroad_exception ( STRING (
42 @diagnostic_location, '(610eh1)',
43 ' SQLCODE = ', @sqlcode,
44 ', SQLSTATE = ', @sqlstate,
45 ', ERRORMSG() = ', @errormsg ) );
46
47 END; -- rroad_dbping
Here's how it works: Lines 18, 22 and 26 (and many others in the actual code) record the "current location" in the code, during execution, so that information may be saved when an exception occurs. The hand-coded @diagnostic_location values are alphanumeric, starting with the module (file) number "610", followed by increments "a1", "a2", ... "a9", "b1", etc. When new program logic is inserted, say between 610a5 and 610a6, more characters are added: 610a5b, 610a5c and so on.

There's nothing special about that encoding technique, except for the fact it's easy to hand-code new values that are globally unique without having to keep track of a "last used value" or anything like it.

A global search for "610a5c" in all of the Foxhound source code will find exactly one location, and will narrow the search quite a bit.
It would be nice if SQL exception diagnostics showed the line numbers inside procedures, triggers and so on, but sometimes life is harsh.

And yes, there all these SET @diagnostic_location statements do have an effect on performance. I've never been able to measure the effect, but I have also avoided placing the SETs inside tight loops.
In this example, the exception was raised somewhere between lines 22 and 26. We know this because the @diagnostic_location value '610a5c' is passed to rroad_exception in line 33, and it ends up being displayed in the message text shown earlier "...610a5c...".

The exception is trapped by the one single EXCEPTION block in the entire module, starting at line 30. Other, longer stored procedures have many nested BEGIN ... EXCEPTION ... END blocks, but this procedure only has one.
The hard-coded literal '(610eh1)' serves to uniquely identify the EXCEPTION handler itself. If this module had more than one, the others would be numbered '(610eh2)' and so on. The most exception handlers in any single Foxhound procedure is 14.
The WHEN OTHERS THEN construction is shorthand for "trap everything". It is immediately followed by the SELECT at line 34 that saves the current values of SQLCODE, SQLSTATE and ERRORMSG(); it is important to do this right away inside the EXCEPTION block before any other code has a chance to change the values.

The three lines starting at line 37 are this module's way of graciously handling all exceptions: all the OUT parameters are set to empty strings to tell the caller "nothing to display".

The CALL to rroad_exception at line 41 passes all the diagnostic information gathered so far to a procedure that will store it in this table:
CREATE TABLE rroad_exception (
exception_id BIGINT NOT NULL
DEFAULT AUTOINCREMENT
PRIMARY KEY,
exception_diagnostic LONG VARCHAR NOT NULL );
Here's the code for the procedure, and for the corresponding EVENT that it triggers:
  1 CREATE PROCEDURE rroad_exception ( 
2 IN @diagnostic_text LONG VARCHAR ) -- may be truncated if it doesn't fit
3 BEGIN
4
5 DECLARE @build_number_fork VARCHAR ( 10 );
6 DECLARE @exception_diagnostic LONG VARCHAR;
7
8 ... [some code to fill @build_number_fork]
9
10 BEGIN -- ignore further exception
11
12 -- Build diagnostic string and truncate to
13 -- "page size - 100" limitation on event parameters
14
15 SET @exception_diagnostic = LEFT (
16 STRING ( CURRENT TIMESTAMP,
17 IF '{DEBUG_MESSAGES}' = 'ON'
18 THEN ' Debug Build '
19 ELSE ' Full Build '
20 ENDIF,
21 @build_number_fork,
22 CONNECTION_PROPERTY ( 'Number' ),
23 ' ',
24 @diagnostic_text ),
25 CAST ( PROPERTY ( 'PageSize' ) AS BIGINT ) - 100 );
26
27 MESSAGE STRING ( 'DIAG EXCEPTION ',
28 @exception_diagnostic ) TO CONSOLE DEBUG ONLY;
29
30 TRIGGER EVENT rroad_exception (
31 @p_exception_diagnostic
32 = @exception_diagnostic );
33
34 EXCEPTION WHEN OTHERS THEN
35
36 END; -- ignore further exception
37
38 END; -- rroad_exception
39
40 CREATE EVENT rroad_exception
41 HANDLER BEGIN
42
43 DECLARE @p_exception_diagnostic LONG VARCHAR;
44
45 ...
46
47 -- Store exception data using separate connection to commit.
48
49 BEGIN -- ignore further exception
50
51 SET @p_exception_diagnostic = COALESCE (
52 EVENT_PARAMETER ( '@p_exception_diagnostic' ),
53 'NULL' );
54
55 INSERT rroad_exception ( exception_diagnostic )
56 VALUES ( @p_exception_diagnostic );
57
58 COMMIT;
59
60 EXCEPTION WHEN OTHERS THEN
61
62 END; -- ignore further exception
63
64 END; -- rroad_exception
Here's how it works, and why an EVENT is included:
  • The SET at line 15 adds some more data (timestamp, Foxhound build number, etc) to the incoming @diagnostic_text, and then right-truncates the result to the length that can be passed as a parameter to an event.

  • The TRIGGER EVENT at line 30 uses the special syntax necessary for passing parameters to events.

  • The SET at line 51 uses the special EVENT_PARAMETER function to receive the parameter that was passed to the event.

  • The INSERT and COMMIT at lines 55 and 58 do the real work... on the separate connection started by the event so the data gets saved even if the caller does a ROLLBACK.
There are many EXCEPTION handlers in Foxhound. Some of them call rroad_exception as shown here, some of them check the SQLCODE first to see if it's on a "do not call" list; i.e., the exception is both expected and uninteresting. Some of the handlers return immediately to the caller, others carry on executing the program logic following the BEGIN block. Still others don't call rroad_exception at all, they just silently handle the exception.

Friday, December 12, 2008

Refactoring Foxhound: Logging

Jeff Atwood's post The Problem With Logging got me thinking: I should talk about diagnostic logging inside Foxhound.

(This being a blog about databases, I have to make one thing clear: diagnostic logging has nothing to do with transaction logging.)
Foxhound creates two kinds of diagnostic logs: large amounts program trace data are written to the SQL Anywhere console log text file during development, and a small number of errors and exceptions are written to a permanent table by both development and delivery builds (those errors and exceptions are also written to the console log during development).
(The location and size of the console log file is controlled by these command line options: dbeng11 -o filespec.txt -os 10M)
Jeff says "I am not anti-logging. I am anti-abusive-logging." That said, Jeff has five main points to make, five problems he sees with the way programmers often do diagnostic logging... let's see how those points apply to Foxhound (Jeff's points are in "Bold italics", my comments follow):

"Logging means more code."

All the tracing logic in Foxhound is hand crafted, none of it is automatically generated. That reduces the amount of superfluous logging, output that will never be useful... most instances of tracing logic have been added to Foxhound to help debug some specific problem, or added to sections of code that have had a history of problems.

The Monitor section of Foxhound has been particularly hard hit, due to it's multi-threaded and timing dependent nature, so that's where the most interesting tracing logic exists.

Once added, tracing logic tends to remain in the code until it proves to be annoying... more on this later.

"Logging isn't free."

All of the tracing logic uses SQL Anywhere's MESSAGE ... FOR DEBUG feature, and it's enabled only for development builds, not delivery. Over time, however, some large sections of tracing logic have been removed from Foxhound to avoid performance penalties during development... sometimes deleted, sometimes commented-out in case it becomes useful again during periods of heavy maintenance.

That only applies to the development build, however. The execution profiler has never shown any bottlenecks in the delivery build that were caused by disabled MESSAGE ... FOR DEBUG statements.

"If it's worth saving to a logfile, it's worth showing in the user interface."

Foxhound doesn't (intentionally) fail, there's no concept of "fatal" errors. The design calls for all exceptions to be handled one way or another: Some are "normal" such as the failure to connect to a target database (which is shown to the user), or the failure to complete some SQL operation because Foxhound is shutting down (which is not shown to the user).

Other exceptions are "abnormal" (unexpected and/or unheard of) and are worth recording for possible future investigation; those are recorded in a permanent table. They are the reason why the Foxhound Options page has buttons labelled Display Diagnostics, Export Diagnostics and Delete Diagnostics.

The vast majority of diagnostic logging data, however, is tracing data written to the console log, and only during development. NONE of that tracing data is written by the delivery build, and there is no Foxhound user interface to display it... during testing, Wordpad is the tracing data UI of choice.

"The more you log, the less you can find."

That's absolutely true; that's the most frequent reason various pieces of tracing logic are removed from Foxhound: to keep the size of the console log file humanly manageable. Sometimes, the pieces are commented-out ("Who knows when this #$%&!%# procedure will give trouble again!"), sometimes they are permanently expunged ("No, this inner-inner-inner loop works now, I'm never gonna need that ten-thousand-line trace again!")

"The logfile that cried Wolf" ... i.e., serious errors will go unnoticed in huge logfiles that contain too much unimportant information.

Sure that's true, if the huge log file is the only place serious errors are recorded. That's not the case with Foxhound, however... that's what the permanent table is for.



I've got some things to say myself, about logging...
  • If logging helps, do it. If it hurts, stop.

  • Diagnostic logging is neither good, nor bad, per se. It's a tool which can be used for good or evil, it's your choice.

  • Logging contributes nothing to the end product; no customer is going to be impressed by the elegance and thoroughness of your logs.

  • And last but not least: Debuggers are great, especially the one built in to SQL Anywhere, but they are not a silver bullet.
All debuggers come with a considerable amount of human procedural overhead. I'm not anti-debugger... I once bought a hardware circuit board to debug an IBM PC timer interrupt handler: $800 in 1988 dollars, a week to figure out how to make it work, one single "Aha!" moment, one single line of code to fix, and then it was never used again.

Debuggers are the MRI machines of the programming world, invaluable for some problems but unnecessary for most. In most cases nothing beats a simple MESSAGE statement. In many cases even that's not required, not if you know where the error occurred and you can see the SQLCODE and ERRORMSG().



Show Me The Code!

Here's how it diagnostic tracing works, in Foxhound, starting with the Windows batch file that runs the development build. The heart of this batch file is the dbisql command which launches the debug build SQL file called 004_rroad_build_debug.sql:
Note: The ^ character is the line continuation character for Windows command files.
"%SQLANY11%\Bin32\dbisql.com"^
-c "ENG=foxhound1;DBN=f;UID=dba;PWD=sql"^
READ ENCODING Cp1252 004_rroad_build_debug.sql^
[ON] [ON] [Beta 1.09] [a] [10]
The five [values] inside square braces are parameter values passed to dbisql for the five named parameters inside the file.

Here's what the corresponding dbisql command looks like in the delivery build batch file; different SQL file, different parameter values:
"%SQLANY11%\Bin32\dbisql.com"^
-c "ENG=foxhound1;DBN=f;UID=dba;PWD=sql"^
READ ENCODING Cp1252 004_rroad_build_for_delivery.sql^
[OFF] [OFF] [Beta 1.09] [a] [10]
Now back to the debug build; here are some excerpts from the 004_rroad_build_debug.sql file:
PARAMETERS DEBUG_MESSAGES, 
MONITOR_DEBUG_MESSAGES,
VERSION_PREFIX,
BUILD_FORK,
MAX_PROXY_OWNER_COUNT;

...

READ ENCODING Cp1252 010_rroad_setup.sql
[{DEBUG_MESSAGES}]
[{VERSION_PREFIX}]
[{BUILD_FORK}]

[{MAX_PROXY_OWNER_COUNT}];
...

READ ENCODING Cp1252 203_rroad_monitor_sample_loop.sql
[{MONITOR_DEBUG_MESSAGES}];

...
The PARAMETERS command is not part of SQL, it's a command specific to dbisql that gives names to different [values] passed from the command line. The series of READ statements inside 004_rroad_build_debug.sql tell dbisql to process each named *.sql file in turn, passing on whatever parameter [values] they require.

Inside the [square braces], the names enclosed inside {curly braces} are references back to the PARAMETERS statement.

For example, MONITOR_DEBUG_MESSAGES parameter is second on the dbisql command line, and it is passed on as the first parameter to the file 203_rroad_monitor_sample_loop.sql. For the debug build, the value is ON. For the delivery build batch file, and the corresponding delivery SQL file, the value is OFF.

Here's an excerpt from 203_rroad_monitor_sample_loop.sql:
PARAMETERS MONITOR_DEBUG_MESSAGES;

...

CREATE EVENT rroad_monitor_sample_loop
HANDLER BEGIN

...

SET TEMPORARY OPTION DEBUG_MESSAGES
= '{MONITOR_DEBUG_MESSAGES}';

...

MESSAGE STRING ( 'DIAG ', CURRENT TIMESTAMP, ' ',
CONNECTION_PROPERTY ( 'Number' ),
' rroad_monitor_sample - Stop Sampling for ',
@sampling_id ) TO CONSOLE DEBUG ONLY;

...
Since every EVENT and web SERVICE runs on its own connection, every event and service block is responsible for its own connection-level SET TEMPORARY OPTION commands... hence the PARAMETERS command in this SQL file. The handling of {curly brace values} is a pre-compile string substitution process performed by dbisql before the code is passed to SQL Anywhere for processing; here's the SET command after the substitution:
SET TEMPORARY OPTION DEBUG_MESSAGES 
= 'ON';
For many applications, setting DEBUG_MESSAGES is not that complex: The simplest approach is to set it ON and OFF one time, at the database level, and be done with it:
SET OPTION PUBLIC.DEBUG_MESSAGES = 'ON'; -- or 'OFF'
In a client server environment, your application could pass the following command to the engine after starting a new connection:
SET TEMPORARY OPTION DEBUG_MESSAGES = 'ON';
That might not work too well in a connection pooling environment; for that, it's probably best to turn it ON and OFF at the database level as described above.

Foxhound, however, is built entirely around EVENT and web SERVICE blocks; there are no client server database connections, no connection pools, no applications in the traditional sense, just the web browser.

The MESSAGE statement shown above contains these basic components:
  • A call to STRING to build the message text.

  • The 'DIAG ' substring acts as a eyecatcher so you can tell diagnostic text apart from checkpoint messages and other data in the log.

  • The CURRENT TIMESTAMP is often useful when looking for needles in the haystack.

  • The CONNECTION_PROPERTY ( 'Number' ) value is helpful with applications that have many simultaneous connections; e.g., busy applications with many users, and Foxhound with its myriad events and web services.

  • The rest of the data passed to STRING is the actual diagnostic data, unique to each MESSAGE statement.

  • The TO CONSOLE DEBUG ONLY clauses control where and when the data is written; for more information see here.

Thursday, December 11, 2008

How I Feel About Web 2.0

Wednesday, December 10, 2008

The Importance of COMPUTE

In Monday's Windows-Style ORDER BY I answered the question "How do I use ORDER BY to sort alphanumeric strings the same way Windows Explorer sorts file names?" as follows:

"By using CREATE FUNCTION to return an alternative string value that will have the desired result when used in ORDER BY, and then (optionally) calling that function in a new COMPUTE column."
I really should have left out "(optionally)"... the COMPUTE clause isn't just some cute technique to push program code down into the database, it's a valuable performance improvement tool. And without it, in this case, queries run about as fast as continental drift.

Here is an example of a table without the COMPUTE column, followed by a query that refers to the user-defined function in the ORDER BY (see Windows-Style ORDER BY for the reformat code):
CREATE TABLE slow (
original_name LONG VARCHAR NOT NULL PRIMARY KEY );

SELECT TOP 100 START AT 200 *
FROM slow
ORDER BY reformat ( slow.original_name );
When the table contained 1000 rows and the cache was empty, SQL Anywhere 11 took over 7 seconds to run the query on a 3.4GHz Pentium 4 computer.

Here's what the plan looked like... a sequential table scan, no surprises there:



The table scan is just the beginning of the problem; it doesn't take 7 seconds to sort 1000 rows, but it does take that time to call the user defined function 1000 times.

Now consider a table with the COMPUTE column, plus a CLUSTERED INDEX on that column, followed by a query that refers to that column in the ORDER BY:
CREATE TABLE fast (
original_name LONG VARCHAR NOT NULL PRIMARY KEY,
alternate_name LONG VARCHAR NOT NULL
COMPUTE ( reformat ( original_name ) ) );

CREATE CLUSTERED INDEX xalternate_name
ON fast ( alternate_name );

SELECT TOP 100 START AT 200 *
FROM fast
ORDER BY fast.alternate_name;
On the same 1000 rows, same empty cache, same computer, the query now took less than 1 second. Now the plan looked better, an index scan:



But wait, there's more! If you code the ORDER BY to use the function call instead of the computed column, SQL Anywhere STILL finds the index and uses it, and it STILL takes less than one second:
SELECT TOP 100 START AT 200 *
FROM fast
ORDER BY reformat ( fast.original_name );
I kid you not; here's the plan... fast.alternate_name does not appear anywhere in the SELECT but the index xalternate_name is still being used:



There is a down side to this COMPUTE clause: the user defined function must be called for each row that is inserted or updated. In this test it took almost 7 seconds to run a LOAD TABLE with COMPUTES ON... but that's a one-time cost, whereas queries tend to be repeated over and over again:
LOAD TABLE fast ( original_name ) 
FROM 'file_names.txt'
COMPUTES ON;

Tuesday, December 9, 2008

The End of NNTP?

Does the creation of the web-based Google Group SQL Anywhere Web Development mark the beginning of the end for NNTP-based newsgroup forums like sybase.public.sqlanywhere.general on forums.sybase.com?



This ain't your father's Google Groups... this is not just an HTTP front-end on top of NNTP data from sybase.public.sqlanywhere.web, it's a whole new implementation: different input, different data store, different display.

Web interfaces generally suck when it comes to viewing threaded conversations because they are slow, awkward to navigate, and (did I mention it?) slow. One of the absolute worst is Sybase's own "Community Forums" web interface to the NNTP newsgroups; check it out at sybase.public.sqlanywhere.web.

Even the much-hyped stackoverflow.com website sucks. Like the new Google Groups, it's a whole new implementation... but go ahead, try to find what you're looking for.

With Google Groups, searching has never been a problem. The new interface at SQL Anywhere Web Development lets you easily post messages via email, and you can reduce the suckage with these tips:

  • open each new thread in a new tab, and

  • set Options - View as tree.
If NNTP is going to die, if we have to use web browsers to carry on a conversation, I'd rather it be Google Groups than any of the alternatives.

Monday, December 8, 2008

Windows-Style ORDER BY

Question: How do I use ORDER BY to sort alphanumeric strings the same way Windows Explorer sorts file names?

Answer: By using CREATE FUNCTION to return an alternative string value that will have the desired result when used in ORDER BY, and then (optionally) calling that function in a new COMPUTE column.

Microsoft describes the modern Windows file name sort order in an article with this breathtakingly long title: The sort order for files and folders whose names contain numerals is different in Windows Vista, Windows XP, and Windows Server 2003 than it is in Windows 2000:

"By default, the newer sort order considers strings in file and folder names as numeric content, not text. Numerals in folder and file names are sorted according to their numeric value."
The article goes on to describe how to hack with the registry to get the old sort order back, but let's not go there :)

Here are some file names as sorted by Windows XP Explorer...
a5.1.txt
a5.02.txt
a5.003.txt
Ie4_01
Ie4_128
Ie5
Ie6
Ie401sp2
Ie501sp2
z8A88.txt
z9A99.txt
z88A8.txt
z99A9.txt
Here's how the traditional ORDER BY sorts those strings in SQL Anywhere 11...
( ...which is exactly the same order produced by the DOS DIR /ON command under Windows XP, but let's ignore that, too :)
'a5.003.txt'
'a5.02.txt'
'a5.1.txt'
'Ie4_01'
'Ie4_128'
'Ie401sp2'
'Ie5'
'Ie501sp2'
'Ie6'
'z88A8.txt'
'z8A88.txt'
'z99A9.txt'
'z9A99.txt'
Here is a test table with two string columns, one for the original name and the other to be used in the ORDER BY clause:
CREATE TABLE sorted (
original_name LONG VARCHAR NOT NULL PRIMARY KEY,
alternate_name LONG VARCHAR NOT NULL
COMPUTE ( reformat ( original_name ) ) );
The alternate_name column is defined as an automatic COMPUTE column based on this expression:
   reformat ( original_name )
where reformat performs the following transformation:
The original string is parsed from left to right looking for numeric and non-numeric substrings. Each non-numeric substring is returned as-is. Each numeric substring is right justified with leading zero characters to some long length (say 40 digits, although 5 is the number used here).
Here's what the function looks like:
CREATE FUNCTION reformat (
IN @original_name LONG VARCHAR )
RETURNS LONG VARCHAR
DETERMINISTIC
BEGIN

DECLARE @alternate_name LONG VARCHAR;
DECLARE @justify_to INTEGER;
DECLARE @pos BIGINT;
DECLARE @state VARCHAR ( 20 );
DECLARE @string_part LONG VARCHAR;
DECLARE @char VARCHAR ( 1 );

DECLARE LOCAL TEMPORARY TABLE string_part (
string_part_number BIGINT NOT NULL
DEFAULT AUTOINCREMENT
PRIMARY KEY,
string_part LONG VARCHAR NOT NULL )
NOT TRANSACTIONAL;

SET @justify_to = 5; -- use 40 if you want to be sure
SET @pos = 1;
SET @state = 'start';
SET @string_part = '';

WHILE @pos <= LENGTH ( @original_name ) LOOP

SET @char = SUBSTR ( @original_name, @pos, 1 );

IF @char LIKE '[0-9]' THEN
IF @state = 'non-numeric' THEN
INSERT string_part ( string_part )
VALUES ( @string_part );
SET @string_part = '';
END IF;
SET @string_part = @string_part + @char;
SET @state = 'numeric';
ELSE
IF @state = 'numeric' THEN
INSERT string_part ( string_part )
VALUES ( @string_part );
SET @string_part = '';
END IF;
SET @string_part = @string_part + @char;
SET @state = 'non-numeric';
END IF;
SET @pos = @pos + 1;
END LOOP;

IF @state = 'start' THEN
RETURN '';
ELSE
INSERT string_part ( string_part )
VALUES ( @string_part );
SELECT LIST (
IF LEFT ( string_part.string_part, 1 )
LIKE '[0-9]'
THEN RIGHT (
STRING (
REPEAT ( '0', @justify_to ),
string_part.string_part ),
@justify_to )
ELSE string_part.string_part
ENDIF,
'' ORDER BY string_part_number )
INTO @alternate_name
FROM string_part;
RETURN @alternate_name;
END IF;

END;
Here are some notes on the code:
  • The string_part table numbers and stores each numeric and non-numeric substring in a separate row so they can be reformatted and gathered back together again via the LIST function.

  • @justify_to controls the width of the right-justified numeric strings.

  • @pos steps through each @char in the @original_name string.

  • @state implements a state-driven parsing loop, starting with @state = 'start' and then flipping back and forth between 'numeric' and 'non-numeric'.

  • @string_part is used to gather up all the @char values with the same value of @state.

  • The REPEAT function is used pad each numeric substring with an enormous number of leading '0' characters, and then the RIGHT function is used to strip off the excess '0' characters from the front of the string.

  • The LIST function is used to gather all the reformatted substrings back together into the single @alternate_name return value.

  • And last but not least, if the input is the empty string, so is the output.
Here's what the test looks like:
INSERT sorted ( original_name ) VALUES ( 'a5.1.txt' );
INSERT sorted ( original_name ) VALUES ( 'a5.02.txt' );
INSERT sorted ( original_name ) VALUES ( 'a5.003.txt' );
INSERT sorted ( original_name ) VALUES ( 'Ie4_01' );
INSERT sorted ( original_name ) VALUES ( 'Ie4_128' );
INSERT sorted ( original_name ) VALUES ( 'Ie5' );
INSERT sorted ( original_name ) VALUES ( 'Ie6' );
INSERT sorted ( original_name ) VALUES ( 'Ie401sp2' );
INSERT sorted ( original_name ) VALUES ( 'Ie501sp2' );
INSERT sorted ( original_name ) VALUES ( 'z8A88.txt' );
INSERT sorted ( original_name ) VALUES ( 'z9A99.txt' );
INSERT sorted ( original_name ) VALUES ( 'z88A8.txt' );
INSERT sorted ( original_name ) VALUES ( 'z99A9.txt' );
COMMIT;

SELECT * FROM sorted ORDER BY sorted.alternate_name;

original_name, alternate_name
'a5.1.txt', 'a00005.00001.txt'
'a5.02.txt', 'a00005.00002.txt'
'a5.003.txt', 'a00005.00003.txt'
'Ie4_01', 'Ie00004_00001'
'Ie4_128', 'Ie00004_00128'
'Ie5', 'Ie00005'
'Ie6', 'Ie00006'
'Ie401sp2', 'Ie00401sp00002'
'Ie501sp2', 'Ie00501sp00002'
'z8A88.txt', 'z00008A00088.txt'
'z9A99.txt', 'z00009A00099.txt'
'z88A8.txt', 'z00088A00008.txt'
'z99A9.txt', 'z00099A00009.txt'
...which is the same as the way Windows XP Explorer does it:
a5.1.txt
a5.02.txt
a5.003.txt
Ie4_01
Ie4_128
Ie5
Ie6
Ie401sp2
Ie501sp2
z8A88.txt
z9A99.txt
z88A8.txt
z99A9.txt
Note that the reformat function translates substrings like 5.02 into 00005.00002, resulting in a different sort order than if the 5.02 was treated as a single decimal number and translated into 00005.02. If you want the latter functionality, the specs for the reformat function might have to be changed to something like this:
The original string is parsed from left to right looking for numeric and non-numeric substrings. Each non-numeric substring is returned as-is. Each numeric substring that does not follow a dot '.' is right justified with leading zero characters to some long length. Each numeric substring that does follow a dot '.' is returned as is.

Sunday, December 7, 2008

Who is "bowser"?



Maybe it belongs to Vista's "Cancel or Allow" guy.

Popular Pages

Here are the top 2 most popular pages in this blog, as compiled by StatCounter.com... at least as far as direct visits are concerned:

IS NULL Versus NOT EXISTS has 16% of the total direct hits, whereas the Number Two posting OPENXML() Rocks! has only 6%.

Rounding out the top 5 positions, at 5%, 5% and 4% respectively, are SQL Anywhere Is Free!, Unpublished MySQL FAQ and URL Forwarding Via Reverse Proxy.

Saturday, December 6, 2008

Visit Counts

Here are the lastest counts of visitors to this blog, as compiled by StatCounter.com:

Friday, December 5, 2008

Rotating Database Backups

Personally, I think the SQL EVENT is a wonderful thing... BUT, I still prefer running dbbackup from command (batch) files that are scheduled via cron (for Unix and Linux) or Scheduled Tasks (for Windows), instead of executing BACKUP commands from inside scheduled database events.

My preference still stands today, even though Version 11 of SQL Anywhere introduced "Maintenance Plans" in Sybase Central to make it easier for you to create SQL Anywhere events that perform database backups and validations.

Here's why I prefer batch files over events, for database backups:

  • It's easier to write a batch file that records its progress in a text file.

  • It's easier to make an ad-hoc run of a batch file... just doubleclick on the file name.

  • It's easier to write batch file commands that copy, rename and delete files.

  • It's just as easy to create a schedule via Windows Scheduled Tasks as it is via Sybase Central.

  • Batch files can be administered without database access or privileges.

  • More people are familiar with operating system commands than database operations.
Some time ago I wrote a template Windows batch file to give to clients who want to set up scheduled database backups... just one batch file, plus one document explaining in detail how to demonstrate the workings of the batch file.

The actual batch file is listed at the end of this blog posting; here are the contents of the document called "Backup a SQL Anywhere Database Keeping Three Generations of Backups":


1. Pick a folder to contain the command file and the backup subfolders.

Put the backup command file in that folder.

For example: C:\dbbackup\run_dbbackup_full.bat


2. Edit the backup command file.

Change the connection string so that dbbackup can connect to the consolidated database:
Note: The ^ character is the line continuation character for Windows command files.
"%ASANY9%\win32\dbbackup.exe"^
-c "ENG=ddd;DBN=ddd;UID=dba;PWD=sql"^
-o bkup\dbbackup_log.txt -x -y bkup\generation3

3. Run the backup for the first time.

Here's what will happen:
  • The bkup subfolder will be created; e.g., C:\dbbackup\bkup

  • The diagnostic text file will be started; e.g., C:\dbbackup\bkup\dbbackup_log.txt

  • The "most recent backup" subfolder will be created: e.g., C:\dbbackup\bkup \generation3

  • Backup copies of the database and transaction log will be written to the generation3 subfolder

  • The original transaction log will deleted and recreated.
The diagnostic text file bkup\dbbackup_log.txt will look like this...
***************************************************************************************** 
Full dbbackup started
20/04/2007
10:04 AM
Adaptive Server Anywhere Backup Utility Version 9.0.2.3456
(599 of 599 pages, 100% complete)
(49 of 49 pages, 100% complete)
Transaction log renamed to: 070420AA.LOG
Database backup completed
20/04/2007
10:04 AM
Volume in drive C has no label.
Volume Serial Number is 5E13-A7B2

Directory of C:\projects\$SA_templates\run\dbbackup
\demo_full_backup9\bkup

20/04/2007 10:04 AM <DIR> .
20/04/2007 10:04 AM <DIR> ..
20/04/2007 10:04 AM 371 dbbackup_log.txt
20/04/2007 10:04 AM <DIR> generation3
1 File(s) 371 bytes

Directory of C:\projects\$SA_templates\run\dbbackup
\demo_full_backup9\bkup\generation3

20/04/2007 10:04 AM <DIR> .
20/04/2007 10:04 AM <DIR> ..
20/04/2007 10:04 AM 2,453,504 ddd.db
20/04/2007 10:04 AM 200,704 ddd.log
2 File(s) 2,654,208 bytes

Total Files Listed:
3 File(s) 2,654,579 bytes
5 Dir(s) 6,716,006,400 bytes free
Full dbbackup OK
20/04/2007
10:04 AM

4. Run the backup for the second time.

Here's what will happen:
  • More messages will be written to the diagnostic text; e.g., C:\dbbackup\bkup\dbbackup_log.txt

  • The previous "generation3" subfolder will be renamed: e.g., C:\dbbackup\bkup \generation2

  • The "most recent backup" subfolder will be created: e.g., C:\dbbackup\bkup \generation3

  • Backup copies of the database and transaction log will be written to the generation3 subfolder

  • The original transaction log will deleted and recreated.
Here are the new messages written to the diagnostic text file bkup\dbbackup_log.txt...
***************************************************************************************** 
Full dbbackup started
20/04/2007
10:35 AM
Adaptive Server Anywhere Backup Utility Version 9.0.2.3456
(599 of 599 pages, 100% complete)
(2 of 2 pages, 100% complete)
Transaction log renamed to: 070420AB.LOG
Database backup completed
20/04/2007
10:35 AM
Volume in drive C has no label.
Volume Serial Number is 5E13-A7B2

Directory of C:\projects\$SA_templates\run\dbbackup
\demo_full_backup9\bkup

20/04/2007 10:35 AM <DIR> .
20/04/2007 10:35 AM <DIR> ..
20/04/2007 10:35 AM 1,611 dbbackup_log.txt
20/04/2007 10:04 AM <DIR> generation2
20/04/2007 10:35 AM <DIR> generation3
1 File(s) 1,611 bytes

Directory of C:\projects\$SA_templates\run\dbbackup
\demo_full_backup9\bkup\generation2

20/04/2007 10:04 AM <DIR> .
20/04/2007 10:04 AM <DIR> ..
20/04/2007 10:04 AM 2,453,504 ddd.db
20/04/2007 10:04 AM 200,704 ddd.log
2 File(s) 2,654,208 bytes

Directory of C:\projects\$SA_templates\run\dbbackup
\demo_full_backup9\bkup\generation3

20/04/2007 10:35 AM <DIR> .
20/04/2007 10:35 AM <DIR> ..
20/04/2007 10:04 AM 2,453,504 ddd.db
20/04/2007 10:35 AM 8,192 ddd.log
2 File(s) 2,461,696 bytes

Total Files Listed:
5 File(s) 5,117,515 bytes
8 Dir(s) 6,712,197,120 bytes free
Full dbbackup OK
20/04/2007
10:35 AM

5. Run the backup for the third time.

Here's what will happen:
  • More messages will be written to the diagnostic text; e.g., C:\dbbackup\bkup\dbbackup_log.txt

  • The previous "generation2" subfolder will be renamed: e.g., C:\dbbackup\bkup \generation1

  • The previous "generation3" subfolder will be renamed: e.g., C:\dbbackup\bkup \generation2

  • The "most recent backup" subfolder will be created: e.g., C:\dbbackup\bkup \generation3

  • Backup copies of the database and transaction log will be written to the generation3 subfolder

  • The original transaction log will deleted and recreated.
Here are the new messages written to the diagnostic text file bkup\dbbackup_log.txt...
***************************************************************************************** 
Full dbbackup started
20/04/2007
10:37 AM
Adaptive Server Anywhere Backup Utility Version 9.0.2.3456
(599 of 599 pages, 100% complete)
(2 of 2 pages, 100% complete)
Transaction log renamed to: 070420AC.LOG
Database backup completed
20/04/2007
10:37 AM
Volume in drive C has no label.
Volume Serial Number is 5E13-A7B2

Directory of C:\projects\$SA_templates\run\dbbackup
\demo_full_backup9\bkup

20/04/2007 10:37 AM <DIR> .
20/04/2007 10:37 AM <DIR> ..
20/04/2007 10:37 AM 3,223 dbbackup_log.txt
20/04/2007 10:04 AM <DIR> generation1
20/04/2007 10:35 AM <DIR> generation2
20/04/2007 10:37 AM <DIR> generation3
1 File(s) 3,223 bytes

Directory of C:\projects\$SA_templates\run\dbbackup
\demo_full_backup9\bkup\generation1

20/04/2007 10:04 AM <DIR> .
20/04/2007 10:04 AM <DIR> ..
20/04/2007 10:04 AM 2,453,504 ddd.db
20/04/2007 10:04 AM 200,704 ddd.log
2 File(s) 2,654,208 bytes

Directory of C:\projects\$SA_templates\run\dbbackup
\demo_full_backup9\bkup\generation2

20/04/2007 10:35 AM <DIR> .
20/04/2007 10:35 AM <DIR> ..
20/04/2007 10:04 AM 2,453,504 ddd.db
20/04/2007 10:35 AM 8,192 ddd.log
2 File(s) 2,461,696 bytes

Directory of C:\projects\$SA_templates\run\dbbackup
\demo_full_backup9\bkup\generation3

20/04/2007 10:37 AM <DIR> .
20/04/2007 10:37 AM <DIR> ..
20/04/2007 10:04 AM 2,453,504 ddd.db
20/04/2007 10:37 AM 8,192 ddd.log
2 File(s) 2,461,696 bytes

Total Files Listed:
7 File(s) 7,580,823 bytes
11 Dir(s) 6,709,657,600 bytes free
Full dbbackup OK
20/04/2007
10:37 AM

6. Set up the command file to run repeatedly.

Use Start - All Programs - Accessories - System Tools - Scheduled Tasks - Add Scheduled Task

Browse to pick the command file run_dbbackup_full.bat

Pick: Daily, Start Time: 1:00 AM, Every Day

Here's what will happen when the scheduled task runs:
  • More messages will be written to the diagnostic text; e.g., C:\dbbackup\bkup\dbbackup_log.txt

  • The previous "generation1" subfolder and its contents will be deleted

  • The previous "generation2" subfolder will be renamed: e.g., C:\dbbackup\bkup \generation1

  • The previous "generation3" subfolder will be renamed: e.g., C:\dbbackup\bkup \generation2

  • The "most recent backup" subfolder will be created: e.g., C:\dbbackup\bkup \generation3

  • Backup copies of the database and transaction log will be written to the generation3 subfolder

  • The original transaction log will deleted and recreated.

Here's the actual batch file run_dbbackup_full.bat...
Note: A bug has been fixed, and a new version of the batch file has been posted here: Rotating Database Backups Revisited.

Note: The ^ character is the line continuation character for Windows command files.
REM -c ...  database connection string
REM -b ... maximum block size in pages
REM -d backup database file only, not the transaction log.
REM -k ... checkpoint log option: copy, nocopy, recover,
REM auto (default)
REM -l live backup
REM -n rename the backup log file to YYMMDDXX.LOG
REM -o ... where to put dbbackup console messages
REM -r rename current log to YYMMDDXX.LOG, start a new one
REM Note: It is safe to delete old YYMMDDXX.LOG files.
REM -s use internal BACKUP statement to create the backup
REM Note: With -s the -o filespec is relative to
REM the server, no progress messages or prompts
REM are displayed, and the server must have
REM permission to write the output.
REM -t backup transaction log only, not the database file.
REM -x delete the current log and start a new one
REM -y create output folder and overwrite file(s)
REM without confirmation
REM bkup\generation3 target folder for latest backup log file

MD bkup

ECHO ********************************************^
*********************************************^
>>bkup\dbbackup_log.txt
ECHO Full dbbackup started >>bkup\dbbackup_log.txt
DATE /T >>bkup\dbbackup_log.txt
TIME /T >>bkup\dbbackup_log.txt

CD bkup
RD /S /Q generation1
RENAME generation2 generation1
RENAME generation3 generation2
MD generation3
CD ..

"%ASANY9%\win32\dbbackup.exe"^
-c "ENG=ddd;DBN=ddd;UID=dba;PWD=sql"^
-o bkup\dbbackup_log.txt -x -y bkup\generation3

IF ERRORLEVEL 1 GOTO ERR_BACKUP

REM Full dbbackup OK
DATE /T >>bkup\dbbackup_log.txt
TIME /T >>bkup\dbbackup_log.txt
DIR /S bkup\*.* >>bkup\dbbackup_log.txt
ECHO Full dbbackup OK >>bkup\dbbackup_log.txt
DATE /T >>bkup\dbbackup_log.txt
TIME /T >>bkup\dbbackup_log.txt
GOTO END

:ERR_BACKUP
REM Error: Full dbbackup failed
DATE /T >>bkup\dbbackup_log.txt
TIME /T >>bkup\dbbackup_log.txt
DIR /S bkup\*.* >>bkup\dbbackup_log.txt
ECHO Error: Full dbbackup failed >>bkup\dbbackup_log.txt
GOTO END

:END

Thursday, December 4, 2008

Refactoring Foxhound: Always Watch For

In order to preserve some semblance of sanity, the huge Foxhound "To Do" list has been broken into separate files by category: bug fixes, necessary enhancements to the Schema Display portion, enhancements to the Monitor, future changes and so on, plus a "Done" list where completed to-do items are moved when they're done.

The current refactoring walk-through has pointed out the need for a new category: the "Always Watch For" to-do items. These are tasks which must be repeated, sometimes frequently, and are never really "Done".

Here's the brand-new "Always Watch For" list... only three items, but the list is only a day old:

  • Always watch for new problems in Foxhound Options - Display Diagnostics.

  • When upgrading to a new version of SQL Anywhere, check code for references to SQLANY11.

  • When upgrading to a new version of SQL Anywhere, check code in 610_rroad_dbping.sql for version-specific environmental changes (folder names, etc.)
The first one was really nasty... the Foxhound build process can create three different versions: the final delivery version, a version to be used for tuning, and a version with lots of extra diagnostics for use during development only.

Multiple versions of the same code exist in some places, and which version is actually compiled depends on the build version; for example, multiple versions of a CREATE TABLE may exist with more constraints in the development build than the final deployment build. When making maintenance changes it's important to check ALL the versions of the same code to see if they ALL need the same change; e.g., a new column is needed in all build versions.

You can see where this is heading: The diagnostic development build works OK, but not the final delivery version, because a change was made in one place when it needed to be made in two places. In some cases, the result is a "soft error" that is not fatal to Foxhound operation, but causes a runtime diagnostic to be stored.

And that's the reason for the to-do item "Always watch for new problems in Foxhound Options - Display Diagnostics"... over 500 of these messages appeared there before I noticed anything was amiss:
514 2008-12-03 07:45:40.093 Full Build 3331a 1000000321 201a3(201eh1) SQLCODE = -143, SQLSTATE = 52003, ERRORMSG() = Column 'timeout_retry_at' not found
The other two "When upgrading..." items refer to code that is specific to the version of SQL Anywhere used to build Foxhound. Regardless of whether version-specific data is hard-coded or placed in a configuration file, it still needs to be revisited when the version changes... hence these reminders in the "Always Watch For" list.

All the items got added to the list the hard way: I made a mistake, and it's likely I'll make the same mistake again.

There will be lots more entries, I'm sure... the reason I'm talking about it is that you might benefit from creating your own "Always Watch For" list, on your own projects.

Wednesday, December 3, 2008

Refactoring Foxhound: The Series

Refactoring: verb; to change a computer program's code to make it amenable to change, improve its readability, or simplify its structure, while preserving its existing functionality (more...).

Foxhound: noun; a database monitor and schema troubleshooter for SQL Anywhere (more...).
Foxhound is of interest to this blog, not (just) because it's a tool dedicated to SQL Anywhere, but because it's written in SQL... at least, 61,000 lines of it is. That's 98% of the total. The other 2% is written in C.

The time has come for me to fix some bugs in Foxhound and make some enhancements, and while I'm at it I'm going to do some refactoring: every time I have to open a module to make some changes, I'm also going to look around for other improvements to make.

And while I'm doing that, I'm also going to look for code samples and programming techniques that are interesting enough to publish here... hence this post, the first one in "Refactoring Foxhound: The Series".

Refactoring Foxhound: Calling C

The only reason any part of Foxhound is written in C is because it can't be done in SQL.

The piece of C shown here lets you call the Windows API function GetEnvironmentVariable from SQL via the "external function call" mechanism.

Here is the SQL code that shows how to define a SQL function called get_environment_variable that in turn calls a function of the same name inside demodll.dll... and inside that dll resides the C code (shown later) that actually calls GetEnvironmentVariable.
CREATE PROCEDURE get_environment_variable (
IN name VARCHAR ( 255 ),
OUT value VARCHAR ( 32766 ),
OUT return_code INTEGER, -- OK if > 0
OUT diagnostic_code INTEGER,
OUT diagnostic_string VARCHAR ( 255 ) )
EXTERNAL NAME 'get_environment_variable@C:\\temp\\demodll.dll';

BEGIN
DECLARE @name VARCHAR ( 255 );
DECLARE @value VARCHAR ( 32766 );
DECLARE @return_code INTEGER;
DECLARE @diagnostic_code INTEGER;
DECLARE @diagnostic_string VARCHAR ( 255 );

SET @name = 'SQLANY11';

CALL get_environment_variable (
@name,
@value,
@return_code,
@diagnostic_code,
@diagnostic_string );

SELECT @name,
@value,
@return_code,
@diagnostic_code,
@diagnostic_string;

END;
In this case the SQL function receives one input parameter, the name of the environment variable, and returns four output parameters: the value of the environment variable, a return code and two diagnostic values in case there was a problem.

For example, if you call get_environment_variable with name = 'SQLANY11' it will return value = 'C:\Program Files\SQL Anywhere 11' and return_code = a non-zero value.

If you call it with an unknown variable name, such as 'UNKNOWN', the return value will be empty, the return_code will be zero, and the diagnostic_string will contain 'GetEnvironmentVariable failed'.

Here's the actual demodll.cpp code used to compile demodll.dll via Visual C++ 2005 Express Edition, with line numbers and some explanations.
Note: This is not a tutorial in C++, nor is it a complete explanation of how to write external C functions for SQL Anywhere. For more information on the latter topic see the SQL Anywhere External Function API section in the Help, as well as the sample code in C:\Documents and Settings\All Users\Documents\SQL Anywhere 11\Samples\SQLAnywhere\ExternalProcedures.

Instead, the explanations below are my attempt to draw your attention to some tricky and troublesome bits...
  1 // *********************************************************
2 // Copyright 1994-2008 iAnywhere Solutions, Inc. All rights
3 // reserved.
4 // This sample code is provided AS IS, without warranty or
5 // liability of any kind.
6 //
7 // You may use, reproduce, modify and distribute this sample
8 // code without limitation, on the condition that you retain
9 // the foregoing copyright notice and disclaimer as to the
10 // original iAnywhere code.
11 // *********************************************************
12
Lines 1 through 11 is the standard Copyright notice that iAnywhere Solutions wants you to include if you publish any of the sample code that came with SQL Anywhere.
 13 #if defined( WIN32 )
14 #include <windows.h>
15 #endif
16
17 #include <stdarg.h>
18 #include <stdlib.h>
19 #include <stdio.h>
20 #include <string.h>
21 #if defined( _MSC_VER )
22 #include <tchar.h>
23 #endif
24
25 #include "extfnapi.h"
26 #include "Shlobj.h"
27
28 #if !defined( _unused )
29 #define _unused( i ) ( (i) = (i) )
30 #endif
31
32 #if !defined( TRUE )
33 #define TRUE 1
34 #endif
35 #if !defined( FALSE )
36 #define FALSE 0
37 #endif
38
39 #if defined( WIN32 )
40 #define _UINT32_ENTRY unsigned int FAR __stdcall
41 #define _VOID_ENTRY void FAR __stdcall
42 #else
43 #define _UINT32_ENTRY unsigned int
44 #define _VOID_ENTRY void
45 #endif
46
47 #define int64 __int64
48 #define uint64 unsigned __int64
49
50 #if defined( WIN32 )
51 int __stdcall LibMain(
52 HANDLE inst,
53 ULONG reasoncalled,
54 LPVOID reserved )
55 /*********************/
56 {
57 _unused( inst );
58 _unused( reasoncalled );
59 _unused( reserved );
60 return( 1 );
61 }
62 #endif
63
64 __declspec(dllexport) _UINT32_ENTRY extfn_use_new_api( void )
65 {
66 return( EXTFN_API_VERSION );
67 }
68
Lines 12 through 67 are "standard stuff" necessary for writing a C dll that's going to be called from SQL Anywhere... mess around with this code at your peril!
 69 //----------------------------------------------------------
70 // get_environment_variable
71 //----------------------------------------------------------
72
73 __declspec(dllexport) _VOID_ENTRY get_environment_variable (
74 an_extfn_api *api, void *arg_handle ) {
75
Lines 73 and 74 show the standard entry point for your externally-callable C function. You pick the function name, but no matter what arguments you pass in the SQL CALL, the C routine only ever sees these two parameters: an_extfn_api *api and void *arg_handle.
 76    // For information about GetEnvironmentVariable see: 
77 // http://msdn2.microsoft.com/en-us/library/ms683188.aspx
78
79 an_extfn_value api_name;
80 an_extfn_value api_value;
81 an_extfn_value api_return_code;
82 an_extfn_value api_diagnostic_code;
83 an_extfn_value api_diagnostic_string;
84
Lines 79 through 83 show how you must provide special SQL-Anywhere-specific declarations for each of the arguments that are passed in the SQL CALL.
 85    char *          name; 
86 char * value;
87 DWORD return_code;
88 DWORD diagnostic_code;
89 char * diagnostic_string;
90
91 if ( !api -> get_value ( arg_handle, 1, &api_name )
92 || api_name.data == NULL ) {
93 return;
94 }
95
Lines 85 through 89 show five local declarations corresponding to the five arguments passed in the SQL CALL.

Lines 91 through 94 check for an invalid call and return immediately.
 96    name = (char *) api_name.data;
97
98 value = (char *) malloc( 32766 );
99 strcpy_s ( value, 32766, "" );
100
101 return_code = 0;
102 diagnostic_code = 0;
103 diagnostic_string = (char *) malloc( 255 );
104 strcpy_s ( diagnostic_string, 255, "" );
105
Lines 96 through 104 initialize the five local variables. The first variable receives a pointer to the environment variable name parameter, and the other four are set to default return values.
106    return_code = GetEnvironmentVariable ( 
107 (LPCTSTR) name,
108 (LPTSTR) value,
109 32767 );
110
Lines 106 through 109 call the actual Windows API function, passing name and returning values in return_code and value.
111    if ( return_code <= 0 ) {
112 diagnostic_code = 1;
113 strcpy_s ( diagnostic_string, 255,
114 "GetEnvironmentVariable failed" );
115 }
116
Lines 11 through 115 handle a failed call to GetEnvironmentVariable.
117    // CHECK THE ARGUMENT NUMBERS IN THE SET_VALUE CALLS...
118
Line 117 is an exhortation to the maintenance programmer: The five arguments passed in the SQL CALL are numbered 1, 2, 3, 4, 5 and those numbers are hard-coded in the code that follows. Getting one of those numbers wrong can wreak havoc...
119    api_value.type = DT_VARCHAR;
120 api_value.data = value;
121 api_value.piece_len
122 = ( a_sql_uint32 )( strlen ( value ) );
123 api_value.len.total_len
124 = ( a_sql_uint32 )( strlen ( value ) );
125 api -> set_value (
126 arg_handle, 2, &api_value, 0 );
127
128 api_return_code.type = DT_INT;
129 api_return_code.data = &return_code;
130 api -> set_value (
131 arg_handle, 3, &api_return_code, FALSE );
132
133 api_diagnostic_code.type = DT_INT;
134 api_diagnostic_code.data = &diagnostic_code;
135 api -> set_value (
136 arg_handle, 4, &api_diagnostic_code, FALSE );
137
138 api_diagnostic_string.type = DT_VARCHAR;
139 api_diagnostic_string.data = diagnostic_string;
140 api_diagnostic_string.piece_len
141 = ( a_sql_uint32 )( strlen ( diagnostic_string ) );
142 api_diagnostic_string.len.total_len
143 = ( a_sql_uint32 )( strlen ( diagnostic_string ) );
144 api -> set_value (
145 arg_handle, 5, &api_diagnostic_string, 0 );
146
The four sections of code above, starting at lines 119, 128, 133 and 138 respectively, show the special SQL-Anywhere-specific code that must be written to return data to the four output parameters in the SQL CALL.
147    free ( value );
148 free ( diagnostic_string );
149
150 } // get_environment_variable
Lines 147 and 148 perform some cleanup before return.

Here are a few tips about using Visual C++:
To specify the location of SQL Anywhere *.h include files:

Solution Explorer - project Properties
Configuration Properties
C/C++
General
Additional Include Directories...
"C:\Program Files\SQL Anywhere 10\h"
... or ...
"C:\Program Files\SQL Anywhere 11\SDK\Include"

To specify which DLL functions can be named
in CREATE PROCEDURE ... EXTERNAL statements:

Solution Explorer - project Properties
Configuration Properties
Linker
Input
Module Definition File... demodll.def

The demodll.def file contains these two lines of text:

EXPORTS extfn_use_new_api
EXPORTS get_environment_variable