Wednesday, February 29, 2012

Example: Working With RAND()

Question: How do I generate random integers in the range 1 through 6?

Answer:  

Oh, you mean "How do I use SQL Anywhere to generate random numbers in the range 1 through 6?"

Well, you can call RAND()... which gives you a number between 0 and 1, but never actually gives you exactly zero or exactly one... and take it from there.

Gee, thanks for all the help...

OK, OK, let's take it a bit further... let's call EXPRTYPE() to see what RAND() returns:
SELECT EXPRTYPE ( 'SELECT RAND()', 1 ) AS "data type";

data type
double

Great... RAND() returns a honking big floating point number.

Well, what do they look like? Let's use sa_rowgenerator() to call RAND() ten times:
SELECT RAND() AS random FROM sa_rowgenerator ( 1, 10 );

random
0.7364153851458874
0.9333781469303082
0.28651545768907083
0.46529738021329853
0.25306924490866683
0.33479917996320835
0.969817641642791
0.7251030903892141
0.8076401715202444
0.00836274074779951

Wonderful! That's so far away from 1, 2, 3, 4, 5, 6 it's not even funny!

Why does everything have to be so hard?

Sigh... let's get started... multiply by 6 and add 1, to change the interval from "0 to 1" to "1 to 7":
SELECT ( RAND() * 6 ) + 1 AS random 
  FROM sa_rowgenerator ( 1, 10 );

random
4.5213047105452535
5.568270134072876
3.916143362836979
4.621499201106605
3.5370729987216523
5.585889514808492
6.045075386317947
1.5820178457452068
2.9739334396896573
2.899320864071753

Much better... now the results are in the range 1-point-something to 6-point-something, but don't forget: it's never exactly 1 or exactly 7.

Now let's use CAST to drag the values down to 1 through 6, where the values can sometimes be exactly 1 or exactly 6:
SELECT CAST ( ( RAND() * 6 ) + 1 AS INTEGER ) AS random 
  FROM sa_rowgenerator ( 1, 10 );

random
3
1
4
3
3
6
2
3
4
6

MUCH better... now let's see if it generates the same number of each digit 1 through 6.

Let's generate a million numbers and count how many of each digit we get:
SELECT @@VERSION,
       random              AS digit,
       COUNT(*)            AS actual_occurrences,
       ( 1000000 / 6.0 )   AS perfect_occurrences
  FROM ( SELECT CAST ( ( RAND() * 6 ) + 1 AS INTEGER ) AS random 
           FROM sa_rowgenerator ( 1, 1000000 ) 
       ) AS generated 
 GROUP BY random
 ORDER BY random;

@@VERSION,digit,actual_occurrences,perfect_occurrences
'12.0.1.3298',3,1000000,166666.67

Oops, that's not right.

Try it again and you get a million fours. Or a million ones.

But just a single digit, not an even distribution across all 6 digits.

What we want is 166,666 ones, 166,666 twos, and so on... like we (almost) get in SQL Anywhere Version 9:
@@VERSION,digit,actual_occurrences,perfect_occurrences
'9.0.2.3951',1,166070,166666.67
'9.0.2.3951',2,166786,166666.67
'9.0.2.3951',3,166835,166666.67
'9.0.2.3951',4,166467,166666.67
'9.0.2.3951',5,166944,166666.67
'9.0.2.3951',6,166898,166666.67

This Is Why We Test...

...to make sure our crap (er, stuff) actually works.

Somewhere, deep down inside the SQL Anywhere query engine, the promise that RAND() "is treated as a non-deterministic function" is being broken... perhaps it's because of all the GROUP BY and COUNT(*) stuff that was added to the SELECT.

Or maybe it's a feature... in this case, it doesn't matter, we just need it to work.

And here's how: Bite the bullet, save the million RAND() results in a simple temporary table, then run the query:
SELECT RAND() AS random 
  INTO #generated
  FROM sa_rowgenerator ( 1, 1000000 );

SELECT @@VERSION,
       CAST ( ( random * 6 ) + 1 AS INTEGER )  AS digit,
       COUNT(*)                                AS actual_occurrences,
       ( 1000000 / 6.0 )                       AS perfect_occurrences
  FROM #generated 
 GROUP BY digit
 ORDER BY digit;

@@VERSION,digit,actual_occurrences,perfect_occurrences
'12.0.1.3298',1,166612,166666.67
'12.0.1.3298',2,166700,166666.67
'12.0.1.3298',3,166727,166666.67
'12.0.1.3298',4,166496,166666.67
'12.0.1.3298',5,166801,166666.67
'12.0.1.3298',6,166664,166666.67

That works in all versions of SQL Anywhere from 9 through 12, and the result looks pretty good: for all intents and purposes the number of ones, twos, threes, ... are the same.

So there's the answer: Call RAND(), multiply it to expand the range, add to it to shift the range, and truncate it to get an integer.
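More generally, to get random integers in any range, make the multiplier the width of the range and the addend its lower bound. Here's a sketch for an arbitrary range (the @lo and @hi variables are my own illustration, not part of the original question):
BEGIN
   DECLARE @lo INTEGER;
   DECLARE @hi INTEGER;
   SET @lo = 10;
   SET @hi = 15;
   -- RAND() lies in the open interval (0,1), so the expression below
   -- lies in (10,16) and truncation yields integers 10 through 15.
   SELECT CAST ( ( RAND() * ( @hi - @lo + 1 ) ) + @lo AS INTEGER ) AS random
     FROM sa_rowgenerator ( 1, 10 );
END;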

Tip: Truncate the number, don't ROUND() it: rounding generates the wrong number of 1s and 6s because each endpoint digit gets only half an interval's worth of values. That's what happened with this expression: CAST ( ROUND ( ( RAND() * 5 ) + 1, 0 ) AS INTEGER )

Are you sure that's random?


Scott Adams has the answer... (Dilbert.com)


Tuesday, February 28, 2012

Latest SQL Anywhere EBF - 12.0.1.3577 for the Mac

The three asterisks "***" show what's new since the previous list.

  • Only EBFs for the latest fully-supported versions of SQL Anywhere are shown here: 10.0.1, 11.0.1 and 12.0.1.

  • Note: Some earlier versions are also fully supported but they are not shown here; e.g., the 12.0.0.2758 EBF for Windows was recently released. For more information see SQL Anywhere Supported Platforms and Engineering Support Status.

  • Also: Just because an older version isn't "fully supported" doesn't mean you can't download files (or ask questions, or get help), it just means there won't be any more new EBFs released. For a complete list of everything available for download see Downloads - EBFs / Maintenance.

Current builds for the most active platforms...

HP-UX Itanium    12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012
                 10.0.1.3777 EBF       14 Oct 2008

IBM AIX          12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012

Linux x86/x64    12.0.1.3519 EBF       20 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012
                 10.0.1.4310 EBF       17 Jan 2012

Mac OS           12.0.1.3577 EBF ***   24 Feb 2012 ***
                 11.0.1.2449 EBF       29 Jun 2010
                 10.0.1.4042 EBF       01 Oct 2010

Solaris SPARC    12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012
                 10.0.1.4310 EBF       17 Jan 2012

Solaris x64      12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF       13 Feb 2012
                 10.0.1.3870 EBF       01 Apr 2009

Windows x86/x64  12.0.1.3554 EBF       31 Jan 2012
                 11.0.1.2755 EBF       10 Feb 2012
                 10.0.1.4310 EBF       12 Jan 2012
[Download EBFs here] [Register for the Fuji Beta here]

...and for the other platforms:

HP-UX PA-RISC    10.0.1.3778 EBF                 16 Oct 2008

Linux Itanium    10.0.1.3777 EBF                 14 Oct 2008

NetWare          10.0.1.4036 EBF                 10 Mar 2010

Windows CE       10.0.1.4036 EBF                 09 Mar 2010

Windows Itanium  10.0.1 GA Upgrade from 10.0.0   26 Apr 2007
[Download EBFs here]


Saturday, February 25, 2012

Example: A Really Simple Moving Average WINDOW clause

Using a simple moving average to "smooth out" data is a fairly popular technique... it's too bad the primary example in the SQL Anywhere Help is far from simple:

SELECT *
  FROM ( SELECT s.ProductID, 
         Month( o.OrderDate ) AS julian_month,
         SUM( s.Quantity ) AS sales,
         AVG( SUM( s.Quantity ) )
         OVER ( PARTITION BY s.ProductID
           ORDER BY Month( o.OrderDate ) ASC
           RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING ) 
         AS average_sales
         FROM SalesOrderItems s KEY JOIN SalesOrders o
         WHERE Year( o.OrderDate ) = 2000
         GROUP BY s.ProductID, Month( o.OrderDate ) ) 
  AS DT
  ORDER BY 1,2;

ProductID,julian_month,sales,average_sales
300,1,108,108.000000
300,2,108,112.000000
300,3,120,112.000000
300,4,108,116.000000
300,5,120,116.000000
300,6,120,132.000000
300,7,156,132.000000
300,8,120,124.000000
300,9,96,116.000000
300,10,132,124.000000
300,11,144,140.000000
300,12,144,144.000000
...

What makes that example so complex? Besides the problem statement, that is: "compute the moving average of all product sales, by month, in the year 2000."

Here's what makes it complex:
  • two references to the AVG() function,

  • two SUM() calls,

  • three calls to Month(),

  • two ORDER BYs,

  • a FROM,

  • a WHERE,

  • a GROUP BY (which all by itself makes just about any SELECT a head-scratcher),

  • plus...

...a stealth WINDOW clause!

...a WINDOW clause that doesn't even use the WINDOW keyword, so to the uninitiated (the folks who need examples more than anyone else) it isn't obvious that a WINDOW is involved at all.

Not just any WINDOW clause, mind you, but one that includes every single component you can code in a WINDOW:
  • a PARTITION BY,

  • an ORDER BY and

  • a RANGE clause... not a simple ROWS clause but a full-blown RANGE clause, one that has an intimate relationship with the ORDER BY... "I know what a row is, but what the [redacted] is a RANGE?"
But wait, there's more: The choice of RANGE over ROWS in this example is critical to the correct operation of the query: RANGE defines the window by the month values themselves, so a missing month leaves a gap, whereas ROWS would blindly pull in the physically adjacent rows no matter how far away their months are.
(for a more complete discussion of this particular example, see Example 23 - Computing a Moving Average in Glenn Paulley's excellent OLAP white paper.)


Now, let's get back on track:

A Really Really Simple Moving Average

The following example displays 10 days' worth of data together with the moving average of today's value and yesterday's:
CREATE TABLE t (
   entry_date   DATE NOT NULL PRIMARY KEY,
   value        INTEGER NOT NULL );

INSERT t VALUES ( '2012-02-01', 10 );
INSERT t VALUES ( '2012-02-02', 20 );
INSERT t VALUES ( '2012-02-03', 10 );
INSERT t VALUES ( '2012-02-04', 30 );
INSERT t VALUES ( '2012-02-05', 10 );
INSERT t VALUES ( '2012-02-06', 40 );
INSERT t VALUES ( '2012-02-07', 10 );
INSERT t VALUES ( '2012-02-08', 50 );
INSERT t VALUES ( '2012-02-09', 10 );
INSERT t VALUES ( '2012-02-10', 60 );
COMMIT;

SELECT t.entry_date,
       t.value,
       AVG ( t.value ) OVER two_days AS two_day_average
  FROM t
WINDOW two_days AS
          ( ORDER BY t.entry_date
            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW )
 ORDER BY t.entry_date;

entry_date,value,two_day_average
'2012-02-01',10,10.000000
'2012-02-02',20,15.000000
'2012-02-03',10,15.000000
'2012-02-04',30,20.000000
'2012-02-05',10,20.000000
'2012-02-06',40,25.000000
'2012-02-07',10,25.000000
'2012-02-08',50,30.000000
'2012-02-09',10,30.000000
'2012-02-10',60,35.000000

The WINDOW clause at the end of the SELECT defines a moving window that contains two rows: today's row (CURRENT ROW) and yesterday's row (1 PRECEDING):
  • the WINDOW ORDER BY clause determines what "PRECEDING" means (the preceding row by t.entry_date) and

  • the ROWS clause determines the size of the window (always two rows).
The expression AVG ( t.value ) OVER two_days in the select list refers to the WINDOW clause by name, and it tells SQL Anywhere to compute the average of the two values of t.value that exist in the 2-row sliding window, for each row in the result set.
  • So, for '2012-02-02' the average of 10 and 20 is 15.000000,

  • for '2012-02-03' the average of 20 and 10 is 15.000000,

  • for '2012-02-04' the average of 10 and 30 is 20.000000,

  • ... and so on ...

  • for '2012-02-10' the average of 10 and 60 is 35.000000.

Oops, what about the first row?

The '2012-02-01' row doesn't have a PRECEDING row, so what is the average over the moving window?

According to Glenn Paulley's white paper "in the case of a moving window, it is assumed that rows containing Null values exist before the first row, and after the last row, in the input."

That means when the moving window has '2012-02-01' as the "CURRENT ROW", the "1 PRECEDING" row contains NULL values... and when SQL Anywhere computes an AVG() that includes a NULL value, it doesn't count the NULL at all... not in the numerator or in the denominator when computing the average. Here's proof:
CREATE TABLE x ( c INTEGER );

INSERT x VALUES ( NULL );
INSERT x VALUES ( 10 );
COMMIT;

SELECT AVG ( c ) FROM x;

AVG(x.c)
10.000000
That's why two_day_average = 10.000000 for the first row '2012-02-01'.
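One last note: a named WINDOW clause is just a convenience. The same specification can be written inline, directly in the OVER clause... here's the equivalent inline form of the query above (same result set):
SELECT t.entry_date,
       t.value,
       AVG ( t.value ) OVER ( ORDER BY t.entry_date
                              ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS two_day_average
  FROM t
 ORDER BY t.entry_date;

The named form pays off when several expressions share the same window, because the ORDER BY and ROWS specification only has to be coded once.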


Monday, February 20, 2012

Example: Using Views For Row-Level Security

This question made me realize just how lucky I've been for several years, not having to deal with any Security Schemas Gone Wild!

Security is one of those areas of life that knows no physical bounds. People can eat only so much food, travel to only one place at a time, and are generally limited in their activities when those activities are based on real-world resources.

(even multitasking nutbars have their limits)
Security is an entirely different matter, especially computer system security: There is no limit to the rules and regulations that can be imposed by a database schema gone crazy... or if there is a limit, it is far beyond the point where administrators, end users and developers have been driven 'round the bend.

Examples? You want examples? ...not from me, I don't want anyone to get ideas.

In fact, I'm having second thoughts about this article, but here goes...


Here's the question: How do I provide insert, update, delete and select access to all tables and columns, to all users, but limit each user to a specific subset of rows in each table?

In other words, how do I provide row-level security?

Let's start with a simple base table that contains a primary key, some data columns plus a special "access_code" column that will be used to implement row-level security:
CREATE TABLE base_table (
   pkey             INTEGER NOT NULL PRIMARY KEY,
   access_code      INTEGER NOT NULL,
   data1            INTEGER NOT NULL,
   data2            INTEGER NOT NULL,
   data3            INTEGER NOT NULL );

INSERT base_table VALUES ( 1, 101, 0, 0, 0 );
INSERT base_table VALUES ( 2, 101, 0, 0, 0 );
INSERT base_table VALUES ( 3, 102, 0, 0, 0 );
INSERT base_table VALUES ( 4, 102, 0, 0, 0 );
INSERT base_table VALUES ( 5, 103, 0, 0, 0 );
INSERT base_table VALUES ( 6, 103, 0, 0, 0 );
COMMIT;

Here's a CREATE VIEW statement for a view through which end users have access to base_table data on a row-by-row basis:
CREATE VIEW view_base_table
AS SELECT base_table.pkey,
          base_table.access_code,
          base_table.data1, 
          base_table.data2, 
          base_table.data3 
     FROM base_table
    WHERE base_table.access_code = @access_code 
WITH CHECK OPTION;

The CREATE VIEW ... SELECT sets up the basics: a view which looks exactly the same as the underlying base table. The "looks exactly the same" part can be important when it comes to updating base_table via the view... more on that later.

The WHERE clause implements row-level access control: the end user can only see base_table rows that have an access_code value that matches what's in the @access_code variable.
(and where does @access_code come from? more on that later, too)
The WITH CHECK OPTION lets your end users INSERT, UPDATE and DELETE data via the view without affecting any base_table rows that a SELECT on the view wouldn't return in the first place; in other words, the WHERE clause is applied to the data that's being inserted, updated or deleted.

Creating an updatable view is a bit more difficult than creating a SELECT-only view. Most of the rules for updatable views are described in the Help, but not all the rules: A view can only be INSERT-able and DELETE-able if there is only one table in the FROM clause, no matter what the ansi_update_constraints option is set to. Of course, that's no problem here: the access control predicate is coded in a WHERE clause rather than an INNER JOIN ON clause.
(but it can become a problem, later on... that one-table-FROM-clause rule)
There is also a common-sense rule that an updatable view SELECT must contain all the columns required to code the INSERT, UPDATE or DELETE you actually want. For example, the view SELECT should probably include the underlying primary key columns for an INSERT, and for UPDATE and DELETE statements as well... unless you want to update/delete all the rows. That's why "looks exactly the same" as the base tables can make these views a lot easier to deal with.
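To make that "common-sense rule" concrete, here's a hypothetical view (v_no_pkey is my own illustration, not part of the schema in this article) that leaves out the primary key... there's simply no way to aim an UPDATE or DELETE at one particular row through it:
CREATE VIEW v_no_pkey
AS SELECT base_table.data1,
          base_table.data2
     FROM base_table
    WHERE base_table.access_code = @access_code
WITH CHECK OPTION;

-- This works, but it updates EVERY row visible through the view:
UPDATE v_no_pkey SET data1 = 999;

-- This can't even be coded, because pkey isn't in the view:
-- UPDATE v_no_pkey SET data1 = 999 WHERE pkey = 7;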


Here's where the @access_code variable comes from: it's a connection-level variable that is created and assigned a value, depending on the user id, whenever that user id connects to the database:
CREATE PROCEDURE DBA.connection_start() 
BEGIN
   CREATE VARIABLE @access_code INTEGER;
   SET @access_code = CASE USER_NAME()
                         WHEN 'user_a' THEN 101
                         WHEN 'user_b' THEN 102
                         WHEN 'user_c' THEN 102
                         ELSE 0
                      END CASE;
END; 

SET OPTION PUBLIC.login_procedure = 'DBA.connection_start';

Connection-level variables like @access_code are visible to all the SQL run on the connection, including every reference to view_base_table from anywhere in the application or any stored procedure or trigger... plus, each connection has its own copy of @access_code that is inaccessible (invisible, in fact) to SQL code running on other connections.
For more information see the CREATE VARIABLE statement and the login_procedure option.
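Here's a quick sketch of that isolation in action, using two separate connections (the values are my own illustration):
-- On connection 1:
CREATE VARIABLE @access_code INTEGER;
SET @access_code = 101;

-- On connection 2:
CREATE VARIABLE @access_code INTEGER;
SET @access_code = 102;

-- Back on connection 1:
SELECT @access_code;  -- still 101... connection 2's SET didn't touch this copy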

Here are the GRANT statements that set up three different users with access to the view:
  • The GRANT CONNECT and GRANT GROUP statements at the top of the script below create a user group called "usergroup"... this is optional, but it does make administration easier when you have a lot of tables and a lot of users.

  • The GRANT GROUP TO DBA and GRANT MEMBERSHIP IN GROUP DBA statements provide name visibility on all the objects created by DBA to all the members of usergroup. Name visibility simply means you can refer to table t instead of having to code DBA.t; it doesn't actually give you the right to do anything useful like INSERT or SELECT, that's done separately. And being a member of the group called DBA doesn't confer any special powers: the DBA privilege is not inherited through group membership (whew!)

  • The GRANT EXECUTE statement lets everyone execute the connection_start procedure when they log in.

  • The GRANT INSERT, UPDATE, DELETE, SELECT statement sets up each member of usergroup with all the necessary privileges on the view. This statement is the reason usergroup exists: Only one such GRANT is required for each view, not one GRANT for each view for each user... on a large system that brings the number of GRANT statements down to the hundreds from the thousands.

  • The final three pairs of GRANT statements are all that is required to set up three users with (1) the ability to connect, (2) name visibility to all the stuff DBA owns and (3) INSERT, UPDATE, DELETE and SELECT privileges on the view.
GRANT CONNECT TO usergroup;
GRANT GROUP TO usergroup;

GRANT GROUP TO DBA;
GRANT MEMBERSHIP IN GROUP DBA TO usergroup;

GRANT EXECUTE ON connection_start TO usergroup;

GRANT INSERT, UPDATE, DELETE, SELECT ON view_base_table TO usergroup;

GRANT CONNECT TO user_a IDENTIFIED BY SQL;
GRANT MEMBERSHIP IN GROUP usergroup TO user_a;

GRANT CONNECT TO user_b IDENTIFIED BY SQL;
GRANT MEMBERSHIP IN GROUP usergroup TO user_b;

GRANT CONNECT TO user_c IDENTIFIED BY SQL;
GRANT MEMBERSHIP IN GROUP usergroup TO user_c;

Here's what works...


Here are some INSERT, UPDATE, DELETE and SELECT statements that work OK for user_a:
  • the two INSERTs work OK because they both set the access_code to 101,

  • the UPDATE works OK because the access_code for row 7 is 101,

  • the DELETE works OK because the access_code for row 8 is 101, and

  • the SELECT returns only the rows that have access_code = 101.
-- user_a
INSERT view_base_table VALUES ( 7, @access_code, 0, 0, 0 );
INSERT view_base_table VALUES ( 8, @access_code, 0, 0, 0 );
UPDATE view_base_table SET data1 = 555 WHERE pkey = 7; 
DELETE view_base_table WHERE pkey = 8;    
COMMIT;
SELECT * FROM view_base_table ORDER BY pkey;           

pkey,access_code,data1,data2,data3
1,101,0,0,0
2,101,0,0,0
7,101,555,0,0

Here's the same test for user_b; similar results, different rows:
-- user_b
INSERT view_base_table VALUES ( 9,  @access_code, 0, 0, 0 );
INSERT view_base_table VALUES ( 10, @access_code, 0, 0, 0 );
UPDATE view_base_table SET data1 = 666 WHERE pkey = 9; 
DELETE view_base_table WHERE pkey = 10;    
COMMIT;
SELECT * FROM view_base_table ORDER BY pkey;           

pkey,access_code,data1,data2,data3
3,102,0,0,0
4,102,0,0,0
9,102,666,0,0

Here's the same test for user_c...
-- user_c
INSERT view_base_table VALUES ( 11, @access_code, 0, 0, 0 );
INSERT view_base_table VALUES ( 12, @access_code, 0, 0, 0 );
UPDATE view_base_table SET data1 = 777 WHERE pkey = 11; 
DELETE view_base_table WHERE pkey = 12;    
COMMIT;
SELECT * FROM view_base_table ORDER BY pkey;           

...but this time, the SELECT displays something different; what is it?



Here's a clue, the INSERT, UPDATE and DELETE statements all work, but the SELECT result set is still different...



Here's the answer: the SELECT displays four rows instead of three because both user_b and user_c have @access_code set to 102!
pkey,access_code,data1,data2,data3
3,102,0,0,0
4,102,0,0,0
9,102,666,0,0
11,102,777,0,0

For the record here's what the underlying base_table looks like from the DBA's point of view:
-- DBA
SELECT * FROM base_table ORDER BY pkey;           

pkey,access_code,data1,data2,data3
1,101,0,0,0
2,101,0,0,0
3,102,0,0,0
4,102,0,0,0
5,103,0,0,0
6,103,0,0,0
7,101,555,0,0
9,102,666,0,0
11,102,777,0,0

Here's what doesn't work...


Here's what happens when an end user tries to use the base table: Permission denied.
-- user_a
SELECT * FROM base_table ORDER BY pkey;           

Permission denied: you do not have permission to select from "base_table"
SQLCODE=-121, ODBC 3 State="42000"

Here's what happens when an end user tries to insert a row with a different access code: SQLCODE -632.
-- user_a
INSERT view_base_table VALUES ( 13, 102, 0, 0, 0 );

WITH CHECK OPTION violated for insert/update on base table 'base_table'
SQLCODE=-632, ODBC 3 State="44000"

When a user tries to UPDATE a row with a different access code, it fails silently with SQLCODE 100:
-- user_b
UPDATE view_base_table SET data1 = 888 WHERE pkey = 1; 
SELECT SQLCODE;

sqlcode
100

Same thing when trying to DELETE someone else's row: silent failure, SQLCODE 100.
-- user_c
DELETE view_base_table WHERE pkey = 2;    
SELECT SQLCODE;

sqlcode
100

So what's all the fuss about?


The problem with row-level security is that CREATE VARIABLE is a gateway drug that only satisfies security cravings for a very short time... the next step is CREATE TABLE, followed by many-to-many relationships among multiple security tables implementing hierarchical access rules... by department, by job description, by time-of-day, by range-of-dollar-value... bwaaa ha ha ha!

Eventually, each single-table CREATE VIEW involves hidden joins to several other tables, and every five-way join among views becomes a performance nightmare involving twenty, thirty or more tables.

Why won't CREATE VARIABLE satisfy?


Because each user gets only one access code, thus limiting the relationship between users and rows, not to mention the different security relationships between one user and several tables.

But never mind that, consider this loophole: There's nothing stopping a user with an adhoc reporting tool from changing @access_code on the fly:
-- user_a
SET @access_code = 102;
SELECT * FROM view_base_table ORDER BY pkey;           

pkey,access_code,data1,data2,data3
3,102,0,0,0
4,102,0,0,0
9,102,666,0,0
11,102,777,0,0

So no, CREATE VARIABLE makes a nice demo but the security high doesn't last long at all.

Moving on, from crack cocaine to CREATE TABLE


Here's a workaround to close the CREATE VARIABLE loophole. It involves a single-row connection-level table to replace the CREATE VARIABLE, plus modifications to the view and the connection_start procedure:
CREATE GLOBAL TEMPORARY TABLE connection_settings (
   one_row_per_connection    INTEGER NOT NULL DEFAULT 1 CHECK ( one_row_per_connection = 1 ),
   access_code               INTEGER NOT NULL DEFAULT 0,
   PRIMARY KEY ( one_row_per_connection ) ) 
   NOT TRANSACTIONAL;

CREATE VIEW view_base_table
AS SELECT base_table.pkey,
          base_table.data1, 
          base_table.data2, 
          base_table.data3 
     FROM base_table
    WHERE EXISTS ( SELECT * 
                     FROM connection_settings
                    WHERE connection_settings.access_code = base_table.access_code )
WITH CHECK OPTION;

CREATE PROCEDURE DBA.connection_start() 
BEGIN
   INSERT connection_settings 
   VALUES ( DEFAULT, 
            CASE USER_NAME()
               WHEN 'user_a' THEN 101
               WHEN 'user_b' THEN 102
               WHEN 'user_c' THEN 102
            END CASE );
END; 

SET OPTION PUBLIC.login_procedure = 'DBA.connection_start';

The end user is now prevented from changing the access code assigned in the connection_start procedure because the user doesn't have permission to UPDATE the connection_settings table.

Thus it starts: a security table is introduced, and the view WHERE clause introduces an implicit join with a second table... just to close one loophole. Not to increase flexibility in setting up security rules, for that you need more columns, more rows, more tables, more joins...



Friday, February 17, 2012

Example: A Single-Row Connection-Level Table

Question: How do I create a table that can contain only a single row for each connection, and that row exists only as long as that connection exists, and is visible only on that connection?

Answer: Start with a non-shared CREATE GLOBAL TEMPORARY TABLE statement to ensure that rows inserted by each connection are only visible on that connection and are automatically deleted when the connection finishes.

Then, use a CHECK constraint to ensure that each connection can insert only one row.

Here's an example of a connection settings table; "non-shared" is the default when the SHARE BY ALL clause is omitted:

CREATE GLOBAL TEMPORARY TABLE connection_settings (
   one_row_per_connection    INTEGER NOT NULL DEFAULT 1 CHECK ( one_row_per_connection = 1 ),
   setting1                  INTEGER NOT NULL DEFAULT 0,
   setting2                  VARCHAR ( 100 ) NOT NULL DEFAULT '',
   setting3                  VARCHAR ( 1 ) NOT NULL DEFAULT 'Y',
   PRIMARY KEY ( one_row_per_connection ) ) 
   NOT TRANSACTIONAL;

The schema for a GLOBAL TEMPORARY TABLE is permanent and visible to all connections; in other words, you CREATE it once and then use it on each connection.

Each row of data, however, is temporary in the sense that it lasts no longer than the connection that inserted it, and each row is only visible to code running on the connection that inserted it.
Note: The rules that "data is temporary" and "only visible" do not apply when the SHARE BY ALL clause is used; see the Help for more information about non-shared versus shared global temporary tables.
The CHECK ( one_row_per_connection = 1 ) constraint enforces the "single row for each connection" requirement by limiting the primary key to a single value... since primary keys must be unique, that implies there can only be one row.
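Put another way, the second INSERT on any one connection is doomed. Here's a sketch of what to expect (the exact error message varies by version, but it's a primary key violation):
INSERT connection_settings DEFAULT VALUES;  -- OK: the connection's single row now exists
INSERT connection_settings DEFAULT VALUES;  -- fails: primary key not unique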
Tip: Don't forget to code NOT TRANSACTIONAL or ON COMMIT PRESERVE ROWS on all your GLOBAL TEMPORARY TABLE statements... unless you want your data in those tables to disappear every time you run a COMMIT. The default is ON COMMIT DELETE ROWS, one of those rare decisions that violates the rule "Watcom does things the way they should be done."
The resulting table is somewhat similar to connection-level variables created with the CREATE VARIABLE statement:
                          Table                                  Variable
Create schema when:       Ahead of time                          On connection
Create schema how:        CREATE GLOBAL TEMPORARY TABLE t ...    CREATE VARIABLE v ...
Scope of schema:          All code on all connections            All code on connection
Data values:              Multiple columns                       Single value
Create data when:         On connection                          On connection
Create data how:          INSERT t ...                           CREATE VARIABLE v ...
Scope of data:            All code on connection                 All code on connection
Reference values how:     SELECT first, then reference           Direct references to variable names
Update data how:          UPDATE t SET c = ...                   SET v = ...
Explicitly delete data:   DELETE t                               DROP VARIABLE v
Data implicitly deleted:  End of connection                      End of connection
Tip: Use the login_procedure option to ensure the row exists before the connection is used for anything else:
CREATE PROCEDURE DBA.connection_start() 
BEGIN
   INSERT connection_settings DEFAULT VALUES;
END; 

SET OPTION PUBLIC.login_procedure = 'DBA.connection_start';

Warning: Don't use a Connect EVENT: a row inserted inside an event will only be visible inside that event because it runs on its own connection.

Oh, and if you're worried about overriding the default login_procedure value sp_login_environment, go read the Help to see if any of the crap (er, stuff) in there means anything to you. It probably won't, but if it does, be sure to clutter up (er, add) the code to your procedure.

Here's an example of how data in the table can be used and updated:
BEGIN
DECLARE @setting1 INTEGER;
SELECT setting1 INTO @setting1 FROM connection_settings;
END;

UPDATE connection_settings SET setting2 = 'Hello, World!';

Neither the SELECT nor the UPDATE needs a WHERE clause because there's only one row.

Also, the UPDATE doesn't need a COMMIT, and it isn't affected by a ROLLBACK, because of the NOT TRANSACTIONAL clause. If you want your data to be involved in transaction processing, use ON COMMIT PRESERVE ROWS instead (or the default, ON COMMIT DELETE ROWS, if you're in that 0.01% who needs that kind of strangeness (er, functionality) :)

...and if you do use ON COMMIT PRESERVE ROWS instead of NOT TRANSACTIONAL, you're going to need a COMMIT when you update your table... just sayin'.



Hey, that INSERT DEFAULT VALUES clause is pretty cool, isn't it? Betcha didn't know you could do that...


Wednesday, February 15, 2012

Example: ISDATE Is Smarter Than It Looks

Question: If SQL Anywhere has ISNUMERIC() and ISDATE() functions, how come it doesn't have ISTIME() and ISTIMESTAMP()?

Answer: It does, they're both called ISDATE()!

Here's proof... Four tests involving strings containing good and bad DATE, TIME, TIMESTAMP and UTC TIMESTAMP literals:

BEGIN 
   DECLARE @good_date            VARCHAR ( 100 );
   DECLARE @good_time            VARCHAR ( 100 );
   DECLARE @good_timestamp       VARCHAR ( 100 );
   DECLARE @good_utc_timestamp   VARCHAR ( 100 );
   DECLARE @good_time2           VARCHAR ( 100 );
   DECLARE @bad_date             VARCHAR ( 100 );
   DECLARE @bad_time             VARCHAR ( 100 );
   DECLARE @bad_timestamp        VARCHAR ( 100 );
   DECLARE @bad_utc_timestamp    VARCHAR ( 100 );
   DECLARE @bad_time2            VARCHAR ( 100 );

   SET @good_date          = CAST ( CURRENT DATE AS VARCHAR ( 100 ) );
   SET @good_time          = CAST ( CURRENT TIME AS VARCHAR ( 100 ) );
   SET @good_timestamp     = CAST ( CURRENT TIMESTAMP AS VARCHAR ( 100 ) );
   SET @good_utc_timestamp = CAST ( CURRENT UTC TIMESTAMP AS VARCHAR ( 100 ) );
   SET @good_time2         = '1:1';

   SET @bad_date           = STRING ( @good_date, 'x' );
   SET @bad_time           = STRING ( @good_time, 'x' );
   SET @bad_timestamp      = STRING ( @good_timestamp, 'x' );
   SET @bad_utc_timestamp  = STRING ( @good_utc_timestamp, 'x' );
   SET @bad_time2          = STRING ( @good_time2, 'x' );

   SELECT '(1a)' AS "Test", @good_date AS "Value", ISDATE ( @good_date ) AS "ISDATE"  
   UNION
   SELECT '(1b)', @bad_date, ISDATE ( @bad_date ) 
   UNION
   SELECT '(2a)', @good_time, ISDATE ( @good_time ) 
   UNION
   SELECT '(2b)', @bad_time, ISDATE ( @bad_time ) 
   UNION
   SELECT '(3a)', @good_timestamp, ISDATE ( @good_timestamp ) 
   UNION
   SELECT '(3b)', @bad_timestamp, ISDATE ( @bad_timestamp ) 
   UNION
   SELECT '(4a)', @good_utc_timestamp, ISDATE ( @good_utc_timestamp ) 
   UNION
   SELECT '(4b)', @bad_utc_timestamp, ISDATE ( @bad_utc_timestamp ) 
   UNION
   SELECT '(5a)', @good_time2, ISDATE ( @good_time2 ) 
   UNION
   SELECT '(5b)', @bad_time2, ISDATE ( @bad_time2 ) 
   ORDER BY 1;

END;

Test     Value                           ISDATE
(1a)    '2012-02-14'                        1
(1b)    '2012-02-14x'                       0
(2a)    '15:26:48.592'                      1
(2b)    '15:26:48.592x'                     0
(3a)    '2012-02-14 15:26:48.592'           1
(3b)    '2012-02-14 15:26:48.592x'          0
(4a)    '2012-02-14 20:26:48.592+00:00'     1
(4b)    '2012-02-14 20:26:48.592+00:00x'    0
(5a)    '1:1'                               1
(5b)    '1:1x'                              0

Test 5(a) is especially interesting: the string '1:1' passes the ISDATE test... that's because it's not just a valid TIME but it's also a valid DATE, TIMESTAMP and UTC TIMESTAMP!

How can that be?


It's because in SQL Anywhere a valid DATE is also a valid TIME, and vice versa: a valid TIME is also a valid DATE.
SELECT '3-2-1' AS "DATE", 
       CAST ( "DATE" AS DATE )      AS "AS DATE",
       CAST ( "DATE" AS TIME )      AS "AS TIME",
       CAST ( "DATE" AS TIMESTAMP ) AS "AS TIMESTAMP";

SELECT '2:1' AS "TIME", 
       CAST ( "TIME" AS DATE )      AS "AS DATE",
       CAST ( "TIME" AS TIME )      AS "AS TIME",
       CAST ( "TIME" AS TIMESTAMP ) AS "AS TIMESTAMP";

DATE,AS DATE,AS TIME,AS TIMESTAMP
'3-2-1','2003-02-01','00:00:00.000','2003-02-01 00:00:00.000'

TIME,AS DATE,AS TIME,AS TIMESTAMP
'2:1','2012-02-14','02:01:00.000','2012-02-14 02:01:00.000'

Curious, isn't it?


Not the part about a-valid-THIS-is-a-valid-THAT, but the part about "today" being used to fill in a missing date part, but not "now" for a missing time part: '00:00:00.000' is used instead.
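That also makes ISDATE() handy as a guard in front of CAST(), turning would-be conversion errors into NULLs. Here's a minimal sketch (the @input variable and the CASE guard are my own illustration):
BEGIN
   DECLARE @input VARCHAR ( 100 );
   SET @input = '2012-02-30';  -- February 30 doesn't exist
   SELECT CASE ISDATE ( @input )
             WHEN 1 THEN CAST ( @input AS TIMESTAMP )
             ELSE NULL
          END AS safe_timestamp;  -- NULL instead of a conversion error
END;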



Tuesday, February 14, 2012

Latest SQL Anywhere EBFs - 11.0.1 For (Almost) Everyone (Again)!

The three asterisks "***" show what's new since the previous list.

  • Only EBFs for the latest fully-supported versions of SQL Anywhere are shown here: 10.0.1, 11.0.1 and 12.0.1.

  • Note: Some earlier versions are also fully supported but they are not shown here; e.g., the 12.0.0.2758 EBF for Windows was recently released. For more information see SQL Anywhere Supported Platforms and Engineering Support Status.

  • Also: Just because an older version isn't "fully supported" doesn't mean you can't download files (or ask questions, or get help), it just means there won't be any more new EBFs released. For a complete list of everything available for download see Downloads - EBFs / Maintenance.

Current builds for the most active platforms...

HP-UX Itanium    12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF ***   13 Feb 2012 ***
                 10.0.1.3777 EBF       14 Oct 2008

IBM AIX          12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF ***   13 Feb 2012 ***

Linux x86/x64    12.0.1.3519 EBF       20 Dec 2011
                 11.0.1.2753 EBF ***   13 Feb 2012 ***
                 10.0.1.4310 EBF       17 Jan 2012

Mac OS           12.0.1.3352 EBF       26 May 2011
                 11.0.1.2449 EBF       29 Jun 2010
                 10.0.1.4042 EBF       01 Oct 2010

Solaris SPARC    12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF ***   13 Feb 2012 ***
                 10.0.1.4310 EBF       17 Jan 2012

Solaris x64      12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2753 EBF ***   13 Feb 2012 ***
                 10.0.1.3870 EBF       01 Apr 2009

Windows x86/x64  12.0.1.3554 EBF       31 Jan 2012
                 11.0.1.2755 EBF ***   10 Feb 2012 ***
                 10.0.1.4310 EBF       12 Jan 2012
[Download EBFs here] [Register for the Fuji Beta here]

...and for the other platforms:

HP-UX PA-RISC    10.0.1.3778 EBF                 16 Oct 2008

Linux Itanium    10.0.1.3777 EBF                 14 Oct 2008

NetWare          10.0.1.4036 EBF                 10 Mar 2010

Windows CE       10.0.1.4036 EBF                 09 Mar 2010

Windows Itanium  10.0.1 GA Upgrade from 10.0.0   26 Apr 2007
[Download EBFs here]


Friday, February 10, 2012

Why SQL Server Sucks: ISNULL

Question: Why does SQL Anywhere have both ISNULL() and COALESCE()?

They both work exactly the same way, they both perform a useful function (return the first non-null argument), and they both have stupid names, so why the duplication?

Why not have just one function, with a decent name, like FIRST_NON_NULL()?

Answer: Here's a guess: It might have something to do with "Transact SQL compatibility" which is a historical requirement that by its very nature breaks the fundamental rule "WATCOM does things the way they should be done."

In this case, however, "compatibility" is not absolute: In Microsoft SQL Server the two functions do not work exactly the same way. For one thing, Transact SQL ISNULL() accepts only two arguments whereas Transact SQL COALESCE() accepts two or more.

But it gets worse...


When calling ISNULL() in Microsoft SQL Server the second argument "must be of a type that is implicitly convertible to the type" of the first argument.

That sounds OK, until you think about it.

Even after you think about it, it probably still sounds OK... ISNULL() could return either argument so they have to have compatible types, right?

It sounds OK until you actually run a test using Microsoft SQL Server 2008:
1> BEGIN
2>    DECLARE @hello VARCHAR ( 10 );
3>    SET @hello = NULL;
4>    SELECT ISNULL ( @hello, 'Hello, World!' ) AS Result;
5> END
6> GO
 Result
 ----------
 Hello, Wor

(1 row affected)

Go ahead, justify that behavior... the inventors of SQL Anywhere clearly couldn't, which is probably why they didn't try.

In SQL Anywhere "the way things should be done" trumps "Transact SQL compatibility" and here's a test using SQL Anywhere 12 to prove it:
BEGIN
   DECLARE @hello VARCHAR ( 10 );
   SET @hello = NULL;
   SELECT ISNULL ( @hello, 'Hello, World!' ) AS Result;
END
GO

Result
'Hello, World!'

For the record, SQL Server and SQL Anywhere both return 'Hello, World!' when you call COALESCE() instead of ISNULL().
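Also for the record, SQL Anywhere's ISNULL() really does behave like COALESCE(): if I'm reading the Help correctly, it accepts two or more arguments and returns the first non-NULL one, so both of these should return 'third':
SELECT ISNULL   ( NULL, NULL, 'third' ) AS isnull_result,
       COALESCE ( NULL, NULL, 'third' ) AS coalesce_result;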

...and for a longer discussion of ISNULL's strange behavior in SQL Server see Twist in ISNULL function.


Wednesday, February 8, 2012

Example: UNLOAD INTO VARIABLE and EXECUTE IMMEDIATE

Way back in January I made a New Year's Resolution to "Post More Examples Here", so here we go (and I'm set for another year, right?)...



Question: How do I write a SQL Anywhere function that accepts a SELECT statement in a string parameter and returns a string containing the result set with each row delimited by [square brackets] and separated by commas, and adjacent columns separated by caret '^' characters?

Here's a demonstration of how it should work:
CREATE TABLE person (
   first_name    VARCHAR ( 100 ) PRIMARY KEY,
   member_since  DATE,
   last_seen     DATE,
   visits        INTEGER );

INSERT person VALUES ( 'Joe',  '2009-11-10', '2012-02-01', 1  );
INSERT person VALUES ( 'Mary', '2009-11-08', '2012-01-31', 45 );
INSERT person VALUES ( 'Tom',  '2009-11-11', '2012-01-30', 12 );
COMMIT;

SELECT to_string ( 'SELECT * FROM Person ORDER BY first_name' ) AS result;

result
[Joe^2009-11-10^2012-02-01^1],[Mary^2009-11-08^2012-01-31^45],[Tom^2009-11-11^2012-01-30^12]

Answer: Phrases like "row delimited by" and "columns separated by" should make every SQL Anywhere developer think of the UNLOAD and OUTPUT statements... in fact, if you look up DELIMITED BY in the Help that's what you'll find.

In this case, OUTPUT won't work because it's unique to the ISQL utility and can't be embedded inside a function, so UNLOAD it will have to be.

Let's tackle the problem step-by-step:

Step 1: Build A Prototype UNLOAD SELECT


Here's a standalone UNLOAD built with a few clauses described in the Help:
UNLOAD
   SELECT * FROM Person ORDER BY first_name
   TO 'c:/temp/result.txt'
   ROW DELIMITED BY '],['
   DELIMITED BY '^'
   ESCAPES OFF
   HEXADECIMAL OFF
   QUOTES OFF;

The UNLOAD and TO clauses are wrapped around the SELECT to tell SQL Anywhere to send the entire result set to a text file.

The ROW DELIMITED BY clause really should be called "ROW TERMINATED BY" because it tells SQL Anywhere to place the character string '],[' at the right end of each output row including the last. This isn't exactly what we want, but it's the only row-separator-or-terminator clause that UNLOAD gives us... so we'll take it for now and deal with it later.

The DELIMITED BY clause really should be called "COLUMNS SEPARATED BY" because it tells SQL Anywhere to put the caret '^' between adjacent columns... in this case, that's exactly what we want.

The remaining three clauses (ESCAPES OFF, HEXADECIMAL OFF, QUOTES OFF) tell SQL Anywhere to write all the column values as simple strings with no extra formatting: no backslash-escape sequences, no \0x hexadecimal formatting and no 'single quotes' around string values.

Here's what the output file looks like... close, but no cigar:
Joe^2009-11-10^2012-02-01^1],[Mary^2009-11-08^2012-01-31^45],[Tom^2009-11-11^2012-01-30^12],[

Step 2: Fix The Formatting


There are two problems with the output: The leading '[' is missing from the first row and an extra ',[' is tacked on to the end of the last row.

Those problems can't be solved by fiddling with UNLOAD clauses, it requires code:
BEGIN
   DECLARE @result LONG VARCHAR;
   UNLOAD
      SELECT * FROM Person ORDER BY first_name
      INTO VARIABLE @result
      ROW DELIMITED BY '],['
      DELIMITED BY '^'
      ESCAPES OFF
      HEXADECIMAL OFF
      QUOTES OFF;
   SELECT STRING ( '[', LEFT ( @result, LENGTH ( @result ) - 2 ) ) AS result;    
END;

The INTO VARIABLE clause puts the output from UNLOAD into a local variable instead of a file, and the SELECT STRING statement fixes the two problems by tacking a '[' onto the front and chopping off the trailing ',['.

Now the formatting is OK:
result
'[Joe^2009-11-10^2012-02-01^1],[Mary^2009-11-08^2012-01-31^45],[Tom^2009-11-11^2012-01-30^12]'

Step 3: Turn It Into A General-Purpose Function


The UNLOAD statement expects a SELECT statement, not a string containing a SELECT statement like the function's supposed to use...

...what to do, oh, what to do?


Answer: Turn the whole UNLOAD statement into a string and use EXECUTE IMMEDIATE to run it!

Here's how:
CREATE FUNCTION to_string (
   IN @select LONG VARCHAR )
   RETURNS  LONG VARCHAR 
BEGIN
   DECLARE @unload LONG VARCHAR;
   DECLARE @result LONG VARCHAR;
   SET @unload = STRING (
      'UNLOAD ',
      @select,
      ' INTO VARIABLE @result ',
      ' DELIMITED BY ''^'' ',
      ' ROW DELIMITED BY ''],['' ',
      ' ESCAPES OFF HEXADECIMAL OFF QUOTES OFF' );
   EXECUTE IMMEDIATE @unload;
   RETURN STRING ( '[', LEFT ( @result, LENGTH ( @result ) - 2 ) );    
END;

The SET statement builds the entire UNLOAD statement as a string, and the EXECUTE IMMEDIATE statement runs it.

The input SELECT is spliced into the UNLOAD right after the UNLOAD keyword.

The delimiter strings have their single quotes ''doubled up'' because they are embedded in the outer 'string literals'.

The magic of EXECUTE IMMEDIATE allows you to embed a reference to a local variable inside the statement to be executed (INTO VARIABLE @result) and then refer to that variable again afterwards, in the RETURN STRING ( '[', LEFT ( @result, ... ) ) statement.
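One edge case worth guarding against: if the input SELECT returns no rows, there's no trailing ',[' to chop off. Assuming UNLOAD leaves @result empty (or never assigns it) when there are no rows, here's an untested sketch of a guard that could go in front of the RETURN:
   EXECUTE IMMEDIATE @unload;
   IF COALESCE ( @result, '' ) = '' THEN
      RETURN '';  -- empty result set: nothing to wrap in brackets
   END IF;
   RETURN STRING ( '[', LEFT ( @result, LENGTH ( @result ) - 2 ) );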



Monday, February 6, 2012

Latest SQL Anywhere EBF - 12.0.1.3554 for Windows

The three asterisks "***" show what's new since the previous list.

  • Only EBFs for the latest fully-supported versions of SQL Anywhere are shown here: 10.0.1, 11.0.1 and 12.0.1.

  • Note: Some earlier versions are also fully supported but they are not shown here; e.g., the 12.0.0.2758 EBF for Windows was recently released. For more information see SQL Anywhere Supported Platforms and Engineering Support Status.

  • Also: Just because an older version isn't "fully supported" doesn't mean you can't download files (or ask questions, or get help), it just means there won't be any more new EBFs released. For a complete list of everything available for download see Downloads - EBFs / Maintenance.

Current builds for the most active platforms...

HP-UX Itanium    12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2745 EBF       24 Jan 2012
                 10.0.1.3777 EBF       14 Oct 2008

IBM AIX          12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2745 EBF       26 Jan 2012

Linux x86/x64    12.0.1.3519 EBF       20 Dec 2011
                 11.0.1.2745 EBF       24 Jan 2012
                 10.0.1.4310 EBF       17 Jan 2012

Mac OS           12.0.1.3352 EBF       26 May 2011
                 11.0.1.2449 EBF       29 Jun 2010
                 10.0.1.4042 EBF       01 Oct 2010

Solaris SPARC    12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2745 EBF       26 Jan 2012
                 10.0.1.4310 EBF       17 Jan 2012

Solaris x64      12.0.1.3523 EBF       15 Dec 2011
                 11.0.1.2745 EBF       24 Jan 2012
                 10.0.1.3870 EBF       01 Apr 2009

Windows x86/x64  12.0.1.3554 EBF ***   31 Jan 2012 ***
                 11.0.1.2744 EBF       26 Jan 2012
                 10.0.1.4310 EBF       12 Jan 2012
[Download EBFs here] [Register for the Fuji Beta here]

...and for the other platforms:

HP-UX PA-RISC    10.0.1.3778 EBF                 16 Oct 2008

Linux Itanium    10.0.1.3777 EBF                 14 Oct 2008

NetWare          10.0.1.4036 EBF                 10 Mar 2010

Windows CE       10.0.1.4036 EBF                 09 Mar 2010

Windows Itanium  10.0.1 GA Upgrade from 10.0.0   26 Apr 2007
[Download EBFs here]


Friday, February 3, 2012

Coding Good Comments: Modification History

The easiest kind of comment to write is Modification History: A comment which identifies the "What?" and "When?" of each change made to the program code and serves to answer questions like "Did I remember to make some change?"

Too bad Modification History comments may also be the least important kind of comment, certainly less important than Exhortations and Explanations, probably less important than Section Titles.

Nevertheless, Modification History comments can be helpful when they provide a concise overview of changes made from the developer's point of view... with the emphasis on "concise" and "developer's point of view" since version control utilities and even source code comparison programs can provide all the excruciating syntactical detail you would ever need.

Example 1

The first few lines of Foxhound's most important and most heavily-modified stored procedure look like this:
-- 2007 02 04 BC 1988a: Date begun.
-- (see bottom of file for earlier modification history)
-- 2011 12 05 BC 4025a: Published: Foxhound GA 1.2.4025a
-- 2011 12 25 BC 4033a: Removed: Old commented-out code.
-- 2012 02 19 BC 4046a: Added:   SELECT ... rroad_global_options.enable_schedules   INTO ... @enable_schedules   FROM rroad_global_options;
-- 2012 02 19 BC 4046a: Changed: Comment renumbered: -- Step 5 - Check to see if sampling has been turned off. 
-- 2012 02 19 BC 4046a: Changed: Comment renumbered: -- Step 6 - Get the properties. 
-- 2012 02 19 BC 4046a: Added:   Code section: -- Check the connection count.
-- 2012 02 19 BC 4046a: Added:   Code section: -- Check the connection sample schedule.
-- 2012 02 19 BC 4046a: Changed: IF @connection_properties_should_be_retrieved = 'Y' THEN

PARAMETERS MONITOR_DEBUG_MESSAGES; 

-------------------------------------------------------------------------------------------------------------------------------------------------
MESSAGE STRING ( '****************************** CREATE PROCEDURE rroad_monitor_sample' ) TO CLIENT;

BEGIN
   DROP PROCEDURE rroad_monitor_sample;
   EXCEPTION WHEN OTHERS THEN
END;

-----------------------------------------------------------------------------------
CREATE PROCEDURE rroad_monitor_sample (
   IN  @sampling_id                        UNSIGNED INTEGER )

BEGIN

Each Modification History comment is limited to one line, and includes the date, developer's initials and the build number which first included the change.

Some changes are eyecatchers, like "Published: Foxhound GA 1.2.4025a", which serve to identify big events in the lifespan of the application.

Other changes include snippets of code to facilitate future searches: "Added: SELECT ... rroad_global_options.enable_schedules INTO ... @enable_schedules FROM rroad_global_options;"

When whole sections of code are affected, the Section Title comment is included, as in "Added: Code section: -- Check the connection count."

Generally speaking, only the most recent Modification History entries stand a chance of being important to the developer, which is why the older entries are moved out of the way: "(see bottom of file for earlier modification history)". In the case of this module there are 387 lines of Modification History comments at the bottom of the module... someday, they might be deleted, but they're not in the way and from time to time they're useful, so the effort required to choose the cutoff-point for each module has never seemed worthwhile.

Example 2

It is tempting to code Modification History comments down in the body of the code, close to the affected lines, especially when the effort required to make the changes was very high... but

Resist The Temptation!

Modification History comments just . . . aren't . . . that . . . important! ...and it's just that kind of clutter that gives comments a bad name.

With every rule, however, comes an exception, and Foxhound has exactly one. In the previous article on Exhortations this comment was described:

Here's an example from inside a SQL script that contains CREATE TABLE and other definitions for tables that are subject to the "Data Upgrade" process when an old copy of the application database is upgraded to a new version:
-- ***********************************************************************
-- ***** EVERY CHANGE DOCUMENTED HERE MUST ALSO BE DOCUMENTED IN     *****
-- ***** 015c_rroad_data_upgrade_startup.sql EVEN IF IT IS           *****
-- ***** NOT SIGNIFICANT TO THE DATA UPGRADE PROCESS.                *****
-- ***********************************************************************


The "MUST ALSO BE DOCUMENTED" exhortation is put into effect with a series of Modification History comments copied from the first module and embedded down in the code of the second:
-- 4026-9999 eighth historical era

   -- Changes requiring work in this module...
   -- 2011 12 18 BC 4026a: Added:   CREATE TABLE rroad_schedule (
   -- 2011 12 18 BC 4026a: Added:   CREATE TABLE rroad_schedule_day_entry (
   -- 2011 12 18 BC 4026a: Added:   CREATE TABLE rroad_schedule_period_entry (
   -- 2012 02 20 BC 4047a: Added:   Column rroad_alert.email_status                LONG VARCHAR NOT NULL );
   -- 2012 02 20 BC 4047a: Added:   Column rroad_all_clear.email_status                LONG VARCHAR NOT NULL );
   -- 2012 02 20 BC 4047a: Added:   Column rroad_alert_cancelled.email_status                LONG VARCHAR NOT NULL );

   -- Changes not requiring work in this module...
   -- 2011 12 15 BC 4026a: Changed: Comment: rroad_alerts_criteria.display_alerts marked "-- no longer used"
   -- 2011 12 27 BC 4033a: Added:   Column rroad_global_options.enable_schedules           VARCHAR ( 1 ) NOT NULL DEFAULT 'Y' CHECK ( @enable_schedules IN ( 'Y', 'N' ) ),
   -- 2011 12 27 BC 4034a: Added:   Column rroad_schedule.enforce_this_schedule            VARCHAR ( 1 ) NOT NULL DEFAULT 'Y' CHECK ( @enforce_this_schedule IN ( 'Y', 'N' ) ),
   -- 2011 12 27 BC 4036a: Added:   Column rroad_sampling_options.activate_sample_schedule VARCHAR ( 1 ) NOT NULL DEFAULT 'Y' CHECK ( @activate_sample_schedule IN ( 'Y', 'N' ) ),
   -- 2011 12 31 BC 4041a: Changed: rroad_schedule_day_entry.day_number ... CHECK ( @day_number BETWEEN 1 AND 7 ),
   -- 2011 12 31 BC 4041a: Changed: rroad_schedule_period_entry.period_number ... CHECK ( @period_number BETWEEN 1 AND 672 ),
   -- 2011 12 31 BC 4041a: Changed: rroad_schedule_period_entry.period_code ... CHECK ( @period_code IN ( 'Y', '.' ) ),
   -- 2012 02 17 BC 4044a: Added:   Column rroad_sampling_options.schedule_request         VARCHAR ( 100 ) NOT NULL DEFAULT '',

Ugly, but easy to do, and easy to check later on to see if anything got missed... and yes, "checking later" is a frequent task.

The Bottom Line

If you're not going to do a thorough job of writing Modification History comments, then don't bother trying... nothing is more wasteful than a poorly executed task that results in a work product nobody uses.

"Thorough" doesn't mean "provide the same level of detail as a source code comparison program."

It doesn't even mean "document every trivial syntax error" that you fixed... but it does mean "list every non-trivial maintenance task you performed on the module"... on the code and on the other comments.

What? All of a sudden comments aren't important? Well, if that's how you feel, see "don't bother trying"... most don't.


Wednesday, February 1, 2012

Coding Good Comments: Explanations

The code comes first: If there's a choice to be made between writing clear, understandable program code and writing comments to explain less-than-readable code, comments should take a back seat.

The reason is simple: It's the code that determines what the program does, not the comments. Comments aren't even compiled, let alone executed. So it shouldn't come as a surprise that maintenance programmers concentrate on the code, often skipping over the comments altogether.

Sadly, however, even readable, understandable program code tells only part of the story: It only answers the question, "How is this program accomplishing its tasks?" Very few executable statements directly address the question, "What is this program doing?", with the possible exception of a CALL statement where the procedure name has been carefully crafted to answer the "What?"

There are no exceptions, however, when it comes to answering the questions, "Why is the program doing this thing?" or the much harder "Why isn't the program doing this other thing?"

So, comments come second, but

Explanations Are Important


Explanation: A concise comment describing the "What" and "Why" of a section of program code while avoiding the "How" which should be apparent from reading the code itself.

Example 1

Here is a short section of simple code (basically a SELECT FROM DUMMY) with several explanations, some good, some not so good:
------------------------------------------------------------
      -- Perform the basic heartbeat or canarian query.
      -- This is done to calculate the heartbeat time as well as check the connection.
      -- Note: The target database can be stopped and restarted, and the monitor should just keep trying to connect.
      -- Note: When Foxhound itself is stopped and restarted, reconnection to target databases *should* be automatic.

      SET @sql = STRING ( 'SELECT dummy_col INTO @dummy_col FROM ', @proxy_owner, '.proxy_DUMMY' );

      SET @canarian_query_started_at  = CURRENT TIMESTAMP;
      SET @canarian_query_finished_at = @canarian_query_started_at;

      EXECUTE IMMEDIATE @sql;

      ROLLBACK; -- release locks, recover from possible failed proxy connection

      SET @connected_ok = 'Y';
      SET @canarian_query_finished_at = CURRENT TIMESTAMP;

      MESSAGE STRING ( 'DIAG ', CURRENT TIMESTAMP, ' ', CONNECTION_PROPERTY ( 'Number' ), 
         ' 204-4 Monitor ', @sampling_id, ': First heartbeat OK' ) TO CONSOLE DEBUG ONLY;

      -- Note: Because the proxy_DUMMY table is the last proxy object created, a 
      -- successful query against proxy_DUMMY may be taken as proof that a connection exists
      -- and is usable. This is important because a previous connection attempt
      -- may be dropped (timed out) after it created some but not all necessary proxy objects.
      -- Such a partly-established connection must be treated as no connection at all, and
      -- that is guaranteed by creating proxy_DUMMY last.

In the comment "This is done..." the phrase "as well as check the connection" provides extra information about the "What?" of this code, so that's OK as an explanation.

The phrase "calculate the heartbeat time" is a bit redundant, however... not wrong, but not really necessary given the preceding section title "Perform the basic heartbeat or canarian query."

The two one-line "Note:" comments are a bit other-worldly because they state true facts that have very little (nothing?) to do with nearby code. These score zero on the Comment Value Scale; not a positive score because they offer no real value, but not a negative score because at least they aren't wrong.

The long "Note: Because the proxy_DUMMY table is ..." paragraph scores very high, however, because it explains a very important "Why?" for this otherwise trivial-looking section of code.

Example 2

This example shows two separate explanations that were added to the code at two separate times.

The later paragraph "It is possible..." came first, chronologically speaking, to explain why the unusual step of calling GET_IDENTITY was performed instead of just letting the later INSERT assign the value... so far, so good, the original "Why?" is covered.
------------------------------------------------------------
-- Calculate candidate primary key for successful sample.

-- Note: The following discussion is moot, but @candidate_sample_set_number is still
--       assigned because later code refers to it.

-- It is possible for this event to take some time before inserting a row
-- in rroad_sample_set. During that time, it is also possible for sampling to be 
-- stopped, and a "sampling stopped" row to be inserted. The row inserted by this
-- event should have a lower primary key than the "sampling stopped" event, so it
-- cannot wait until the actual INSERT rroad_sample_set statement to calculate the
-- value of the autoincrement column sample_set_number. The following statement 
-- pre-assigns a value which will only be used for the INSERT of a "successful" 
-- sample set; if that INSERT is not performed because of some problem, there will
-- be a gap in the primary key sequence.

SET @candidate_sample_set_number = GET_IDENTITY ( 'rroad_sample_set' );

The second explanation, the previous "Note: ...", was added later to explain that GET_IDENTITY was being called through laziness rather than real need. This explanation isn't quite as thorough, but it does serve to prevent the new maintenance programmer from searching the code to find the reason GET_IDENTITY was called.

In a perfect world, the call to GET_IDENTITY would have been removed as soon as it became unnecessary. In an imperfect world, it's easier to add the "Note:" when the call became moot... not just because it's less work to write the comment than change the code, but also because comments aren't compiled so the "Note:" doesn't have to be tested.

Example 3

The following example almost got left out because at first glance it looked like just another bad explanation: Who needs to be told that sa_locks needs DBA authority?
------------------------------------------------------------
   -- Create proxy_sa_locks if necessary.
   -- This is only done after a full connection process.
   -- Note: DBA authority is needed to call sa_locks.

Well, with this application (Foxhound), whether the user id has DBA authority or not controls how much detail the user's going to get, and this comment explains to the maintenance programmer that information about row locks is going to be missing from that detail if the user doesn't have DBA authority.

So, this "Note: DBA authority is needed to call sa_locks" looks like a "How?" but it's really a "What?", albeit on a grander scale: "What does the maintenance programmer need to know about this code?"

Example 4

Here is a breathtakingly-verbose "Note:" that explains a very simple-looking IF NOT EXISTS RETURN control structure:
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
-- Step 5 - Check to see if sampling has been turned off. 
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------

IF  @ok = 'Y'  
AND @connected_ok = 'Y' THEN 

   -- Note: This code is an attempt to prevent a successful sample from appearing between 
   --       successive "Sampling stopped at" and "Sampling stopped" lines in the Monitor display.
   --       This process may already be executing the code above this point when sampling is stopped by
   --       a different process, and the @sample_started_at used by this process is calculated in the
   --       next section... which is after the "Sampling stopped at" point calculated by the other
   --       process but before the eventual "Sampling stopped" point which will be calculated by some
   --       (probably much later) process.
   -- 
   --       In the following example, the sample at Apr 22 5:12:21 PM should not have been recorded:
   --          Apr 22 5:13:35 PM   1m 14.2s    0s / .1s ...
   --          Apr 22 5:13:21 PM   1m 18.8s   -- Sampling stopped --  
   --          Apr 22 5:12:21 PM   0s    0s / 0s    1 ...
   --          Apr 22 5:12:02 PM   1m 24.2s   -- Sampling stopped at --  
   --          Apr 22 5:10:38 PM   1m 10.1s    .1s / .1s ...
   --
   --       In the following example, the sample at 5:11:23 PM should not have been recorded:
   --          5:11:32 PM   8.5s    .1s / 0s ...
   --          5:11:27 PM    ALL CLEAR - Alert # 1: Foxhound has been unable to gather samples for 1m or longer. 
   --          5:11:25 PM   4m 54.7s   -- Sampling stopped --  
   --          5:11:23 PM   0s    0s / .1s ...
   --          5:06:30 PM   0s   -- Sampling stopped at --  
   --          5:06:30 PM   39.6s   -- Foxhound stopped --  
   --          5:05:50 PM   0s   -- Foxhound stopped at --  

   IF NOT EXISTS ( SELECT *
                     FROM rroad_sampling_options
                    WHERE rroad_sampling_options.sampling_id                = @sampling_id
                      AND rroad_sampling_options.sampling_should_be_running = 'Y' ) THEN

      RETURN; -- do not record this sample

   END IF;

Here's the reason for the long explanation: The bug was very hard to find, and the fix so simple, that no future maintenance programmer, not even the original author, would understand the "Why?" of the IF-NOT-EXISTS-RETURN code without some help.

In fact, without the explanation, it's not clear that the code is even correct let alone necessary. Comments don't just save time, they can prevent mistakes (like removing a piece of code because it looks wrong.)

In this case the comment itself wasn't that hard to write: the sample data was gathered as part of the diagnostic effort, and the explanation was worked out carefully and even written down as part of that same effort... it took very little extra effort to copy the explanation into the code as an aid to future developers.

The Bottom Line

Explanations belong inside the code, not locked in the fading memory of the original author or lost inside some other document no developer will ever read.

That is, if you think explanations are important.

Folks who don't think explanations are important generally don't think readability's important either. That's why most code without comments is also unreadable... not because there are no comments, but because the author cares more about abstract measurements of "elegance" or believes that somehow readability contradicts correctness.