Monday, January 19, 2009

Readable, Correct, Concise

Question: How do I format a number with commas and a dollar sign? I want to format -123456789 as '-$1,234,567.89'.

Answer: The glib, dogmatic answer is "You don't do stuff like that in SQL, you do it on the client side!"

Sure, but what if the database server IS part of the "client side"? The boundaries are getting blurred these days. Consider web services, where a SQL Anywhere stored procedure can return fully-formatted XML, JSON or HTML text to the client side.

In that case, some of the traditional client-side tasks (like formatting a number with commas) may be performed on the server side (by SQL code inside a stored procedure).

Here are the specs:

  • Receive a DECIMAL ( 30, 2 ) input argument like 0 or -9999999999999999999999999999.99 and return it as a left-justified VARCHAR ( 50 ) string like '$0.00' or '-$9,999,999,999,999,999,999,999,999,999.99'

  • Omit the sign for positive values.

  • Always show the '$' sign.

  • Always show at least one digit to the left of the decimal point.

  • Always show exactly two digits to the right of the decimal point.

  • If the input is NULL, return NULL.

  • If an EXCEPTION is raised inside the function, RETURN '???' after logging the error via MESSAGE ... TO CONSOLE DEBUG ONLY.
Here are some test cases displayed in dbisql:



Here are some comments and caveats:
  • The input is fixed-point DECIMAL, not floating point, simply to encourage the calling code to answer the question "Do I round or truncate?" before calling the function. The implicit default is "round to two decimal places", and that happens at the interface when the actual input value is converted to DECIMAL ( 30, 2 ).

  • Here's a general rule of thumb: FLOAT and DOUBLE data types should never, ever be used for exact monetary values. I feel strongly about that, especially about the "mysterious rounding errors" mentioned here:
    "Programmers writing financial applications often mistakenly use floating-point datatypes to represent monetary values, such as dollars and cents. Doing so results in mysterious rounding errors where the total of a column is slightly different from the exact arithmetic sum of all integer values that represent the currency amounts." - Sybase Technote Using the float Datatype

  • If conversion errors are going to happen with this function, they're going to happen at the interface; e.g., when an attempt is made to pass 'xxx' as input.

  • The function is coded to always return something; e.g., NULL or '???'.

  • There are no funky RAISERROR, SIGNAL or RESIGNAL statements to confuse the programmer. If the function's going to be used in a life-critical environment, it's up to the calling code to check for '???' return values.

  • Of course, a function like this isn't likely to be used in a life-critical environment. However, it might be used in consumer-oriented website, and nothing is more likely to drive a customer away than seeing a cryptic internal error message bubbling up to the browser. That's another reason not to use RAISERROR and the like, and it's a reason you might choose to RETURN the empty string '' instead of '???' when an exception occurs. In other words, sometimes it's better to show nothing at all than to show big scary stuff.

  • Exception logging is only performed in "debugging mode"; i.e. when SET TEMPORARY OPTION DEBUG_MESSAGES = 'ON' is in effect for the current connection.

  • And yes, all this talk about exception handling is overkill for a function this small, this simple... or is it? What about future enhancements? What about copying-and-modifying this function to do something more difficult, like return 'fifty five dollars and 32 cents'?
The code itself is written primarily to be readable. The second goal is correctness, and in third place comes conciseness. All three goals have been achieved, as opposed to the code samples here (less readable, less concise... including one provided by me) and the to_money function here (less readable, less concise, not correct...
  • to_money ( -123.45 ) returns '$-,123.45'

  • to_money ( 0 ) returns '$0' instead of '$0.00' in SQL Anywhere 9, whereas to_money ( 0.1 ) returns '$0.10'

  • to_money ( 0 ) fails in versions 10 and 11: Function 'substr' has invalid parameter '3' ('-1')
Putting readability first improves the chances of achieving the second goal, correctness... both during initial development and during later maintenance. If code is easy to read, it's easy to fix; use the "telephone test".

Conciseness can be important if verbosity is standing in the way of readability. Also, the search for conciseness can be a good learning exercise... it certainly was, here, for me.

Here's the code, followed by some comments about how it works:
CREATE FUNCTION dollar_number_string (
IN amount DECIMAL ( 30, 2 ) )
RETURNS VARCHAR ( 50 )
BEGIN

DECLARE string_amount VARCHAR ( 50 );
DECLARE insertion_point INTEGER;

DECLARE @sqlcode INTEGER;
DECLARE @sqlstate VARCHAR ( 5 );
DECLARE @errormsg VARCHAR ( 32767 );

IF amount IS NULL THEN
RETURN NULL
END IF;

SET string_amount = STRING (
IF amount BETWEEN -.99 AND -.01
OR amount BETWEEN .01 AND .99
THEN '0'
ELSE ''
ENDIF,
ABS ( amount ) );

SET insertion_point = LENGTH ( string_amount ) - 6;

WHILE insertion_point > 0 LOOP

SET string_amount = INSERTSTR (
insertion_point, string_amount, ',' );

SET insertion_point = insertion_point - 3;

END LOOP;

RETURN STRING (
IF amount < 0.00
THEN '-'
ELSE ''
ENDIF,
'$',
string_amount );

EXCEPTION
WHEN OTHERS THEN
SELECT SQLCODE, SQLSTATE, ERRORMSG()
INTO @sqlcode, @sqlstate, @errormsg;
MESSAGE STRING (
'EXCEPTION in dollar_number_string at ',
CURRENT TIMESTAMP,
': SQLCODE = ', @sqlcode,
', SQLSTATE = ', @sqlstate,
', ERRORMSG() = ', @errormsg )
TO CONSOLE DEBUG ONLY;
RETURN '???';

END;
Here are some comments about the code:
  • The first SET string_amount strips the sign from the amount and converts it to a string.

  • The string conversion process automatically inserts a period and pads to the right with zeros if necessary; e.g., SELECT STRING ( CAST ( 123 AS DECIMAL ( 30, 2 ) ) ) is '123.00'.

  • The first SET also handles a quirk of the string conversion process whereby 0 is converted to '0.00' but .1 is converted to '.10' (no leading zero).

  • The SET insertion_point = LENGTH ( string_amount ) - 6 calculates the point after which the first comma must be inserted.

  • The WHILE loop inserts commas from right to left instead of left to right so the dynamically changing string length doesn't affect the calculation of insertion_point. This backward looping is an old trick that often makes string manipulation easier.

  • The WHILE loop stops when the insertion point falls off the front of the string; in fact, the loop may not even run at all.

  • The RETURN STRING adds the '$' to the result, plus a leading '-' if necessary.
Note that dollar_number_string supports 30 digits of precision. That number is arbitrary, but important... change it, and you'll be changing some other "magic numbers" in the code.

What's a magic number? It's an Unnamed numerical constant. Magic numbers are generally regarded as bad things. So sue me.

Seriously, the code would be more readable if these magic numbers replaced by named variables; i.e., use "rightmost_insertion_point_offset" for the 6 and "insertion_point_decrement" for the 3.

Here is the code for the test cases shown earlier:
BEGIN
SELECT *
FROM (
SELECT CAST ( NULL AS VARCHAR ( 100 ) ) AS "input",
dollar_number_string ( NULL ) AS "output"
UNION ALL
SELECT '-0', dollar_number_string ( -0 )
UNION ALL
SELECT '-.1', dollar_number_string ( -.1 )
UNION ALL
SELECT '-.99', dollar_number_string ( -.99 )
UNION ALL
SELECT '-.994', dollar_number_string ( -.994 )
UNION ALL
SELECT '-.995', dollar_number_string ( -.995 )
UNION ALL
SELECT '-1', dollar_number_string ( -1)
UNION ALL
SELECT '-1.1', dollar_number_string ( -1.1 )
UNION ALL
SELECT '-23.45', dollar_number_string ( -23.45 )
UNION ALL
SELECT '-123.45', dollar_number_string ( -123.45 )
UNION ALL
SELECT '-9123.45', dollar_number_string ( -9123.45 )
UNION ALL
SELECT '-99123.45', dollar_number_string ( -99123.45 )
UNION ALL
SELECT '-999123.45', dollar_number_string ( -999123.45 )
UNION ALL
SELECT '-9999123.45', dollar_number_string ( -9999123.45 )
UNION ALL
SELECT '-123456789012345678.90',
dollar_number_string ( -123456789012345678.90 )
UNION ALL
SELECT '-1234567890123456789012345678.90',
dollar_number_string ( -1234567890123456789012345678.90 )
UNION ALL
SELECT '-9999999999999999999999999999.99',
dollar_number_string ( -9999999999999999999999999999.99 )
UNION ALL
SELECT '0', dollar_number_string ( 0 )
UNION ALL
SELECT '.1', dollar_number_string ( .1 )
UNION ALL
SELECT '.99', dollar_number_string ( .99 )
UNION ALL
SELECT '.994', dollar_number_string ( .994 )
UNION ALL
SELECT '.995', dollar_number_string ( .995 )
UNION ALL
SELECT '1.00', dollar_number_string ( 1.00 )
UNION ALL
SELECT '1.1', dollar_number_string ( 1.1 )
UNION ALL
SELECT '23.45', dollar_number_string ( 23.45 )
UNION ALL
SELECT '123.45', dollar_number_string ( 123.45 )
UNION ALL
SELECT '9123.45', dollar_number_string ( 9123.45 )
UNION ALL
SELECT '99123.45', dollar_number_string ( 99123.45 )
UNION ALL
SELECT '999123.45', dollar_number_string ( 999123.45 )
UNION ALL
SELECT '9999123.45', dollar_number_string ( 9999123.45 )
UNION ALL
SELECT '123456789012345678.90',
dollar_number_string ( 123456789012345678.90 )
UNION ALL
SELECT '1234567890123456789012345678.90',
dollar_number_string ( 1234567890123456789012345678.90 )
UNION ALL
SELECT '9999999999999999999999999999.99',
dollar_number_string ( 9999999999999999999999999999.99 )
) AS testing
ORDER BY CAST ( "input" as DOUBLE );
END;
Finally, for the sake of comparing apples-to-apples as far as conciseness goes, here's the same function re-coded using an ultra-abbreviated style:



No, I'm not saying that's good style... in fact, it sucks when used for large blocks of code. The point is: yes, the code's concise.

1 comment:

Anonymous said...

Hi Breck, thank you sooo much for sharing the concise code. A very great help really.