Friday, August 9, 2013

DATEDIFF Mistakes Case Study: This Blog

After writing about problems with DATEDIFF it seems natural to look for examples, and there's no more exciting place to start throwing stones than inside one's own glass house!

And so we have . . .

Dogfooding DATEDIFF

. . . a critical look at the use and abuse of SQL Anywhere's DATEDIFF function in this blog.

In other words, a Bug Hunt, with each example of DATEDIFF ranked as follows:
  • FAIL means the DATEDIFF usage is badly flawed; it shouldn't have been coded that way, and it should be fixed.

  • LUCKY means the DATEDIFF usage may be flawed but it doesn't matter given the data values involved.

  • OK means the DATEDIFF usage is OK given the data types involved.
That last point about data types is interesting; it isn't discussed in the Help, it wasn't mentioned in Documenting DATEDIFF, and it makes this Bug Hunt all the more worthwhile.


Example 1 is OK: Let's play "Gotcha!" - Round Two

All the examples share one characteristic in common: The second and third DATEDIFF arguments (the date/time values) are no more precise than the first argument (the unit name). In other words, DAY is used on dates with no time component, and SECOND is used on timestamps that don't have fractional seconds.
SELECT DATEDIFF ( DAY, '2011-09-28', '2011-09-29' );
SELECT DATEDIFF ( SECOND, '2011-09-28 23:59:58', '2011-09-28 23:59:59' );
SELECT DATEDIFF ( SECOND, '7910-12-31 23:59:58', '7910-12-31 23:59:59' );
SELECT DATEDIFF ( SECOND, '7910-12-31 23:59:59', '7911-01-01 00:00:00' );
As a result, they all give correct results.

Well, the first 3 do, and the last one would have worked if the third argument hadn't been out of range (the whole point behind the article :).


Example 2 is OK: Let's play "Gotcha!" - Round Three
DATEDIFF ( SECOND, '2011-09-28 23:59:58', '2011-09-28 23:59:59' )
DATEDIFF ( SECOND, '7910-12-31 23:59:58', '7910-12-31 23:59:59' )
DATEDIFF ( SECOND, '6910-12-31 23:59:58', '7910-12-31 23:59:59' )
No DATEDIFF problems here at all, for the same reason: All the timestamps are precise only to the second.


Example 3 is LUCKY: Intra-Procedure Parallelism

There are a couple of problems with these DATEDIFF calls:
...
DECLARE @start        TIMESTAMP;
DECLARE @start_step_1 TIMESTAMP;
DECLARE @start_step_2 TIMESTAMP;
...
SET @start        = CURRENT TIMESTAMP;
SET @start_step_1 = CURRENT TIMESTAMP;
...
MESSAGE STRING ( CAST ( DATEDIFF ( MILLISECOND, @start_step_1, CURRENT TIMESTAMP ) 
                        AS DECIMAL ( 11, 2 ) ) / 1000.0,
                 ' seconds to perform step 1' ) TO CONSOLE;
...
SET @start_step_2 = CURRENT TIMESTAMP;
...
MESSAGE STRING ( CAST ( DATEDIFF ( MILLISECOND, @start_step_2, CURRENT TIMESTAMP ) 
                        AS DECIMAL ( 11, 2 ) ) / 1000.0,
                 ' seconds to perform step 2' ) TO CONSOLE;

MESSAGE STRING ( CAST ( DATEDIFF ( MILLISECOND, @start, CURRENT TIMESTAMP ) 
                        AS DECIMAL ( 11, 2 ) ) / 1000.0,
                 ' seconds to perform both steps' ) TO CONSOLE;
...
10.1680000 seconds to perform step 1
19.9700000 seconds to perform step 2
30.1440000 seconds to perform both steps
First, the calls may return values that are incorrect by up to one millisecond because DATEDIFF MILLISECOND returns the number of millisecond boundaries between the two timestamps.

Second, since DATEDIFF MILLISECOND returns a BIGINT, the CAST is singularly pointless, possibly dangerous: You can't magically add two digits of precision to an integer, and DECIMAL ( 11, 2 ) isn't big enough for a BIGINT:
SELECT CAST ( 9223372036854775807 AS DECIMAL ( 11, 2 ) );

Value 9223372036854775807 out of range for destination
SQLCODE=-158, ODBC 3 State="22003"
But wait! There's no way this code will run long enough overflow a DECIMAL ( 11, 2 ), and the CAST is there to force decimal rather than integer division. The division by 1000.0 indicates that the user is interested in seconds, rather than milliseconds.

In other words, DATEDIFF MILLISECOND is being used to increase accuracy, not decrease it... an error of one millisecond is OK whereas DATEDIFF SECOND might have an error of a whole second.

The code's not wrong, but it is a bit sloppy: The output shows 7 digits of precision to the right of the decimal point whereas the actual values are only precise to the second digit. An outer CAST AS DECIMAL ( 11, 2 ) call could be used to show this... or call to ROUND().


Example 4 is OK: Today's Tip: Counting Days of the Week

The code is OK because "number of day boundaries" is the same as "number of days" when you're talking about dates with no time component:
DATEDIFF ( DAY, '2007-12-14', '2008-01-29' )
Even the explanation in the article is OK, in this particular case: "The DATEDIFF ( ... ) call returns the number of days between the two dates."


Example 5 is a FAIL: I'm lonely! signed, Your Database

This code uses DATEDIFF SECOND on full-precision timestamps, so it can return a value that may be up to one second too large because it counts the number of second boundaries between the two timestamps, not the number of seconds difference.

In this example, that means a repeated email may be sent in 19 seconds rather than 20:
...
DECLARE @email_sent_at                       TIMESTAMP;
DECLARE @current_timestamp                   TIMESTAMP DEFAULT CURRENT TIMESTAMP;
...
DECLARE @email_repeat_threshold_in_seconds   BIGINT DEFAULT 20;
...
      SELECT email_sent_at
        INTO @email_sent_at
        FROM lonely; 

      IF DATEDIFF ( SECOND, @email_sent_at, @current_timestamp ) 
            >= @email_repeat_threshold_in_seconds THEN  
...

Maybe a 5% error doesn't matter, or even a 10% error (if the DEFAULT was 10 instead of 20), but that's not the point... the author (me) didn't realize the implications of using DATEDIFF SECOND on precise timestamp values.

Not knowing is not good, not in this business.


Example 6 is OK: Everything Looks Like a Database

This one is ok, a DATEDIFF DAY on two dates that don't contain time components:
...
DECLARE @from_date DATE;
DECLARE @to_date   DATE;
SET @from_date = '2009-02-01';
SET @to_date   = '2009-12-31';
...

             DATEDIFF ( DAY, @from_date, @to_date ) ) )
...


Example 7 is LUCKY: Capturing the Server Console Log

Here's an example of a DATEDIFF call that can return values that are too large by up to one millisecond:
...
DECLARE LOCAL TEMPORARY TABLE checkpoint_record (
   checkpoint_starting   TIMESTAMP NOT NULL PRIMARY KEY,
   checkpoint_finished   TIMESTAMP )
   NOT TRANSACTIONAL;
...
SELECT *,
       DATEDIFF ( MILLISECOND, 
                  checkpoint_record.checkpoint_starting,  
                  checkpoint_record.checkpoint_finished ) AS msec
  FROM checkpoint_record
 ORDER BY checkpoint_record.checkpoint_starting;
...
checkpoint_starting      checkpoint_finished      msec
2011-01-30 05:11:32.000  2011-01-30 05:11:32.281   281
2011-01-30 05:31:33.453  2011-01-30 05:31:33.937   484
2011-01-30 05:51:35.046  2011-01-30 05:51:35.515   469
2011-01-30 06:11:36.640  2011-01-30 06:11:37.078   438
2011-01-30 06:31:38.234  2011-01-30 06:31:38.781   547
2011-01-30 06:51:39.937  2011-01-30 06:51:41.125  1188
With these particular values one second is an error of 0.3% to 0.08%... which is probably OK.

The point, however, is the same as before: One shouldn't use DATEDIFF MILLISECOND if one wants millisecond accuracy.


Example 8 is OK: Great Moments In History: Housing Bubble

The code in this post uses DATE values with no time components, so DATEDIFF DAY works just fine:
       DATEDIFF ( DAY, first_date, last_date ) AS days,


It could be said the scoring is too lenient: Given the ignorance factor all the OK scores should really be marked as LUCKY. And maybe, all the LUCKYs should be FAIL.

Dilbert.com 1991-08-11

No comments: