After writing about problems with DATEDIFF it seems natural to look for examples, and there's no more exciting place to start throwing stones than inside one's own glass house!
And so we have . . .
Dogfooding DATEDIFF
. . . a critical look at the use and abuse of SQL Anywhere's DATEDIFF function in this blog.In other words, a Bug Hunt, with each example of DATEDIFF ranked as follows:
- FAIL means the DATEDIFF usage is badly flawed; it shouldn't have been coded that way, and it should be fixed.
- LUCKY means the DATEDIFF usage may be flawed but it doesn't matter given the data values involved.
- OK means the DATEDIFF usage is OK given the data types involved.
Example 1 is OK: Let's play "Gotcha!" - Round Two
All the examples share one characteristic in common: The second and third DATEDIFF arguments (the date/time values) are no more precise than the first argument (the unit name). In other words, DAY is used on dates with no time component, and SECOND is used on timestamps that don't have fractional seconds.
As a result, they all give correct results.
SELECT DATEDIFF ( DAY, '2011-09-28', '2011-09-29' ); SELECT DATEDIFF ( SECOND, '2011-09-28 23:59:58', '2011-09-28 23:59:59' ); SELECT DATEDIFF ( SECOND, '7910-12-31 23:59:58', '7910-12-31 23:59:59' ); SELECT DATEDIFF ( SECOND, '7910-12-31 23:59:59', '7911-01-01 00:00:00' );
Well, the first 3 do, and the last one would have worked if the third argument hadn't been out of range (the whole point behind the article :).
Example 2 is OK: Let's play "Gotcha!" - Round Three
No DATEDIFF problems here at all, for the same reason: All the timestamps are precise only to the second.
DATEDIFF ( SECOND, '2011-09-28 23:59:58', '2011-09-28 23:59:59' ) DATEDIFF ( SECOND, '7910-12-31 23:59:58', '7910-12-31 23:59:59' ) DATEDIFF ( SECOND, '6910-12-31 23:59:58', '7910-12-31 23:59:59' )
Example 3 is LUCKY: Intra-Procedure Parallelism
There are a couple of problems with these DATEDIFF calls:
First, the calls may return values that are incorrect by up to one millisecond because DATEDIFF MILLISECOND returns the number of millisecond boundaries between the two timestamps.
... DECLARE @start TIMESTAMP; DECLARE @start_step_1 TIMESTAMP; DECLARE @start_step_2 TIMESTAMP; ... SET @start = CURRENT TIMESTAMP; SET @start_step_1 = CURRENT TIMESTAMP; ... MESSAGE STRING ( CAST ( DATEDIFF ( MILLISECOND, @start_step_1, CURRENT TIMESTAMP ) AS DECIMAL ( 11, 2 ) ) / 1000.0, ' seconds to perform step 1' ) TO CONSOLE; ... SET @start_step_2 = CURRENT TIMESTAMP; ... MESSAGE STRING ( CAST ( DATEDIFF ( MILLISECOND, @start_step_2, CURRENT TIMESTAMP ) AS DECIMAL ( 11, 2 ) ) / 1000.0, ' seconds to perform step 2' ) TO CONSOLE; MESSAGE STRING ( CAST ( DATEDIFF ( MILLISECOND, @start, CURRENT TIMESTAMP ) AS DECIMAL ( 11, 2 ) ) / 1000.0, ' seconds to perform both steps' ) TO CONSOLE; ... 10.1680000 seconds to perform step 1 19.9700000 seconds to perform step 2 30.1440000 seconds to perform both steps
Second, since DATEDIFF MILLISECOND returns a BIGINT, the CAST is singularly pointless, possibly dangerous: You can't magically add two digits of precision to an integer, and DECIMAL ( 11, 2 ) isn't big enough for a BIGINT:
But wait! There's no way this code will run long enough overflow a DECIMAL ( 11, 2 ), and the CAST is there to force decimal rather than integer division. The division by 1000.0 indicates that the user is interested in seconds, rather than milliseconds.
SELECT CAST ( 9223372036854775807 AS DECIMAL ( 11, 2 ) ); Value 9223372036854775807 out of range for destination SQLCODE=-158, ODBC 3 State="22003"
In other words, DATEDIFF MILLISECOND is being used to increase accuracy, not decrease it... an error of one millisecond is OK whereas DATEDIFF SECOND might have an error of a whole second.
The code's not wrong, but it is a bit sloppy: The output shows 7 digits of precision to the right of the decimal point whereas the actual values are only precise to the second digit. An outer CAST AS DECIMAL ( 11, 2 ) call could be used to show this... or call to ROUND().
Example 4 is OK: Today's Tip: Counting Days of the Week
The code is OK because "number of day boundaries" is the same as "number of days" when you're talking about dates with no time component:
Even the explanation in the article is OK, in this particular case: "The DATEDIFF ( ... ) call returns the number of days between the two dates."
DATEDIFF ( DAY, '2007-12-14', '2008-01-29' )
Example 5 is a FAIL: I'm lonely! signed, Your Database
This code uses DATEDIFF SECOND on full-precision timestamps, so it can return a value that may be up to one second too large because it counts the number of second boundaries between the two timestamps, not the number of seconds difference.
In this example, that means a repeated email may be sent in 19 seconds rather than 20:
... DECLARE @email_sent_at TIMESTAMP; DECLARE @current_timestamp TIMESTAMP DEFAULT CURRENT TIMESTAMP; ... DECLARE @email_repeat_threshold_in_seconds BIGINT DEFAULT 20; ... SELECT email_sent_at INTO @email_sent_at FROM lonely; IF DATEDIFF ( SECOND, @email_sent_at, @current_timestamp ) >= @email_repeat_threshold_in_seconds THEN ...
Maybe a 5% error doesn't matter, or even a 10% error (if the DEFAULT was 10 instead of 20), but that's not the point... the author (me) didn't realize the implications of using DATEDIFF SECOND on precise timestamp values.
Not knowing is not good, not in this business.
Example 6 is OK: Everything Looks Like a Database
This one is ok, a DATEDIFF DAY on two dates that don't contain time components:
... DECLARE @from_date DATE; DECLARE @to_date DATE; SET @from_date = '2009-02-01'; SET @to_date = '2009-12-31'; ... DATEDIFF ( DAY, @from_date, @to_date ) ) ) ...
Example 7 is LUCKY: Capturing the Server Console Log
Here's an example of a DATEDIFF call that can return values that are too large by up to one millisecond:
With these particular values one second is an error of 0.3% to 0.08%... which is probably OK.
... DECLARE LOCAL TEMPORARY TABLE checkpoint_record ( checkpoint_starting TIMESTAMP NOT NULL PRIMARY KEY, checkpoint_finished TIMESTAMP ) NOT TRANSACTIONAL; ... SELECT *, DATEDIFF ( MILLISECOND, checkpoint_record.checkpoint_starting, checkpoint_record.checkpoint_finished ) AS msec FROM checkpoint_record ORDER BY checkpoint_record.checkpoint_starting; ... checkpoint_starting checkpoint_finished msec 2011-01-30 05:11:32.000 2011-01-30 05:11:32.281 281 2011-01-30 05:31:33.453 2011-01-30 05:31:33.937 484 2011-01-30 05:51:35.046 2011-01-30 05:51:35.515 469 2011-01-30 06:11:36.640 2011-01-30 06:11:37.078 438 2011-01-30 06:31:38.234 2011-01-30 06:31:38.781 547 2011-01-30 06:51:39.937 2011-01-30 06:51:41.125 1188
The point, however, is the same as before: One shouldn't use DATEDIFF MILLISECOND if one wants millisecond accuracy.
Example 8 is OK: Great Moments In History: Housing Bubble
The code in this post uses DATE values with no time components, so DATEDIFF DAY works just fine:
DATEDIFF ( DAY, first_date, last_date ) AS days,
It could be said the scoring is too lenient: Given the ignorance factor all the OK scores should really be marked as LUCKY. And maybe, all the LUCKYs should be FAIL.
No comments:
Post a Comment