Monday, September 2, 2013

DATEDIFF Mistakes Case Study 3: The Help


Previously on . . . The story began with Documenting DATEDIFF and continued with two episodes about the use and abuse of DATEDIFF in this blog and in Foxhound.

Now the story turns to DATEDIFF examples published in the SQL Anywhere 16 Help.

The Scoring System

  • FAIL means the DATEDIFF usage is badly flawed; it shouldn't have been coded that way, and it should be fixed.

  • LUCKY means the DATEDIFF usage may be flawed but it doesn't matter given the data values involved.

  • OK means the DATEDIFF usage is OK given the data types involved.

The Scores


Example 1 is OK: begin_poll event
This example creates a push request for a Notifier named Notifier A. It uses a SQL statement that inserts rows into a table named PushRequest. Each row in this table represents a message to send to an address. The WHERE clause determines which push requests are inserted into the PushRequest table.

To use the ml_add_property system procedure with a SQL Anywhere consolidated database, run the following command:
ml_add_property(
    'SIS',
    'Notifier(Notifier A)',
    'begin_poll',
    'INSERT INTO PushRequest
        (gateway, mluser, subject, content)
        SELECT ''MyGateway'', DISTINCT mluser, ''sync'',
            stream_param
            FROM MLUserExtra, mluser_union, Dealer
            WHERE MLUserExtra.mluser = mluser_union.name
            AND (push_sync_status = ''waiting for request''
                OR datediff( hour, last_status_change, now() ) > 12 )
            AND ( mluser_union.publication_name is NULL
                OR mluser_union.publication_name =''FullSync'' ) 
            AND Dealer.last_modified > mluser_union.last_sync_time'
);
Let's ignore the obvious problems with this code (the "Syntax error near DISTINCT", the unknown nature of the tables in the FROM clause, and so on) and assume that "datediff( hour, last_status_change, now()" refers to a valid TIMESTAMP value in last_status_change.

Guess what? Assuming that "> 12" means "thirteen full hours or more" then DATEDIFF HOUR works OK because unlike other forms of DATEDIFF it returns the number of HOUR units between the two timestamps, not the number of HOUR boundaries.

Here's proof using two timestamps that have 13 hour boundaries between them but differ by an interval of only 12:00:00.000001 (which DATEDIFF HOUR truncates to 12):
BEGIN
   DECLARE @last_status_change TIMESTAMP DEFAULT ( '2013 09 02 00:59:59.999999' );
   DECLARE @now                TIMESTAMP DEFAULT ( '2013 09 02 13:00:00.000000' ); 
   SELECT DATEDIFF ( HOUR, @last_status_change, @now );
END;

DATEDIFF(hour,@last_status_change,@now)
12

Example 2 is a FAIL: Trigger conditions for events
Notify an administrator of a possible attempt to break into the database:
CREATE EVENT SecurityCheck
TYPE ConnectFailed
HANDLER
BEGIN
 DECLARE num_failures INT;
 DECLARE mins INT;
 INSERT INTO FailedConnections( log_time )
 VALUES ( CURRENT TIMESTAMP );

 SELECT COUNT( * ) INTO num_failures
 FROM FailedConnections
 WHERE log_time >= DATEADD( minute, -5,
  current timestamp );
 IF( num_failures >= 3 ) THEN
  SELECT DATEDIFF( minute, last_notification,
   current timestamp ) INTO mins
  FROM Notification;
  IF( mins > 30 ) THEN
   UPDATE Notification
   SET last_notification = current timestamp;
   CALL xp_sendmail( recipient='DBAdmin',
      subject='Security Check', "message"= 
       'over 3 failed connections in last 5 minutes' )
  END IF
 END IF
END;
Assuming Notification.last_notification is a TIMESTAMP column, what does "DATEDIFF( minute, last_notification, current timestamp )" return?

Answer: It returns the number of MINUTE boundaries between the two timestamp values. This usage must be marked as a FAIL because DATEDIFF MINUTE can return two different answers for exactly the same interval depending on the number of MINUTE boundaries between the two timestamps.
BEGIN
   DECLARE @last_notification TIMESTAMP DEFAULT ( '2013 09 02 00:59:59.999999' );
   DECLARE @current_timestamp TIMESTAMP DEFAULT ( '2013 09 02 01:30:00.000000' ); 
   SELECT DATEDIFF ( MINUTE, @last_notification, @current_timestamp );
END;

BEGIN
   DECLARE @last_notification TIMESTAMP DEFAULT ( '2013 09 02 01:00:00.000000' );
   DECLARE @current_timestamp TIMESTAMP DEFAULT ( '2013 09 02 01:30:00.000001' ); 
   SELECT DATEDIFF ( MINUTE, @last_notification, @current_timestamp );
END;

DATEDIFF(minute,@last_notification,@current_timestamp)
31

DATEDIFF(minute,@last_notification,@current_timestamp)
30
DATEDIFF MINUTE can also return a smaller answer for a longer interval. Here's proof; the first DATEDIFF MINUTE call returns 31 for an interval of 00:30:00.000001 while the second returns 30 for an interval of 00:30:00.999999:
BEGIN
   DECLARE @last_notification TIMESTAMP DEFAULT ( '2013 09 02 00:59:59.999999' );
   DECLARE @current_timestamp TIMESTAMP DEFAULT ( '2013 09 02 01:30:00.000000' ); 
   SELECT DATEDIFF ( MINUTE, @last_notification, @current_timestamp );
END;

BEGIN
   DECLARE @last_notification TIMESTAMP DEFAULT ( '2013 09 02 01:00:00.000000' );
   DECLARE @current_timestamp TIMESTAMP DEFAULT ( '2013 09 02 01:30:00.999999' ); 
   SELECT DATEDIFF ( MINUTE, @last_notification, @current_timestamp );
END;

DATEDIFF(minute,@last_notification,@current_timestamp)
31

DATEDIFF(minute,@last_notification,@current_timestamp)
30

Example 3 is OK: #hook_dict table
The following sample sp_hook_dbmlsync_delay procedure illustrates the use of in/out parameters in the #hook_dict table. The procedure allows synchronization only outside a scheduled down time of the MobiLink system between 18:00 and 19:00.
CREATE PROCEDURE sp_hook_dbmlsync_delay()
BEGIN
   DECLARE delay_val integer;
 SET delay_val=DATEDIFF(
   second, CURRENT TIME, '19:00');
 IF (delay_val>0 AND
     delay_val<3600)
 THEN
 UPDATE #hook_dict SET value=delay_val
   WHERE name='delay duration';
 END IF;
END
This usage scores OK for two reasons:
  • The code's more precise than the specs so an error doesn't matter; i.e., the specifications refer to hours "between 18:00 and 19:00" while the code calculates seconds "DATEDIFF( second, CURRENT TIME, '19:00')".

  • ...and besides, DATEDIFF SECOND returns the number of full seconds between two timestamps rather than the number of second boundaries; i.e., it's the other DATEDIFF rarity along with with DATEDIFF HOUR.

Example 4 is OK: sp_hook_dbmlsync_abort
The following procedure prevents synchronization during a scheduled maintenance hour between 19:00 and 20:00 each day.
CREATE PROCEDURE sp_hook_dbmlsync_abort()
BEGIN
  DECLARE down_time_start TIME;
  DECLARE is_down_time VARCHAR(128);
  SET down_time_start='19:00';
  IF datediff( hour,down_time_start,now(*) ) < 1
  THEN
    set is_down_time='true';
  ELSE
    SET is_down_time='false';
  END IF;
  UPDATE #hook_dict
  SET value = is_down_time
  WHERE name = 'abort synchronization'
END;
This one looks like a FAIL until you realize that down_time_start contains a constant value '19:00:00.000000'. That means it doesn't matter whether DATEDIFF HOUR is calculated using hours (it's not) or hour boundaries (it is), it's all the same.

In other words, this usage is OK.


Example 5 is OK: sa_performance_diagnostics system procedure
You can execute the following query to identify connections that have spent a long time waiting for database server requests to complete.
SELECT  Number, Name,
      CAST( DATEDIFF( second, LoginTime, CURRENT TIMESTAMP ) AS DOUBLE ) AS T,
      IF T <> 0 THEN (ReqTimeActive / T) ELSE NULL ENDIF AS PercentActive
FROM  sa_performance_diagnostics()
WHERE T > 0 AND PercentActive > 10.0
ORDER BY PercentActive DESC;
This code's OK for the same reasons Example 3 is OK: For the connections of interest the difference between LoginTime and CURRENT TIMESTAMP grows very large very quickly compared with the size of any error introduced by DATEDIFF, and besides, DATEDIFF SECOND is calculated in seconds rather than second boundaries.
Example 6 is OK: sa_performance_diagnostics system procedure
Find all requests that are currently executing, and have been executing for more than 60 seconds:
SELECT  Number, Name,
      CAST( DATEDIFF( second, LastReqTime, CURRENT TIMESTAMP ) AS DOUBLE ) AS ReqTime
FROM  sa_performance_diagnostics()
WHERE ReqStatus <> 'IDLE' AND ReqTime > 60.0
ORDER BY ReqTime DESC;
Same thing again, same as Examples 3 and 5: This code is OK. A purist might use DATEDIFF MILLISECOND (Foxhound does) but with a predicate like "ReqTime > 60.0" it's not really necessary.
Example 7 is a FAIL: Text index refresh types
You can define your own strategy for refreshing MANUAL REFRESH text indexes. In the following example, all MANUAL REFRESH text indexes are refreshed using a refresh interval that is passed as an argument, and rules that are similar to those used for AUTO REFRESH text indexes.
CREATE PROCEDURE refresh_manual_text_indexes( 
   refresh_interval UNSIGNED INT )
BEGIN
 FOR lp1 AS c1 CURSOR FOR
   SELECT ts.*
   FROM SYS.SYSTEXTIDX ti JOIN sa_text_index_stats( ) ts
   ON ( ts.index_id = ti.index_id )
   WHERE ti.refresh_type = 1 -- manual refresh indexes only
 DO
   BEGIN
    IF last_refresh_utc IS null 
    OR cast(pending_length as float) / (
       IF doc_length=0 THEN NULL ELSE doc_length ENDIF) > 0.2
    OR DATEDIFF( MINUTE, CURRENT UTC TIMESTAMP, last_refresh_utc )
       > refresh_interval THEN
     EXECUTE IMMEDIATE 'REFRESH TEXT INDEX ' || text-index-name || ' ON "'
     || table-owner || '"."' || table-name || '"';
    END IF;
   END;
  END FOR;
END;
Let's ignore all the issues with this code and assume the DATEDIFF call was really written like this:
DATEDIFF( MINUTE, last_refresh, CURRENT TIMESTAMP ) > refresh_interval 
This code is a FAIL for the same reasons as the code in Example 2:
  • DATEDIFF MINUTE can return two different answers for exactly the same interval depending on the number of MINUTE boundaries between the two timestamps, and

  • DATEDIFF MINUTE can also return a smaller answer for a longer interval.
The error (less than one minute) might not be important for large values of refresh_interval, but for small values (like one minute) the behavior will be unpredictable.

The Final Score: 5 OK, 2 FAIL

Some might argue one or two OK scores were really just LUCKY, but it's the two FAILs that prove the point: DATEDIFF is difficult.


No comments: