Saturday, February 23, 2008

Tip: Triggering an Audit Trail

Index to all the "Audit Trail" posts
February 23, 2008 Tip: Triggering an Audit Trail
February 28, 2009 Revisited: Triggering an Audit Trail
March 24, 2009 ALTER TABLE and the Audit Trail
March 18, 2013 Reporting and the Audit Trail

Let's say you want to keep a complete record of every INSERT, UPDATE and DELETE made to a particular table. Furthermore, let's say you want to store this audit trail in the database itself to make it easy to write queries.

You can use SQL Anywhere's CREATE TRIGGER feature to do all that, and with the help of several other features like INSERT WITH AUTO NAME the code becomes quite simple.

Here's some code-by-example... First, the base table t for which we want the audit trail:
CREATE TABLE t (
   pkey                    BIGINT NOT NULL DEFAULT AUTOINCREMENT,
   data1                   INTEGER NOT NULL DEFAULT 0,
   data2                   INTEGER NOT NULL DEFAULT 0,
   PRIMARY KEY ( pkey ) );
The first step is to create a "shadow table" for t containing each column from t plus several others:
CREATE TABLE t_log (
   log_id                  BIGINT NOT NULL DEFAULT AUTOINCREMENT,
   -- start of columns from table t...
   pkey                    BIGINT,
   data1                   INTEGER,
   data2                   INTEGER,
   -- ...end of columns from table t
   logged_action           VARCHAR ( 6 ) NOT NULL,
   logged_description      VARCHAR ( 1024 ) NOT NULL,
   logged_by               VARCHAR ( 20 ) NULL DEFAULT current user,
   PRIMARY KEY ( log_id ) );
One row will be inserted into this table each and every time a row is inserted, updated or deleted from t. Each "inserted" and "updated" row in t_log will show the new values in the corresponding t row, and eache "deleted" row in t_log will show the old values from the deleted t row.

It is possible that one row in t will generate several rows in t_log, as the same row in t is inserted, then updated, then deleted (and possible re-inserted with the same primary key). For that reason, while the primary key value can be copied from t to t_log, it can't be used as the primary key of t_log. A different, autoincrement, column "log_id" is added for that.

In fact, the t_log table doesn't need any of the constraints or default values that are defined in t. Some of them wouldn't even work; not just the original PRIMARY KEY clause (discussed above), but FOREIGN KEY constraints would also cause problems. For example, an "inserted" t_log row for a child table might cause a referential integrity error if a foreign key column pointed to a parent table row that was deleted.

Some other columns are added to the end of t_log; these are all optional except maybe the first one:

"logged_action" is filled in with 'INSERT', 'UPDATE' or 'DELETE'. This column is optional if you're only going to audit one kind of operation, e.g., just 'UPDATE', but it's probably necessary if you are going to audit more than one kind.

"logged_description" is filled in with a string passed from the application.

"logged_by" is automatically filled in with the user id of the connection making the change.

"logged_at" is automatically filled in with the date/time the change was made.

The optional logged_description column depends on the application being modified to execute a SQL Anywhere CREATE VARIABLE statement on the connection making the changes, and then executing SQL Anywhere SET statements to fill in different descriptions as the application makes different kinds of changes. This "connection-level variable" feature is unique to SQL Anywhere, and it lets you pass data to triggers as if they were procedure calls with parameters.

Now comes the audit trail trigger:
CREATE TRIGGER log_t
   AFTER INSERT, UPDATE, DELETE ON t
   REFERENCING OLD AS old_t NEW AS new_t
   FOR EACH STATEMENT
BEGIN

   IF VAREXISTS ( '@logged_description' ) = 0 THEN 
      CREATE VARIABLE @logged_description VARCHAR ( 1024 );
      SET @logged_description = '';
   END IF;

   CASE

      WHEN INSERTING THEN
         INSERT t_log
         WITH AUTO NAME
         SELECT *,
               'INSERT'            AS logged_action,
               @logged_description AS logged_description
           FROM new_t;

      WHEN UPDATING THEN
         INSERT t_log
         WITH AUTO NAME
         SELECT *,
               'UPDATE'            AS logged_action,
               @logged_description AS logged_description
           FROM new_t;

      WHEN DELETING THEN
         INSERT t_log
         WITH AUTO NAME
         SELECT *,
               'DELETE'            AS logged_action,
               @logged_description AS logged_description
           FROM old_t;

   END CASE;

END; -- log_t
If you want to audit more than one table, you will need to create separate shadow "t_log" tables and separate "log_t" triggers, but it's not as hard as it sounds. In particular, only the table names "t" and "t_log" and trigger name "log_t" need to be changed to create different triggers. The unique SQL Anywhere feature INSERT WITH AUTO NAME makes it unnecessary to code any of the column names from the original table t.

In fact, it's easy to write code that will automatically generate the CREATE TRIGGER statements; that technique might be the subject of a future Tip :)

The code in the trigger exploits the following features:

1. AFTER UPDATE instead of BEFORE UPDATE so feature 2 can be used.

2. FOR EACH STATEMENT instead of FOR EACH ROW so the REFERENCING names can be referred to as tables in FROM new_t and FROM old_t clauses.

3. IF VAREXISTS to determine if CREATE VARIABLE @logged_description needs to be executed (in case the application didn't do it).

4. CASE WHEN INSERTING / UPDATING / DELETING to determine whether new_t or old_t should be used, and what the value of logged_action should be.

5. WITH AUTO NAME so the SELECT list items will be automatically matched with the INSERT table column names by name.

6. AS logged_action and AS logged_description alias names to make sure WITH AUTO NAME works for these additional column values.

7. DEFAULT values defined for t_log columns get used for columns not in the INSERT SELECT lists.

Here's a dbisql script to test everything:
CREATE VARIABLE @logged_description VARCHAR ( 1024 );

SET @logged_description = 'Test 1: INSERT 5 rows.';
INSERT t 
SELECT row_num, row_num, row_num 
  FROM RowGenerator 
 WHERE row_num <= 5; 
COMMIT;

SET @logged_description = 'Test 1: UPDATE 2 rows.';
UPDATE t
  SET data1 = data1 + 1
 WHERE pkey <= 2; 
COMMIT;

SET @logged_description = 'Test 1: DELETE 3 rows.';
DELETE t
 WHERE pkey >= 3; 
COMMIT;

SELECT * FROM t_log ORDER BY log_id;
Here's what the audit trail looks like after all those changes:




1 comment:

Breck Carter said...

A year later, and I finally notice the missing column "logged_at".

Here's what the table should look like:

CREATE TABLE t_log (
log_id BIGINT NOT NULL DEFAULT AUTOINCREMENT,
-- start of columns from table t...
-- ...end of columns from table t
logged_action VARCHAR ( 6 ) NOT NULL,
logged_description VARCHAR ( 1024 ) NOT NULL,
logged_by VARCHAR ( 20 ) NULL DEFAULT current user,
logged_at TIMESTAMP NOT NULL DEFAULT TIMESTAMP,
PRIMARY KEY ( log_id ) );