Index to all the "Audit Trail" posts
One of the most popular articles on this blog was published in February 2008:
Tip: Triggering an Audit Trail.
Let's say you want to keep a complete record of every INSERT, UPDATE and DELETE made to a particular table. Furthermore, let's say you want to store this audit trail in the database itself to make it easy to write queries.
You can use SQL Anywhere's CREATE TRIGGER feature to do all that, and with the help of several other features like INSERT WITH AUTO NAME the code becomes quite simple.
[ ...more ]
A year later, after some real-world experience, the code was rewritten and the article republished:
Revisited: Triggering an Audit Trail.
And shortly after that, the question "How do I add a column to a table being audited?" was answered in
ALTER TABLE and the Audit Trail.
Now it's four years since the original article and the technique is still working, still answering those "What the heck happened?" questions that crop up from time to time.
Not just bug hunting, however; the audit trail can answer questions like "What exactly was in the inventory back on December 31, 2012? The accountants are asking..."
- "Didn't you remember to take a snapshot of the inventory at midnight on December the 31st?"
...nope, forgot
- "Didn't you build something into the application to record the year-end inventory?"
...nope, that didn't happen either
- "Can't you restore the last backup taken on December 31 and run a query?"
...nope, don't keep them around that long
What can be done, and it requires no more effort (or planning) than to write a SELECT, is to use the data in the audit trail to compute the state of the database at year end.
Here's what the tables look like, first the table holding the current inventory, and then the shadow table holding 17,000 before-and-after images of rows in the item table:
CREATE TABLE item ( -- 873 rows, 3.7M total = 152k table + 3.5M ext + 32k index, 4,401 bytes per row
   sku                      VARCHAR ( 20 ) NOT NULL,
   sku_integer_suffix       UNSIGNED INT NOT NULL
                               COMPUTE ( integer_suffix(sku) ),
   show_on_website          VARCHAR ( 1 ) NOT NULL DEFAULT 'N'
                               CONSTRAINT ASA90 CHECK (
                                  show_on_website in( 'Y','N' ) ),
   usd_price                DECIMAL ( 11, 2 ) NOT NULL DEFAULT 0.0,
   usd_shipping             DECIMAL ( 11, 2 ) NOT NULL DEFAULT 0.0,
   cad_shipping             DECIMAL ( 11, 2 ) NOT NULL DEFAULT 0.0,
   featured                 VARCHAR ( 1 ) NOT NULL DEFAULT 'N'
                               CONSTRAINT ASA91 CHECK (
                                  featured in( 'Y','N' ) ),
   stock                    INTEGER NOT NULL DEFAULT 1,
   main_category            VARCHAR ( 50 ) NOT NULL,
   updated_at               TIMESTAMP NOT NULL DEFAULT timestamp,
   title                    VARCHAR ( 100 ) NOT NULL DEFAULT '',
   text_description         LONG VARCHAR NOT NULL DEFAULT '',
   active                   VARCHAR ( 1 ) INLINE 1 PREFIX 1 NOT NULL DEFAULT 'Y'
                               CONSTRAINT ASA92 CHECK (
                                  active in( 'Y','N' ) ),
   display_order            BIGINT NOT NULL DEFAULT 0,
   promotional_message      LONG VARCHAR INLINE 256 PREFIX 8 NOT NULL DEFAULT '',
   promotional_message_at   TIMESTAMP NOT NULL DEFAULT '1900-01-01',
   CONSTRAINT ASA93 PRIMARY KEY ( -- 16k
      sku )
);
CREATE TABLE logged_item ( -- 17,073 rows, 20.5M total = 3.3M table + 17M ext + 160k index, 1,260 bytes per row
   log_id                   UNSIGNED BIGINT NOT NULL DEFAULT autoincrement,
   logged_action            VARCHAR ( 50 ) NOT NULL
                               CONSTRAINT ASA94 CHECK ( logged_action in(
                                  'after INSERT',
                                  'before UPDATE',
                                  'after UPDATE',
                                  'before DELETE' ) ),
   logged_at                TIMESTAMP NOT NULL DEFAULT timestamp,
   sku                      VARCHAR ( 20 ) NULL,
   sku_integer_suffix       UNSIGNED INT NULL,
   show_on_website          VARCHAR ( 1 ) NULL,
   usd_price                DECIMAL ( 11, 2 ) NULL,
   usd_shipping             DECIMAL ( 11, 2 ) NULL,
   cad_shipping             DECIMAL ( 11, 2 ) NULL,
   featured                 VARCHAR ( 1 ) NULL,
   stock                    INTEGER NULL,
   main_category            VARCHAR ( 50 ) NULL,
   updated_at               TIMESTAMP NULL,
   title                    VARCHAR ( 100 ) NULL,
   text_description         LONG VARCHAR NULL,
   active                   VARCHAR ( 1 ) NULL,
   display_order            BIGINT NULL,
   promotional_message      LONG VARCHAR NULL,
   promotional_message_at   TIMESTAMP NULL,
   CONSTRAINT ASA95 PRIMARY KEY ( -- 160k
      log_id )
);
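For anyone who hasn't read the earlier articles, here's a minimal sketch of the kind of trigger that fills logged_item; the real code is in the posts linked above, and the trigger name here is made up. The other three actions ('after INSERT', 'after UPDATE', 'before DELETE') follow the same pattern, differing only in the action string and in whether they reference the OLD or NEW row:
CREATE TRIGGER tru_log_item_before_update -- hypothetical name, for illustration only
   BEFORE UPDATE ON item
   REFERENCING OLD AS old_item
   FOR EACH ROW
BEGIN
   -- WITH AUTO NAME matches the select list to logged_item columns by name;
   -- log_id and logged_at are filled in by their DEFAULT autoincrement / timestamp values.
   INSERT logged_item
   WITH AUTO NAME
   SELECT 'before UPDATE' AS logged_action,
          old_item.*;
END;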
Because of the way the audit trail triggers work, the rows in logged_item with the same sku (stock keeping unit) value can be ordered on log_id to give the complete chronological history of that sku...
...and the last row inserted before January 1, 2013 tells you the state of that sku in the year-end inventory.
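For example, a single sku's complete history can be listed like this (the sku value and the column choices are made up for illustration):
SELECT log_id,
       logged_action,
       logged_at,
       stock,
       usd_price
  FROM logged_item
 WHERE sku = 'ABC-123'
 ORDER BY log_id;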
A perfect application for the LAST_VALUE() function and the WINDOW clause, right?
SELECT LAST_VALUE ( log_id ) OVER sku_window AS last_log_id
  FROM logged_item
 WHERE logged_at < '2013-01-01'
WINDOW sku_window AS ( PARTITION BY sku
                       ORDER BY log_id )
 ORDER BY last_log_id;
- The WHERE clause grabs everything in logged_item through the end of December 31, 2012 (every row with logged_at earlier than '2013-01-01'),
- the PARTITION BY sku clause creates a separate partition in the window for each sku,
- the inner ORDER BY log_id clause sorts the partition so the LAST_VALUE ( log_id ) function will return the last row in the chronological history of each partition,
- the "LAST_VALUE ( log_id ) OVER sku_window AS last_log_id" select-list entry computes the last log_id for each sku and gives it an alias name "last_log_id", and
- the outer ORDER BY last_log_id clause sorts the final result set.
Alas, the results are worse-than-useless, just the numbers 1, 2, 3:
last_log_id
-----------
1
2
3
4
5
6
7
8
9
10
11
12
...
16804
16805
16806
Heck, I can do that by calling sa_rowgenerator!
It turns out that LAST_VALUE() needs the WINDOW clause to have a full-tilt-boogie RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING specification. The docs say that's the default, but reality intrudes: with an ORDER BY and no explicit frame the window effectively stops at the current row, so LAST_VALUE ( log_id ) just hands back each row's own log_id, hence the 1, 2, 3 above. Here's the query again with the frame spelled out:
SELECT LAST_VALUE ( log_id ) OVER sku_window AS last_log_id
  FROM logged_item
 WHERE logged_at < '2013-01-01'
WINDOW sku_window AS ( PARTITION BY sku
                       ORDER BY log_id
                       RANGE BETWEEN UNBOUNDED PRECEDING
                                 AND UNBOUNDED FOLLOWING )
 ORDER BY last_log_id;
The result isn't much better, still WAY too many rows (16,806), but at least they look different:
last_log_id
-----------
297
297
297
369
369
369
1084
1084
1084
1164
1164
1164
...
16806
16806
16806
At this point, one must have faith: the WINDOW clause works, and so does LAST_VALUE(), and they aren't just powerful, they are fast too!
The problem here is that the query returns one row for every row in logged_item, and those rows repeat the LAST_VALUE() for each partition. That's the way partitioned window functions work; it's something you (I) must get used to.
...unlike the RANGE nonsense, which may forever remain a mystery. Like waving a dead chicken over the keyboard, if your WINDOW query doesn't work try RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING!
What's the obvious solution to duplicate rows?
Why, the DISTINCT keyword, of course!
...one of the most loved, most used, most mis-used dead chickens in the SQL toolbox!
Here it is, solving the repeating-LAST_VALUE problem:
SELECT DISTINCT LAST_VALUE ( log_id ) OVER sku_window AS last_log_id
  FROM logged_item
 WHERE logged_at < '2013-01-01'
WINDOW sku_window AS ( PARTITION BY sku
                       ORDER BY log_id
                       RANGE BETWEEN UNBOUNDED PRECEDING
                                 AND UNBOUNDED FOLLOWING )
 ORDER BY last_log_id;
Now instead of 16,806 rows there are only 856, one row per sku:
last_log_id
-----------
297
369
1084
1164
...
16806
Each row in that result set points to a single row in logged_item, and that row shows the last action before year end for the corresponding item row:
- logged_action = 'after INSERT' for an item that was inserted before year-end,
- logged_action = 'after UPDATE' for an item that was updated before year-end, and
- logged_action = 'before DELETE' for an item that was deleted before year-end.
There are no entries where logged_action = 'before UPDATE' because that action is always followed by an 'after UPDATE' with the same logged_at timestamp (once again,
that's the way the audit trail triggers work).
In other words, that query embodies the year-end snapshot that accounting wants, and it can be made useful as a view:
CREATE VIEW year_end_2012_logged_item AS
( SELECT DISTINCT LAST_VALUE ( log_id ) OVER sku_window AS last_log_id
    FROM logged_item
   WHERE logged_at < '2013-01-01'
  WINDOW sku_window AS ( PARTITION BY sku
                         ORDER BY log_id
                         RANGE BETWEEN UNBOUNDED PRECEDING
                                   AND UNBOUNDED FOLLOWING ) );
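As a quick sanity check (the actual counts aren't shown here because they depend on the data), the view can be joined back to logged_item to count snapshot rows by action; there should be no 'before UPDATE' rows at all:
SELECT logged_item.logged_action,
       COUNT(*) AS snapshot_rows
  FROM logged_item
       INNER JOIN year_end_2012_logged_item
               ON year_end_2012_logged_item.last_log_id = logged_item.log_id
 GROUP BY logged_item.logged_action
 ORDER BY logged_item.logged_action;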
Here's an example of how that view was used to summarize the year-end inventory by category:
SELECT logged_item.main_category AS Category,
       COUNT(*) AS SKU_count,
       SUM ( logged_item.stock ) AS item_count,
       SUM ( logged_item.stock * logged_item.usd_price ) AS inventory_value
  FROM logged_item
       INNER JOIN year_end_2012_logged_item
               ON year_end_2012_logged_item.last_log_id = logged_item.log_id
 WHERE logged_item.logged_action IN ( 'after INSERT', 'after UPDATE' )
   AND logged_item.show_on_website = 'Y'
   AND logged_item.main_category <> 'BOOKS'
 GROUP BY logged_item.main_category
 ORDER BY logged_item.main_category;
Category SKU_count item_count inventory_value
-------------------- ----------- ----------- ---------------
COLLECTIBLES 18 18 4409.00
GIFTWARE 19 25 663.00
LINENS & TEXTILES 48 48 1804.00
POTTERY & GLASS 61 152 5566.00
SILVERWARE 106 111 4342.00
- The INNER JOIN uses the year_end_2012_logged_item view to identify which logged_item rows form the year-end snapshot,
- the WHERE eliminates the deletions and applies two business-related predicates (only items shown on the website, and no books), and
- the GROUP BY enables the COUNT() and SUM() calculations.
It's worth noting that the item table doesn't take part in any of these queries, nor can it: item holds only the current state of the inventory, and accounting only cares about year end.
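For contrast, here's a sketch of the same kind of category summary run against the current data in item; it's a perfectly good query, it just answers a present-tense question rather than the year-end one:
SELECT item.main_category AS Category,
       COUNT(*) AS SKU_count,
       SUM ( item.stock ) AS item_count,
       SUM ( item.stock * item.usd_price ) AS inventory_value
  FROM item
 WHERE item.show_on_website = 'Y'
   AND item.main_category <> 'BOOKS'
 GROUP BY item.main_category
 ORDER BY item.main_category;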
Other queries can be written, to show detail or summaries, by using this template:
SELECT [whatever columns and/or aggregate function calls you want]
  FROM logged_item
       INNER JOIN year_end_2012_logged_item
               ON year_end_2012_logged_item.last_log_id = logged_item.log_id
 WHERE logged_item.logged_action IN ( 'after INSERT', 'after UPDATE' )
   AND [whatever other predicates you want]
 [plus whatever GROUP BY and ORDER BY clauses you might need]
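For example, here's one way the template might be filled in to list the year-end detail by sku (the column and predicate choices are just an illustration):
SELECT logged_item.sku,
       logged_item.title,
       logged_item.stock,
       logged_item.usd_price
  FROM logged_item
       INNER JOIN year_end_2012_logged_item
               ON year_end_2012_logged_item.last_log_id = logged_item.log_id
 WHERE logged_item.logged_action IN ( 'after INSERT', 'after UPDATE' )
   AND logged_item.stock > 0
 ORDER BY logged_item.sku;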
Suddenly, the audit trail tables and triggers are justified not only because of safety and security, but because they provide significant reporting support with zero extra effort.