6.8 Minimizing Audit-Log Space Consumption
6.8.1 Problem
You are going to implement
an audit log, but you need a way to
minimize space consumption so that the resulting log uses disk space
as efficiently as possible. It's still important to
be able to construct an accurate snapshot of the base table as it
appeared at any give point in time.
6.8.2 Solution
The log recipe shown earlier in this chapter implemented a log that
contained a complete snapshot of a row for every operation performed
on that row. While simple to implement and to query, the result is a
log that contains a great deal of redundant information where update
and delete operations are concerned. To minimize the space used by a
row-level log, you can design it such that a column value is recorded
only when it is changed. Thus, a change to one column in a row does
not cause an entirely new copy of that row to be written to the log.
Instead, only the before and after versions of the modified column
are recorded in the log. Similarly, with deletes, only the delete
action needs to be recorded.
For purposes of this recipe, assume that the Stock table to be
audited can be created using the following statement:
CREATE TABLE Stock(
ProductId CHAR(40),
Qty INTEGER,
Price INTEGER,
PRIMARY KEY(ProductId)
)
Given this table, the solution requires that a log table is created
and that triggers are created to record changes to the data in the
Stock table's audit log.
6.8.2.1 Step 1: Create the audit table
Create an audit table that stores the audit data for the base table.
The following statement creates a table named StockLog that can be
used to log changes made to the table named Stock:
CREATE TABLE StockLog(
ProductId CHAR(40) NOT NULL,
Qty INTEGER NULL,
Price INTEGER NULL,
EventTime DATETIME NOT NULL,
DeleteFlag CHAR(1) NULL,
LogId INTEGER IDENTITY
)
As you can see, the audit table has a similar structure to the Stock
table's, with the following exceptions:
One additional column is included to store the timestamp (the date
and time of the event).
One column is included to represent DELETE events.
LogId column is added to uniquely identify events and support
sequencing order between them.
All columns, except for the timestamp and primary-key columns, are
explicitly allowed to be null.
6.8.2.2 Step 2: Implement the audit triggers
This set of triggers records all insert, update, and
delete
activity in the StockLog table, but the approach is slightly
different from the previous audit-log recipe in this chapter. This
time, column values are not recorded for deletes, and only modified
columns are recorded with update log entries. Pay special attention
to the CASE statements in the updateStock trigger as you read the
code.
CREATE TRIGGER insertStock
ON Stock
FOR INSERT
AS
BEGIN
INSERT INTO StockLog
(ProductId, Qty, Price, DeleteFlag, EventTime)
SELECT i.ProductId, i.Qty, i.Price, null, CURRENT_TIMESTAMP
FROM inserted I
END
GO
CREATE TRIGGER deleteStock
ON Stock
FOR DELETE
AS
BEGIN
INSERT INTO StockLog
(ProductId, Qty, Price, DeleteFlag, EventTime)
SELECT d.ProductId, null, null, 'D', CURRENT_TIMESTAMP
FROM deleted d
END
GO
CREATE TRIGGER updateStock
ON Stock
FOR UPDATE
AS
BEGIN
INSERT INTO stockLog
(ProductId, Qty, Price, DeleteFlag, EventTime)
SELECT d.productId,
(CASE WHEN d.Qty!=i.Qty THEN i.Qty ELSE NULL END),
(CASE WHEN d.Price!=i.Price THEN i.Price ELSE NULL END),
null, CURRENT_TIMESTAMP
FROM deleted d,inserted I
WHERE d.ProductId=i.ProductId
END
GO
6.8.3 Discussion
Space can often become an issue if all details are to be stored in a
log. This technique demonstrates how to store the information with
minimum space consumption.
The concept underlying the solution shown here is that column values
are only recorded in the log table when they change. An INSERT
operation is the only operation where the log record will contain a
complete copy of a row. For UPDATE operations, the log will only
contain the new values for the columns that were changed and the
value of the row's primary key. All columns
not changed by an UPDATE
statement are left set to NULL in the log record to avoid storing
data redundantly. In this way, space consumption is minimized.
Deletes are a bit different. Each log record has a delete flag in its
descriptor. When a row is deleted, the delete flag is set in the
corresponding log's record.
Consider, for example, the following DML statements:
INSERT INTO Stock(ProductID, Qty, Price)
VALUES ('Bananas', 10, 112)
INSERT INTO Stock(ProductID, Qty, Price)
VALUES ('Apples', 20, 223)
UPDATE Stock
SET Qty=25
WHERE ProductID='Apples'
UPDATE Stock
SET Qty=30
WHERE ProductId='Apples'
DELETE FROM Stock
WHERE ProductId='Bananas'
The audit-log entries generated by these statements would be as
follows:
SELECT LogId, ProductId, Qty, Price, DeleteFlag, EventTime
FROM stockLog
GO
LogId ProductId Qty Price DeleteFlag EventTime
------ ---------- ----- ------ ---------- ------------------------
1 Bananas 10 112 NULL 2000-06-27 23:01:25.943
2 Apples 20 223 NULL 2000-06-27 23:01:25.950
3 Apples 25 NULL NULL 2000-06-27 23:01:25.970
4 Apples 30 NULL NULL 2000-06-27 23:01:25.970
5 Bananas NULL NULL D 2000-06-27 23:01:25.993
The first two DML statements were INSERT statements, and,
consequently, the log entries each contain a complete copy of the
data being inserted. The next two DML statements, however, were
UPDATE statements. Both of these updated only one column, and the log
entries reflect that fact by setting the other columns to NULL. The
final log entry represents a deletion of the Bananas record. The
D in the DeleteFlag column indicates that the
record was deleted. No other information need be recorded;
consequently, both the Qty and Price columns have been set to NULL.
Most database systems optimize the storage of NULLS such that the
amount of space used for that purpose is negligible.
6.8.3.1 Snapshot generation
Generating a snapshot of
the
Stock table is more difficult when the log records changes to a
single column than it is when each log record contains a complete
copy of the row being changed. In the case of the audit log described
in this recipe, you can use the following SELECTstatement to generate a snapshot of the base table as of a
specific time. The @TIME variable in the example represents the
"as of" time used for the snapshot
generation. For testing purposes, you can replace it with
CURRENT_TIMESTAMP to return the current snapshot.
SELECT t.ProductId,
Qty=(SELECT TOP 1 tQ.Qty FROM StockLog tQ
WHERE tQ.Qty IS NOT NULL
AND tQ.ProductId=t.ProductId
AND tQ.EventTime <= @TIME
ORDER BY tQ.LogId DESC),
Price=(SELECT TOP 1 tP.Price FROM StockLog tP
WHERE tP.Price IS NOT NULL
AND tP.ProductId=t.ProductId
AND tP.EventTime <= @TIME
ORDER BY tP.LogId DESC)
FROM StockLog t
WHERE t.EventTime<=@TIME
GROUP BY t.ProductId
HAVING NOT EXISTS(SELECT * FROM StockLog
WHERE productID=t.productID
AND logID=MAX(t.logID) AND DeleteFlag='D')
This SELECT statement contains a number of subqueries: one for each
column in the Stock table to retrieve the most recent values for
those columns and one at the end to eliminate all deleted rows from
the report. Each subquery retrieves the most recent value of a column
with respect to the date specified by the @TIME variable. The one
exception is that the primary key column does not require a subquery
because we are assuming that the primary key does not change.
To get the most recent value for a column, we execute a subquery. The
subquery ignores rows where the column is NULL. The subquery sorts
the rows in descending order by LogId and returns the topmost value,
which is the most recent value of the column. The subquery respects
the time you pass (@TIME) and will ignore values set later than that
time. The following example shows the subquery used to return the
most recent value for column Qty:
SELECT TOP 1 tQ.Qty FROM StockLog tQ
WHERE tQ.Qty IS NOT NULL
AND tQ.ProductId=t.ProductId
AND tQ.EventTime <= @TIME
ORDER BY tQ.LogId DESC
This same subquery is used for each column in the table, except for
the columns in the record descriptor.
The WHERE clause in the main query limits the scope of the selection
to the rows that were inserted before or on the date and at the time
represented by @TIME. The ProductId column is then used to group the
result set. Grouping is done because we are interested only in one
instance (a snapshot) of each row in the original table.
6.8.3.2 Deleted records
The HAVING clause in the
query eliminates Stock records that have
been deleted, so that they don't appear in the
snapshot. The HAVING clause looks like this:
HAVING NOT EXISTS(SELECT * FROM StockLog
WHERE productID=t.productID
AND logID=MAX(t.logID) AND DeleteFlag='D')
The subquery in the HAVING clause checks to see if the most recent
log entry for a Stock record represents a delete. If it does, then
that Stock record is not returned as part of the snapshot.
6.8.3.3 Dealing with NULL values
If the Stock table includes some columns that are allowed to be NULL, the
solution shown in this recipe will not work properly.
That's because the query to generate a snapshot has
no way to know whether a NULL value in a particular column represents
a NULL column value or if it just means that there has been no change
for that column.
One possible way to deal with NULL values is to use a second column
as a flag to indicate whether the first is a null. This flag column
can be a BIT type if the database implementation allows it, or it can
be a character type such as a CHAR(1). A BIT
type allows for TRUE and FALSE values. The StockLog table is extended
with the columns representing the NULL values:
CREATE TABLE StockLog(
ProductId char(40) NOT NULL,
Qty INTEGER NULL,
Price INTEGER NULL,
IsEmptyPrice BIT,
IsEmptyQty BIT,
EventTime DATETIME NOT NULL,
DeleteFlag CHAR(1) NULL,
LogId INTEGER IDENTITY
)
Assuming that you implemented a flag column named IsEmptyQty and that
you used 1 to represent a NULL value, and 0 otherwise, the subquery
in the snapshot query would need to use a WHERE clause, such as the
one shown here:
SELECT t.ProductId,
Qty=(SELECT TOP 1 tQ.Qty FROM StockLog tQ
WHERE tQ.IsEmptyQty=0
AND tQ.ProductId=t.ProductId
AND tQ.EventTime <= @TIME
ORDER BY tQ.LogId DESC ),
Price=(SELECT TOP 1 tP.Price FROM StockLog tP
WHERE tP.IsEmptyPrice=0
AND tP.ProductId=t.ProductId
AND tP.EventTime <= @TIME
ORDER BY tp.LogId DESC)
FROM StockLog t
WHERE EventTime<=@TIME
GROUP BY t.ProductId
HAVING NOT EXISTS(SELECT * FROM StockLog
WHERE productID=t.productID
AND logID=MAX(t.logID) AND DeleteFlag='D')
This solution requires only minimal changes to the SELECT statement
used to generate snapshots. It also doesn't consume
much space in an implementation where the DBMS supports the BIT
datatype. Sybase and MS SQL Server both support BIT types. The
storage of such BIT types is optimized by combining all the BIT
columns in a row together into as few bytes as possible. If there are
8 bit columns, they will consume just 1 byte per row.
|