Software Engineering
database database-design relational-database postgres audit
Updated Tue, 12 Jul 2022 23:15:22 GMT

Using audit trail as time machine?


I would like to do

SELECT * TABLE t (using data from 1st of march 2012)

I already have a nice audit trail of all tables in the database. It basically makes a copy of all rows that change, storing all columns of the changed row in an hstore column. This means that we have the data available, but I am unsure of whether it is the best way to store the data for a database time machine.

To give you more context, we are creating accounting software. This means that we need to be able to recreate all reports which we have offered to the customer (they can see them on our website and get updated continuously).

What sorts of problems do you anticipate that I will run into with this approach? Is there a better approach?

Some facts

  • Each row in the main table gets edited 5 times on average.
  • The main table has 32 columns (could get reduced to only 7 that need to have an audit trail)
  • We will only ever have 1 million users on our software, each will have ~700 rows in the main table.



Solution

Here is my favorite approach:

  • Each table has a corresponding history table
  • Write stored procedures (or triggers) to make sure that all actions are logged to the history tables
  • On insert, add a row to the history table with start = now() and end = 31.12.2999
  • On update, first update the most recent history record to end = now(). Then insert a new row with start = now() and end = 31.12.2999
  • On delete, update the most recent history record to end = now().

Now you can write a point-in-time query, even with joins:

select g.groupname, p.productname, p.price 
from products_hist p, product_groups_hist g 
where p.id = g.id 
and p.start <= now() and now() < p.end
and g.start <= now() and now() < g.end




Comments (2)

  • +1 – Um, no, don't use BETWEEN, use an exclusive upper-bound. What you're doing will include 2 rows in some cases. end is best thought of conceptually as null, which it should probably be stored as (although this would require additional checks). Note that ending values are always derived values. — Dec 15, 2014 at 13:08  
  • +0 – Thanks @Clockwork-Muse, I have edited the answer to use exclusive upper-bound — Dec 15, 2014 at 17:48  


External Links

External links referenced by this document: