Data Warehouse Subsystem
part of the ArsDigita Community System
by Philip Greenspun
Most of the real work in building a data warehouse is constructing a
dimensional data model and copying information from your online
transaction processing (OLTP) data model into the dimensional model.
This software won't help you with that. Although I hope to some day
write about this, for now I will simply refer you to Ralph Kimball's The
Data Warehouse Toolkit.
What this subsystem is designed to do is provide a reasonable user
interface to ad hoc querying of a single table. If you have multiple
tables that need to be JOINed, you could either extend this software a
bit or build a view that does the JOIN. If you have a truly large
database (gigabytes), you might find that performance isn't acceptable.
In that case, what you need to do is
- get your data into a dimensional model
- build a view that joins the fact table to the dimensions and
contains everything a user might want to group by, restrict by, or
aggregate
- see if you can't browbeat your RDBMS into optimizing queries into
this view so that it doesn't mess with tables that are irrelevant to a
particular query
Once you've gotten your data into a dimensional model (one fact table
plus a bunch of dimension tables), the data warehousing module of the
ACS can help you. It is also potentially useful if you just want to
provide ad-hoc query capabilities for a big table that happens to exist
in your production database.
The assumption is that this module will keep you from having to buy,
install, and maintain Seagate Crystal Reports (a truly painful-to-use
product).
Installation
You will almost surely want to go into /tcl/dw-defs and change
dw_table_name
to return either the right table or view for
this system or something that depends on which user is logged in.
philg@mit.edu