Tuesday, June 30, 2009

The 10 Essential Rules of Dimensional Modeling


Margy Ross, president of the Kimball Group, recently published in Intelligent Enterprise, a nice article where she mentioned and defined what she considers the 10 Essential Rules of Dimensional Modeling:

1 - Load detailed atomic data into dimensional structures

Dimensional models should be populated with bedrock atomic details to support the unpredictable filtering and grouping required by business user queries.

2 - Structure dimensional models around business processes

Business processes are the activities performed by your organization; they represent measurement events, like taking an order or billing a customer. Business processes typically capture or generate unique performance metrics associated with each event. These metrics translate into facts, with each business process represented by a single atomic fact table.

3 - Ensure that every fact table has an associated date dimension table

The measurement events described in Rule #2 always have a date stamp of some variety associated with them, whether it's a monthly balance snapshot or a monetary transfer captured to the hundredth of a second. Every fact table should have at least one foreign key to an associated date dimension table, whose grain is a single day, with calendar attributes and nonstandard characteristics about the measurement event date, such as the fiscal month and corporate holiday indicator. Sometimes multiple date foreign keys are represented in a fact table.

4 - Ensure that all facts in a single fact table are at the same grain or level of detail

There are three fundamental grains to categorize all fact tables: transactional, periodic snapshot, or accumulating snapshot. Regardless of its grain type, every measurement within a fact table must be at the exact same level of detail.

5 - Resolve many-to-many relationships in fact tables

Since a fact table stores the results of a business process event, there's inherently a many-to-many (M:M) relationship between its foreign keys, such as multiple products being sold in multiple stores on multiple days. These foreign key fields should never be null.

6 - Resolve many-to-one relationships in dimension tables

Hierarchical, fixed-depth many-to-one (M:1) relationships between attributes are typically denormalized or collapsed into a flattened dimension table. If you've spent most of your career designing entity-relationship models for transaction processing systems, you'll need to resist your instinctive tendency to normalize or snowflake a M:1 relationship into smaller subdimensions; dimension denormalization is the name of the game in dimensional modeling.

7 - Store report labels and filter domain values in dimension tables

The codes and, more importantly, associated decodes and descriptors used for labeling and query filtering should be captured in dimension tables.

8 - Make certain that dimension tables use a surrogate key

Meaningless, sequentially assigned surrogate keys (except for the date dimension, where chronologically assigned and even more meaningful keys are acceptable) deliver a number of operational benefits, including smaller keys which mean smaller fact tables, smaller indexes, and improved performance. Surrogate keys are absolutely required if you're tracking dimension attribute changes with a new dimension record for each profile change.

9 - Create conformed dimensions to integrate data across the enterprise

Conformed dimensions (otherwise known as common, master, standard or reference dimensions) are essential for enterprise data warehousing. Managed once in the ETL system and then reused across multiple fact tables, conformed dimensions deliver consistent descriptive attributes across dimensional models and support the ability to drill across and integrate data from multiple business processes. The Enterprise Data Warehouse Bus Matrix is the key architecture blueprint for representing the organization's core business processes and associated dimensionality.

10 - Continuously balance requirements and realities to deliver a DW/BI solution that's accepted by business users and that supports their decision-making

Dimensional modelers must constantly straddle business user requirements along with the underlying realities of the associated source data to deliver a design that can be implemented and that, more importantly, stands a reasonable chance of business adoption. The requirements-versus-realities balancing act is a fact of life for DW/BI practitioners, whether you're focused on the dimensional model, project strategy, technical/ETL/BI architectures or deployment/maintenance plan.

Margy Ross wrote a nice article, compiling and defining the main rules that a professional needs when are designing a model using the concepts of Dimensional Modeling.

No comments: