Thursday, April 30, 2009

Eight Guidelines for Low-Risk Enterprise Data Warehousing


Ralph Kimball published a very nice article in Intelligent Enterprise entitled Eight Guidelines for Low-Risk Enterprise Data Warehousing, in which he makes recommendations for controlling project costs and reducing risks in enterprise data warehousing initiatives.

He said that in today's economic climate, business intelligence (BI) faces two powerful and conflicting pressures. Business users want more focused insight from their BI tools into customer satisfaction and profitability, yet these same users are under huge pressure to control costs and reduce risks.

The Eight Guidelines for Low-Risk Enterprise Data Warehousing are:

1 - Work on the Right Thing

He recommends a simple technique for deciding what the right thing is. Make a list of all your potential EDW/BI projects and place them on a simple 2x2 grid, with business impact on one axis and feasibility on the other.

Figure out, with your end users, how valuable each of the potential projects would be, independent of the feasibility. Next, do an honest assessment of whether each project has high-quality data and how difficult it will be to build the data delivery pipelines from the source to the BI tool. Remember that at least 70 percent of BI project risks and delays come from problems with the data sources and meeting data delivery freshness (latency) requirements.
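The grid can be sketched in a few lines of Python. This is only an illustration: the 1-to-10 scoring scale, the threshold of 5, the quadrant labels and the project names are all assumptions made for the example, not part of Kimball's article.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    impact: int       # business impact as rated by end users, 1 (low) to 10 (high)
    feasibility: int  # data quality and delivery feasibility, 1 (low) to 10 (high)


def quadrant(project, threshold=5):
    """Return the 2x2 grid cell for a project (labels are illustrative)."""
    high_impact = project.impact > threshold
    high_feasibility = project.feasibility > threshold
    if high_impact and high_feasibility:
        return "do first"
    if high_impact:
        return "valuable, but fix the data first"
    if high_feasibility:
        return "easy, but low value"
    return "avoid"


def prioritize(projects, threshold=5):
    """Group project names by grid cell."""
    grid = {}
    for project in projects:
        grid.setdefault(quadrant(project, threshold), []).append(project.name)
    return grid


# Hypothetical candidate projects.
projects = [
    Project("customer profitability", impact=9, feasibility=8),
    Project("real-time inventory", impact=8, feasibility=3),
    Project("office supplies report", impact=2, feasibility=9),
    Project("legacy archive mining", impact=2, feasibility=2),
]

grid = prioritize(projects)
for label, names in grid.items():
    print(f"{label}: {names}")
```

Projects that land in the high-impact, high-feasibility cell are the natural candidates to start with.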

2 - Give Business Users Control

The transfer of control means having users directly involved with, and responsible for, each EDW/BI project. Obviously these users have to learn how to work with IT so as to make reasonable demands.

3 - Proceed Incrementally

In this era of financial uncertainty, it's hard to justify a classic "waterfall" approach to EDW/BI development. In the waterfall approach, a written functional specification is created that completely specifies the sources, the final deliverables and the detailed implementation. The rest of the project implements this specification, often with a big-bang comprehensive release.

Many EDW/BI projects are gravitating to what could be called an "agile" approach that emphasizes frequent releases and mid-course corrections. Interestingly, a fundamental tenet of the agile approach is ownership by the business users, not by technical developers.

An agile approach requires tolerating some code rewriting and not depending on fixed-price contracts. The agile approach can successfully be adapted to enterprisewide projects such as master data management and enterprise integration.

4 - Start with Lightweight, Focused Governance

Governance is recognizing the value of your data assets and managing those assets responsibly. Governance is not something that is tacked onto the end of an EDW/BI project. Governance is part of a larger culture that recognizes the value of your data assets and is supported and driven by senior executives.

5 - Build a Simple, Universal Platform

One thing is certain in the BI space: the nature of the end-user-facing BI tools cannot be predicted. We must recognize that the enterprise data warehouse is the single platform for all forms of business intelligence. This viewpoint makes us realize that the EDW's interface to all forms of BI must be agnostic, simple and universal.

Dimensional modeling meets these goals as the interface to all forms of BI. Dimensional schemas contain all possible data relationships, but at the same time can be processed efficiently with simple SQL emitted by any BI tool.
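As a rough illustration of the "simple SQL" a BI tool might emit against a dimensional schema, here is a tiny star schema built in an in-memory SQLite database using Python's standard sqlite3 module. The table names, column names and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table surrounded by two dimension tables (hypothetical names and data).
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales (product_key INTEGER, date_key INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_date VALUES (1, '2009-03'), (2, '2009-04');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 2, 40.0), (2, 2, 5.0);
""")

# The typical query shape: join facts to dimensions, group by dimension
# attributes, and aggregate the measures.
rows = conn.execute("""
SELECT p.category, d.month, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_date d ON d.date_key = f.date_key
GROUP BY p.category, d.month
ORDER BY p.category, d.month
""").fetchall()

for row in rows:
    print(row)
```

Every query against the schema follows the same join-and-group pattern, which is what lets any SQL-emitting tool use it.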

6 - Integrate Using Conformed Dimensions

Enterprisewide integration has risen to the top of the list of EDW/BI technical drivers along with data quality and data latency. Dimensional modeling provides a simple set of procedures for achieving integration that can be effectively used by BI tools. Conformed dimensions enable BI tools to drill across multiple subject areas, assembling a final integrated report. The key insight is that the entire dimension (customer, for example) does not need to be made identical across all subject areas. The minimum requirement for a drill-across report is that at least one field be common across multiple subject areas. Thus, the EDW can define a master enterprise dimension containing a small but growing number of conformed fields. These fields can be added incrementally over time. In this way, we reduce the risk and cost of enterprise integration at the BI interface. This approach also fits well with the earlier recommendation to develop the EDW/BI system incrementally.
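A drill-across report can be sketched as follows, assuming two subject areas (sales and support) whose customer dimensions share only one conformed field, region. All names and figures here are made up for illustration. Each subject area is summarized separately by the conformed field, and the summaries are then merged on that field's values.

```python
from collections import defaultdict

# Customer dimension as seen by each subject area. The keys and most
# attributes differ; only "region" is conformed (hypothetical data).
sales_customers = {
    101: {"region": "EMEA", "segment": "Retail"},
    102: {"region": "APAC", "segment": "Wholesale"},
}
support_customers = {
    "C-9": {"region": "EMEA", "tier": "Gold"},
    "C-7": {"region": "APAC", "tier": "Silver"},
}

# Fact rows: (customer key, measure).
sales_facts = [(101, 500.0), (102, 300.0), (101, 200.0)]
support_facts = [("C-9", 3), ("C-7", 1), ("C-9", 2)]


def summarize(facts, dimension, field):
    """Total the measure of one subject area by a dimension field."""
    totals = defaultdict(float)
    for key, measure in facts:
        totals[dimension[key][field]] += measure
    return dict(totals)


def drill_across(*summaries):
    """Merge per-subject-area summaries on the conformed field's values."""
    labels = sorted(set().union(*summaries))
    return [(label, *(s.get(label) for s in summaries)) for label in labels]


sales_by_region = summarize(sales_facts, sales_customers, "region")
tickets_by_region = summarize(support_facts, support_customers, "region")
report = drill_across(sales_by_region, tickets_by_region)

for line in report:
    print(line)
```

Note that the two subject areas never join their fact tables directly; the only thing they have to agree on is the meaning and values of the conformed field.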

7 - Manage Quality a Few Screens at a Time

In its articles and books, the Kimball Group has described an effective approach to managing data quality by placing data quality screens throughout the data pipelines leading from the sources to the targets. Each data quality screen is a test. When the test fails or finds a suspected data quality violation, the screen writes a record in an error event fact table, a dimensional schema hidden in the back room away from direct access by end users.

The data quality screens can be implemented one at a time, allowing development of the data quality system to grow incrementally.
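The idea can be sketched in Python as below. This is a simplified illustration, not the Kimball Group's implementation: the screen helpers, the column names and the layout of the error event record are assumptions, and a plain list stands in for the error event fact table.

```python
from datetime import datetime, timezone


def screen_not_null(column):
    """Build a screen that fails when the column is missing or None."""
    def check(row):
        return row.get(column) is not None
    check.screen_name = f"{column}_not_null"
    return check


def screen_range(column, low, high):
    """Build a screen that fails when a present value is outside [low, high]."""
    def check(row):
        value = row.get(column)
        return value is None or low <= value <= high
    check.screen_name = f"{column}_in_range"
    return check


def run_screens(rows, screens, batch_id, error_event_fact):
    """Apply every screen to every row; record each failure as an error event."""
    for row_number, row in enumerate(rows):
        for screen in screens:
            if not screen(row):
                error_event_fact.append({
                    "batch_id": batch_id,
                    "row_number": row_number,
                    "screen": screen.screen_name,
                    "detected_at": datetime.now(timezone.utc).isoformat(),
                })
    return error_event_fact


# Hypothetical incoming rows: one clean, one with a null key, one with a bad age.
incoming = [
    {"customer_id": 1, "age": 34},
    {"customer_id": None, "age": 40},
    {"customer_id": 3, "age": 212},
]

# Screens can be added to this list one at a time as the system grows.
screens = [screen_not_null("customer_id"), screen_range("age", 0, 120)]

errors = run_screens(incoming, screens, batch_id=1, error_event_fact=[])
for event in errors:
    print(event["row_number"], event["screen"])
```

Because each screen is an independent test, adding a new one means appending to the list rather than changing the pipeline.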

8 - Use Surrogate Keys Throughout

Make sure to build all your dimensions (even Type 1 Dimensions) with surrogate primary keys. This insulates you from surprises downstream when you acquire a new division that has its own ideas about keys. What's more, all your databases will run faster with surrogate keys.
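A minimal sketch of surrogate key assignment, assuming a simple in-memory lookup keyed by source system and natural key (the class name, source system names and key values are hypothetical). In a real warehouse the lookup would live in the ETL system, but the principle is the same: the warehouse issues its own meaningless integer keys and never depends on the source's.

```python
class SurrogateKeyMap:
    """Assign warehouse-owned integer keys to (source system, natural key) pairs."""

    def __init__(self):
        self._keys = {}
        self._next_key = 1

    def key_for(self, source_system, natural_key):
        """Return the existing surrogate key, or issue the next one."""
        lookup = (source_system, natural_key)
        if lookup not in self._keys:
            self._keys[lookup] = self._next_key
            self._next_key += 1
        return self._keys[lookup]


customer_keys = SurrogateKeyMap()

# Two source systems that happen to use the same natural key for
# different customers still get distinct surrogate keys.
crm_key = customer_keys.key_for("crm", "100")
acquired_key = customer_keys.key_for("acquired_division", "100")

# Asking again for a known customer returns the same key.
crm_key_again = customer_keys.key_for("crm", "100")

print(crm_key, acquired_key, crm_key_again)
```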
