Friday, August 14, 2009

Collaborative Data Integration


What Works is a publication with interesting content from The Data Warehousing Institute (TDWI). In Volume 27 (August 2009), Philip Russom, Senior Manager of TDWI Research, published a good article entitled Collaborative Data Integration.

He started the article with TDWI Research's definition of collaborative data integration: a collection of user best practices, software tool functions, and cross-functional project workflows that foster collaboration among the growing number of technical and business people involved in data integration projects and initiatives.

According to the article, several trends are driving up the requirements for collaboration in data integration projects:
- Data integration specialists are growing in number
- Data integration specialists are expanding their work beyond data warehousing
- Data integration work is increasingly dispersed geographically
- Data integration is now better coordinated with other data management disciplines
- More business people are getting their hands on data integration
- Data governance and other forms of oversight touch data integration

Different organizational units provide a structure in which data integration can be collaborative:
- Technology-focused organizational structures
- Business-driven organizational structures
- Hybrid structures

"Corporations and other user organizations have hired more inhouse data integration specialists in response to an increase in the amount of data warehousing work and operational data integration work outside of warehousing", he wrote.

"Although much of the collaboration around data integration consists of verbal communication, software tools for data integration include functions that automate some aspects of collaboration", he also wrote. Some features have existed in other application development tools but were only recently added to data integration tools, such as check-out and check-in, versioning, and source code management.

On data integration tool requirements for business collaboration, he wrote: "A few data integration and data quality tools today support areas within the tools for data stewards or business personnel to use. In such an area, the user may actively do some hands-on work, like select data structures that need quality or integration attention, design a rudimentary data flow (which a technical worker will flesh out later), or annotate development artifacts (e.g., with descriptions of what the data represents to the business)".

He explained that collaboration via a tool depends on a central repository: "The views just described are enabled by a repository that accompanies the data integration tool. Depending on the tool brand, the repository may be a dedicated metadata or source code repository that has been extended to manage much more than metadata and development artifacts, or it may be a general database management system".
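
To make the check-out, check-in, versioning, and annotation features more concrete, here is a minimal sketch of how a central repository might behave. It is not taken from the article or from any particular data integration tool; the names Repository, Artifact, check_out, check_in, and annotate are hypothetical, and a real tool would persist this in a metadata repository or database rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Artifact:
    """A shared development artifact, e.g. a data flow definition (hypothetical)."""
    name: str
    versions: List[str] = field(default_factory=list)     # every checked-in revision
    annotations: List[str] = field(default_factory=list)  # business notes
    locked_by: Optional[str] = None                       # user holding the check-out


class Repository:
    """In-memory stand-in for a central, server-based repository (assumption)."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Artifact] = {}

    def add(self, name: str, content: str) -> None:
        # The first version is stored when the artifact is created.
        self._artifacts[name] = Artifact(name, versions=[content])

    def check_out(self, name: str, user: str) -> str:
        # Only one person may hold an artifact at a time.
        artifact = self._artifacts[name]
        if artifact.locked_by is not None:
            raise RuntimeError(f"{name} is already checked out by {artifact.locked_by}")
        artifact.locked_by = user
        return artifact.versions[-1]

    def check_in(self, name: str, user: str, content: str) -> int:
        # Checking in appends a new version and releases the lock.
        artifact = self._artifacts[name]
        if artifact.locked_by != user:
            raise RuntimeError(f"{user} has not checked out {name}")
        artifact.versions.append(content)
        artifact.locked_by = None
        return len(artifact.versions)  # new version number, counting from 1

    def annotate(self, name: str, user: str, note: str) -> None:
        # Business users can describe what the data means without editing the flow.
        self._artifacts[name].annotations.append(f"{user}: {note}")

    def history(self, name: str) -> List[str]:
        # Return a copy so callers cannot alter stored versions.
        return list(self._artifacts[name].versions)
```

In this sketch a data steward could call annotate on an artifact while a developer holds the check-out, which mirrors the split between business and technical collaboration described above.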

He finished with some recommendations:

- Recognize that data integration has collaborative requirements. The greater the number of data integration specialists and people who work closely with them, the greater the need is for collaboration around data integration.

- Determine an appropriate scope for collaboration. At the low end, bug fixes don’t merit much collaboration; at the top end, business transformation events require the most.

- Support collaboration with organizational structures. These can be technology focused (like data management groups), business driven (data stewardship and governance), or a hybrid of the two (BI teams and competency centers).

- Select data integration tools that support broad collaboration. For technical implementers, this means data integration tools with source code management features (especially for versioning). For business collaboration, it means an area within a data integration tool where the user can select data structures and design rudimentary process flows for data integration.

- Demand a central repository. Both technical and business team members—and their management—benefit from an easily accessed, server-based repository through which everyone can share their thoughts and documents, as well as view project information and semantic data relevant to data integration.
