Chapter 10 – Data Consolidation and Integration

Publisher Summary

There are two aspects to consolidation and integration: the initial migration of data from the source data sets, and the ongoing integration of data instances created in, modified in, or removed from the master environment. Data integration comprises the processes for collecting data from different sources and making that data accessible to specific applications. Because data integration has largely been used for building data warehouses, it is often seen as part of collecting data for analysis. However, as operational data sharing becomes more common, data integration has become a core service necessary for both analytics and operations. This chapter explores the techniques employed in extracting, collecting, and merging data from various sources. The intention is to identify the component services required for creating the virtual data set that has a single instance and representation for every unique master data object, and then to use the data integration and consolidation services to facilitate information sharing through the master data asset.

The central driver of master data management is the ability to locate the variant representations of any unique entity and unify the organization's view of that entity. The process by which this is accomplished must be able to access the candidate data sets, extract the appropriate identifying information, and then assess similarity accurately enough to allow linkage between records and consolidation of the data into an application service framework that permits access to the unified view. The technical components of this process (parsing, standardization, matching and identity resolution, and consolidation) must be integrated within two aspects of the master environment: the underlying technical infrastructure and models that support the implementation, and the governance framework that oversees the management of information access and usage policies.
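
As a rough illustration of how parsing, standardization, matching, and consolidation fit together, the following minimal Python sketch runs a handful of records through all four steps. Everything in it is an assumption made for illustration rather than a method taken from the chapter: the `name | city` input layout, the `NICKNAMES` table, the 0.85 similarity threshold, the rule that cities must agree exactly, and the survivorship rule that keeps the longest standardized name. String similarity uses `difflib.SequenceMatcher` from the Python standard library; a production system would likely use more specialized identity resolution techniques.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table used for standardization (illustrative only).
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def parse(raw):
    """Parsing: split a 'name | city' string into identifying fields."""
    name, city = (part.strip() for part in raw.split("|"))
    return {"name": name, "city": city}


def standardize(record):
    """Standardization: lowercase, drop punctuation, expand known nicknames."""
    tokens = record["name"].lower().replace(".", "").replace(",", "").split()
    tokens = [NICKNAMES.get(t, t) for t in tokens]
    return {"name": " ".join(tokens), "city": record["city"].lower()}


def similarity(a, b):
    """Matching: score two standardized records in [0, 1]; cities must agree."""
    if a["city"] != b["city"]:
        return 0.0
    return SequenceMatcher(None, a["name"], b["name"]).ratio()


def consolidate(raw_records, threshold=0.85):
    """Link records scoring at or above threshold, then merge each cluster."""
    records = [standardize(parse(r)) for r in raw_records]
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison; fine for a sketch, too slow for large data sets.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)

    # Survivorship rule (an assumption): keep the longest standardized name.
    masters = []
    for members in clusters.values():
        best = max(members, key=lambda i: len(records[i]["name"]))
        sources = [raw_records[i] for i in members]
        masters.append({**records[best], "sources": sources})
    return masters


if __name__ == "__main__":
    raw = [
        "Bob Smith | Boston",
        "Robert Smith | boston",
        "Robert Smyth | Boston",
        "Alice Jones | Chicago",
    ]
    for master in consolidate(raw):
        print(master)
```

Each consolidated record keeps a `sources` list so that the unified view can be traced back to the variant representations it was built from, which is the kind of linkage a governance framework would need in order to audit.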