Warehouse Creation - A Potential Roadblock to Data Warehousing

Data warehousing is gaining popularity as organizations realize the benefits of performing sophisticated analyses of their data. Recent years have seen the introduction of a number of data-warehousing engines, from both established database vendors and new players. The engines themselves are relatively easy to use and come with a good set of end-user tools. However, one key stumbling block to the rapid development of data warehouses remains: warehouse population. Specifically, problems arise in populating a warehouse with existing data because that data exhibits various types of heterogeneity. Given the lack of good tools, this task has generally been performed by system integrators, e.g., software consulting organizations that have developed in-house tools and processes for it. The general conclusion has been that the task is labor-intensive, error-prone, and frustrating, and a number of warehousing projects have been abandoned midway through development. The picture, however, is not as grim as it appears. The problems encountered in warehouse creation are very similar to those encountered in data integration, which have been studied for about two decades. Not all problems relevant to warehouse creation have been solved, though, and a number of research issues remain. The principal goal of this paper is to identify the issues common to data integration and data-warehouse creation.
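
To make "various types of heterogeneity" concrete, the following is a minimal sketch, assuming two hypothetical source systems (SOURCE_A, SOURCE_B) whose customer records differ in attribute names, name representation, key format, and revenue scale. The names `WarehouseCustomer`, `from_a`, `from_b`, and `populate` are illustrative only. This is not the paper's method, just one way such conflicts might be resolved while loading a single warehouse table.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical source data: field names, formats, and units are illustrative
# assumptions, not taken from any real system.
SOURCE_A = [  # integer keys, names as "Last, First", revenue in thousands of USD
    {"cust_id": 17, "name": "Smith, Jane", "revenue_k": 12.5},
    {"cust_id": 42, "name": "Doe, John", "revenue_k": 3.0},
]
SOURCE_B = [  # string keys like "C-0017", split name fields, revenue in USD
    {"CUSTNO": "C-0017", "FIRST": "Jane", "LAST": "Smith", "REV_USD": 800.0},
    {"CUSTNO": "C-0099", "FIRST": "Ann", "LAST": "Lee", "REV_USD": 150.0},
]


@dataclass
class WarehouseCustomer:
    """Target warehouse schema (hypothetical)."""
    customer_key: int
    full_name: str
    revenue_usd: float


def from_a(rec: dict) -> WarehouseCustomer:
    # Representation conflict: "Last, First" becomes "First Last".
    # Scale conflict: thousands of USD become USD.
    last, first = (part.strip() for part in rec["name"].split(","))
    return WarehouseCustomer(rec["cust_id"], f"{first} {last}", rec["revenue_k"] * 1000)


def from_b(rec: dict) -> WarehouseCustomer:
    # Key-format conflict: "C-0017" becomes 17 so both sources share one key domain.
    key = int(rec["CUSTNO"].split("-", 1)[1])
    return WarehouseCustomer(key, f"{rec['FIRST']} {rec['LAST']}", rec["REV_USD"])


def populate(a_rows: List[dict], b_rows: List[dict]) -> Dict[int, WarehouseCustomer]:
    """Map both sources into the warehouse schema and merge matching entities."""
    warehouse: Dict[int, WarehouseCustomer] = {}
    for row in [from_a(r) for r in a_rows] + [from_b(r) for r in b_rows]:
        existing = warehouse.get(row.customer_key)
        if existing is None:
            warehouse[row.customer_key] = row
        else:
            # Entity identification: the same customer appears in both sources.
            # Summing revenue is an assumed conflict-resolution policy; the
            # first-seen name is kept.
            existing.revenue_usd += row.revenue_usd
    return warehouse


if __name__ == "__main__":
    for key, cust in sorted(populate(SOURCE_A, SOURCE_B).items()):
        print(key, cust.full_name, cust.revenue_usd)
```

Running the script prints `17 Jane Smith 13300.0`, `42 John Doe 3000.0`, and `99 Ann Lee 150.0`, one per line. Even in this toy case, the hard part is not the load itself but deciding that `17` and `C-0017` denote the same entity and choosing how to reconcile their attribute values. Those are exactly the data-integration questions the abstract refers to.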
