Challenges and conflicts integrating heterogeneous data warehouses in virtual organisations

This paper addresses the challenges that arise when heterogeneous data warehouses are integrated. Such a scenario occurs especially in virtual organisations, where the participating companies need to combine their business data to attain 'one single version of truth' for decisions that reach beyond organisational borders. The issues that occur are described and analysed from both an exemplary and a theoretical point of view. Not only the weaknesses but also the reasons for those shortcomings are examined. The introduction looks at growing data volumes and their consequences for business. Afterwards, a working definition of the term 'virtual organisation' is established, and its link to and interdependence with information technology are presented. The following sections discuss schema and mapping conflicts. The eighth section aims to detect organisational deficits, which probably facilitate those problems. Moreover, pivotal decisions within a data warehouse integration project are visualised to provide decision makers with some rough reference points. The paper concludes with a view on the human component and an outlook on in-memory computing.
