IXIA (IndeX-based Integration Approach): A Hybrid Approach to Data Integration

The volume of documents, data sources and database management systems available in the world is large and growing, and many autonomous, heterogeneous sources describe the same reality using different words and conceptual structures. Many organizations need a system that handles such data in a homogeneous way, which requires integrating these data sources. The goal of a data integration system is to provide end users with a homogeneous interface for querying several heterogeneous and autonomous sources. Building such an interface raises many challenges, the most important of which are the heterogeneity of the data sources, the fragmentation of data, and the processing and optimization of queries. Many research projects have proposed approaches that address some or all of these problems. Depending on the nature of the integrated view, these approaches fall into two main categories, materialized and virtual; hybrid approaches combine materialized and virtual views. The main advantage of a hybrid approach is that it offers a trade-off between query response time and data freshness. In existing hybrid approaches, query optimization is usually confined to the materialized part of the system.

In this thesis, we develop a hybrid approach that aims to extend query optimization to all queries of the integration system. It also provides a flexible data refreshing mechanism that accommodates the differing characteristics of the sources and their data. The approach is based on the object indexing system of Osiris, a database and knowledge base platform whose object data model is based on a hierarchy of views. Its indexing system relies on partitioning the object space using the view constraints.
IXIA, the hybrid approach presented in this thesis, materializes the indexing structure of the underlying objects at the mediator level. The OIDs of the objects, their correspondence with the source objects, and the data needed to refresh the index are also materialized. This index-based approach offers more flexibility in data refreshing than a fully materialized approach and a better query response time than a fully virtual data integration system.
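
The idea of materializing only an index at the mediator can be illustrated with a minimal Python sketch. It is not the Osiris or IXIA implementation. The view names, the `age` attribute, the `MediatorIndex` class and its methods are all hypothetical, and the partition is simplified to "the set of views whose constraints an object satisfies". The mediator stores only the partition cells, the OIDs and the OID-to-source correspondence; the object data itself stays in the sources.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Set, Tuple

Record = Dict[str, int]
Constraint = Callable[[Record], bool]

# Hypothetical view constraints (assumption: not taken from the thesis).
VIEWS: Dict[str, Constraint] = {
    "Adult": lambda r: r["age"] >= 18,
    "Senior": lambda r: r["age"] >= 65,
}


def signature(record: Record) -> FrozenSet[str]:
    """Partition cell of a record: the set of views whose constraints it satisfies."""
    return frozenset(name for name, holds in VIEWS.items() if holds(record))


@dataclass
class MediatorIndex:
    """Index materialized at the mediator; object data is not stored here."""
    cells: Dict[FrozenSet[str], Set[int]] = field(default_factory=dict)
    cell_of: Dict[int, FrozenSet[str]] = field(default_factory=dict)
    origin: Dict[int, Tuple[str, str]] = field(default_factory=dict)  # oid -> (source, key)

    def refresh(self, oid: int, source: str, key: str, record: Record) -> None:
        """(Re)classify one object after its source data has changed."""
        old = self.cell_of.get(oid)
        if old is not None:
            self.cells[old].discard(oid)
        new = signature(record)
        self.cells.setdefault(new, set()).add(oid)
        self.cell_of[oid] = new
        self.origin[oid] = (source, key)

    def oids_in_view(self, view: str) -> Set[int]:
        """Answer a view-membership query from the index alone."""
        return {oid for cell, oids in self.cells.items() if view in cell for oid in oids}


index = MediatorIndex()
index.refresh(1, "src_a", "k1", {"age": 30})
index.refresh(2, "src_b", "k9", {"age": 70})
adults = index.oids_in_view("Adult")
```

In this sketch a query first narrows the candidate OIDs using the index, then uses `origin` to fetch the current attribute values from the sources. Because `refresh` is called per object, each source can be refreshed on its own schedule.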
