A Schema-Based Approach to Enable Data Integration on the Fly

On-the-fly data integration, i.e., integration at query time, happens mostly in tightly coupled, homogeneous environments where the partitioning of the data can be controlled or is known in advance. During data fusion, the information is homogenized and data inconsistencies are hidden from the application. Going beyond this, we propose the Nexus metadata model and a processing approach that support on-the-fly data integration in a loosely coupled federation of autonomous data providers, thereby advancing the status quo in terms of flexibility and expressive power. The model is able to represent data and schema inconsistencies such as multi-valued attributes and multi-typed objects. In an open environment, where the data processing infrastructure is not able to decide which attribute value is correct, this best suits the application needs. The Nexus metadata model provides the foundation for integration schemata that are specific to a given application domain. The corresponding processing model provides four complementary query semantics in order to account for the subtleties of multi-valued and missing attributes. We show that these query semantics are sound, easy to implement, and build upon existing query processing techniques. Thus the Nexus metadata model provides a unique level of flexibility for on-the-fly data integration.
