Composing Mappings Among Data Sources

Semantic mappings between data sources play a key role in several data sharing architectures. Mappings provide the relationships between data stored in different sources, and therefore enable answering queries that require data from other nodes in a data sharing network. Composing mappings is one of the core problems that lies at the heart of several optimization methods in data sharing networks, such as caching frequently traversed paths and redundancy analysis. This paper investigates the theoretical underpinnings of mapping composition. We study the problem for a rich mapping language, GLAV, that combines the advantages of the known mapping formalisms globalas-view and local-as-view. We first show that even when composing two simple GLAV mappings, the full composition may be an infinite set of GLAV formulas. Second, we show that if we restrict the set of queries to be in CQk (a common restriction in practice), then we can always encode the infinite set of GLAV formulas using a finite representation. Furthermore, we describe an algorithm that given a query and a finite encoding of an infinite set of GLAV formulas, finds all the certain answers to the query. Consequently, we show that for a commonly occuring class of queries it is possible to pre-compose mappings, thereby potentially offering significant savings in query processing.

[1]  Ashok K. Chandra,et al.  Optimal implementation of conjunctive queries in relational data bases , 1977, STOC '77.

[2]  Anand Rajaraman,et al.  Answering Queries Using Limited External Processors. , 1996, ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.

[3]  Alon Y. Halevy,et al.  Answering queries using views: A survey , 2001, The VLDB Journal.

[4]  Alon Y. Halevy,et al.  Static analysis in datalog extensions , 2001, JACM.

[5]  Anand Rajaraman,et al.  Conjunctive query containment revisited , 1997, Theor. Comput. Sci..

[6]  Renée J. Miller,et al.  Mapping data in peer-to-peer systems: semantics and algorithmic issues , 2003, SIGMOD '03.

[7]  Chen Li,et al.  Minimizing View Sets without Losing Query-Answering Power , 2001, ICDT.

[8]  Moshe Y. Vardi On the Complexity of Bounded-Variable Queries. , 1995, PODS 1995.

[9]  Alon Y. Halevy,et al.  Piazza: data management infrastructure for semantic web applications , 2003, WWW '03.

[10]  Erhard Rahm,et al.  Rondo: a programming platform for generic model management , 2003, SIGMOD '03.

[11]  Michael R. Genesereth,et al.  Answering recursive queries using views , 1997, PODS '97.

[12]  Serge Abiteboul,et al.  Complexity of answering queries using materialized views , 1998, PODS.

[13]  Jeffrey D. Ullman,et al.  Information integration using logical views , 1997, Theor. Comput. Sci..

[14]  Maurizio Lenzerini,et al.  Data integration: a theoretical perspective , 2002, PODS.

[15]  Ronald Fagin,et al.  Translating Web Data , 2002, VLDB.

[16]  Philip A. Bernstein,et al.  Applying Model Management to Classical Meta Data Problems , 2003, CIDR.

[17]  Pedro M. Domingos,et al.  Representing and reasoning about mappings between domain models , 2002, AAAI/IAAI.

[18]  Ronald Fagin,et al.  Data exchange: semantics and query answering , 2003, Theor. Comput. Sci..

[19]  Diego Calvanese,et al.  View-based query processing for regular path queries with inverse , 2000, PODS '00.

[20]  Tova Milo,et al.  Using Schema Matching to Simplify Heterogeneous Data Translation , 1998, VLDB.

[21]  Beng Chin Ooi,et al.  An adaptive peer-to-peer network for distributed caching of OLAP results , 2002, SIGMOD '02.

[22]  Fausto Giunchiglia,et al.  Data Management for Peer-to-Peer Computing : A Vision , 2002, WebDB.

[23]  Richard Hull,et al.  Managing semantic heterogeneity in databases: a theoretical prospective , 1997, PODS.

[24]  Todd D. Millstein,et al.  Navigational Plans For Data Integration , 1999, AAAI/IAAI.

[25]  Beng Chin Ooi,et al.  PeerDB: a P2P-based system for distributed data sharing , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[26]  Moshe Y. Vardi On the complexity of bounded-variable queries (extended abstract) , 1995, PODS '95.

[27]  Jeffrey D. Ullman,et al.  Answering queries using limited external query processors (extended abstract) , 1996, PODS.

[28]  Gio Wiederhold,et al.  Mediators in the architecture of future information systems , 1992, Computer.