Schema mediation for large-scale semantic data sharing

Abstract. Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics.

The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers’ schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers’ individual schemas.

This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS.
