Update Exchange with Mappings and Provenance

We consider systems for data sharing among heterogeneous peers related by a network of schema mappings. Each peer has a locally controlled and edited database instance, but also wants to query related data from other peers. To achieve this, every peer's updates propagate along the mappings to the other peers. However, this update exchange is filtered by trust conditions, which express what data and sources a peer judges to be authoritative and which may cause a peer to reject another's updates. To support such filtering, updates carry provenance information. These systems target scientific data sharing applications, and their general principles and architecture have been described in [20]. In this paper we present methods for realizing such systems. Specifically, we extend techniques from data integration, data exchange, and incremental view maintenance to propagate updates along mappings; we integrate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance; we discuss strategies for implementing our techniques in conjunction with an RDBMS; and we experimentally demonstrate the viability of our techniques in the ORCHESTRA prototype system.
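As a rough illustration of the update exchange idea summarized above, the following Python sketch translates one peer's insertions along a single schema mapping, attaches provenance to each translated tuple, and lets the receiving peer filter by a trust condition over that provenance. This is a minimal sketch under stated assumptions, not the ORCHESTRA implementation: the peers `A` and `B`, the relations `Gene` and `Name`, the functions `apply_mapping` and `exchange`, and the trust rule are all hypothetical, and provenance is simplified here to the set of source tuples a row derives from rather than the provenance model the paper develops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    relation: str          # target relation name
    row: tuple             # tuple being inserted at the target peer
    provenance: frozenset  # (peer, relation, row) source facts the row derives from


def apply_mapping(mapping, source_peer, source_relation, rows):
    """Translate rows of one peer's relation into updates for another schema."""
    target_relation, transform = mapping
    return [
        Update(target_relation, transform(row),
               frozenset({(source_peer, source_relation, row)}))
        for row in rows
    ]


def exchange(updates, trusts):
    """Keep only the updates whose provenance satisfies the trust condition."""
    return [u for u in updates if trusts(u)]


# Hypothetical setup: peer A publishes Gene(id, name, organism);
# peer B stores Name(id, name), populated through a projection mapping.
mapping_a_to_b = ("Name", lambda row: (row[0], row[1]))
new_rows_at_a = [(1, "BRCA1", "human"), (2, "tp53", "mouse")]


def b_trusts(update):
    """Hypothetical trust condition at B: reject data derived from A's mouse tuples."""
    return all(not (peer == "A" and row[2] == "mouse")
               for peer, _relation, row in update.provenance)


translated = apply_mapping(mapping_a_to_b, "A", "Gene", new_rows_at_a)
accepted = exchange(translated, b_trusts)
accepted_rows = [u.row for u in accepted]
print(accepted_rows)  # [(1, 'BRCA1')]
```

Note that the organism column is projected away by the mapping, so peer B can apply its trust condition only because each update carries the source tuple as provenance.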

[1] Joann J. Ordille, et al. Querying Heterogeneous Information Sources Using Source Descriptions, 1996, VLDB.

[2] Alexandra Poulovassilis, et al. Using Schema Transformation Pathways for Data Lineage Tracing, 2005, BNCOD.

[3] Alin Deutsch, et al. Reformulation of XML Queries and Constraints, 2003, ICDT.

[4] Phokion G. Kolaitis, et al. Peer data exchange, 2005, PODS '05.

[5] Alon Y. Halevy, et al. PQL: a declarative query language over dynamic biological schemata, 2002, AMIA.

[6] Alexandra Poulovassilis, et al. P2P Query Reformulation over Both-As-View Data Transformation Rules, 2006, DBISP2P.

[7] Alon Y. Halevy, et al. An adaptive query execution system for data integration, 1999, SIGMOD '99.

[8] Renée J. Miller, et al. Mapping data in peer-to-peer systems: semantics and algorithmic issues, 2003, SIGMOD '03.

[9] Maurizio Lenzerini, et al. Data integration: a theoretical perspective, 2002, PODS.

[10] Guido Moerkotte, et al. Efficient maintenance of materialized mediated views, 1995, SIGMOD '95.

[11] Fausto Giunchiglia, et al. Data Management for Peer-to-Peer Computing: A Vision, 2002, WebDB.

[12] Michael R. Genesereth, et al. Answering recursive queries using views, 1997, PODS '97.

[13] Nicole Schweikardt, et al. CWA-solutions for data exchange settings with target dependencies, 2007, PODS '07.

[14] Todd D. Millstein, et al. Navigational Plans For Data Integration, 1999, AAAI/IAAI.

[15] Val Tannen, et al. Provenance semirings, 2007, PODS.

[16] Michael J. Carey, et al. XPERANTO: Publishing Object-Relational Data as XML, 2000, WebDB.

[17] Wang Chiew Tan, et al. Debugging schema mappings with routes, 2006, VLDB.

[18] Jennifer Widom, et al. ULDBs: databases with uncertainty and lineage, 2006, VLDB.

[19] Alin Deutsch, et al. Query reformulation with constraints, 2006, SGMD.

[20] Latha S. Colby, et al. Algorithms for deferred view maintenance, 1996, SIGMOD '96.

[21] Zachary G. Ives, et al. Reconciling while tolerating disagreement in collaborative data sharing, 2006, SIGMOD Conference.

[22] Ronald Fagin, et al. Translating Web Data, 2002, VLDB.

[23] Sanjeev Khanna, et al. Data Provenance: Some Basic Issues, 2000, FSTTCS.

[24] Rolf Apweiler, et al. The SWISS-PROT protein sequence data bank and its supplement TrEMBL, 1997, Nucleic Acids Res.

[25] Oded Shmueli, et al. Finiteness Properties of Database Queries, 1993, Australian Database Conference.

[26] Diego Calvanese, et al. Logical foundations of peer-to-peer data integration, 2004, PODS '04.

[27] Sanjeev Khanna, et al. Why and Where: A Characterization of Data Provenance, 2001, ICDT.

[28] Jennifer Widom, et al. Lineage tracing in data warehouses, 2001.

[29] Leonid Libkin, et al. Data exchange and incomplete information, 2006, PODS '06.

[30] Hamid Pirahesh, et al. The Magic of Duplicates and Aggregates, 1990, VLDB.

[31] Zachary G. Ives, et al. ORCHESTRA: Rapid, Collaborative Sharing of Dynamic Data, 2005, CIDR.

[32] Ronald Fagin, et al. Data exchange: semantics and query answering, 2003, Theor. Comput. Sci.

[33] V. S. Subrahmanian, et al. Maintaining views incrementally, 1993, SIGMOD Conference.