Collaborative data sharing via update exchange and provenance

Recent work [Ives et al. 2005] proposed a new class of systems for supporting data sharing among scientific and other collaborations: the collaborative data sharing system (CDSS), which connects heterogeneous logical peers through a network of schema mappings. Each peer has a locally controlled and edited database instance, but also wants to incorporate related data from other peers. To achieve this, each peer's data and updates propagate along the mappings to the other peers. This operation, termed update exchange, is filtered by trust conditions, which express the data and sources that a peer judges to be authoritative and may cause a peer to reject another peer's updates. To support such filtering, updates carry provenance information. This article develops methods for realizing such systems: we build upon techniques from data integration, data exchange, incremental view maintenance, and view update to propagate updates along mappings, both to derived instances and, optionally, back to source instances. We incorporate a novel model for tracking data provenance, so that curators may filter updates based on trust conditions over this provenance. We implement our techniques in a layer above an off-the-shelf RDBMS, and we experimentally demonstrate their viability in the Orchestra prototype system.
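The following is a minimal sketch, not the Orchestra implementation, of the two ideas the abstract combines: propagating tuples along schema mappings to a fixpoint, and filtering the result by a trust condition evaluated over provenance. It assumes a much-simplified setting. Each mapping (the hypothetical `Mapping` class) sends one source row to one target row through a Python function rather than a general tuple-generating dependency, and only insertions are handled. Provenance is kept as a set of derivations, each a set of tokens naming base tuples and mappings; evaluating that sum of products in the Boolean semiring (OR across derivations, AND within one) is loosely in the spirit of provenance semirings (see [7] in the reference list). The peer names `P1`..`P3`, the tokens, and the functions `update_exchange`, `trusted`, and `accepted_view` are all illustrative.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Row = Tuple[str, ...]
Derivation = FrozenSet[str]             # one derivation: a product of provenance tokens
Annotated = Dict[Row, Set[Derivation]]  # row -> sum (set) of its derivations


class Mapping:
    """Hypothetical simplified mapping: each source row yields exactly one target row."""

    def __init__(self, name: str, source: str, target: str,
                 transform: Callable[[Row], Row]) -> None:
        self.name = name
        self.source = source
        self.target = target
        self.transform = transform


def update_exchange(db: Dict[str, Annotated], mappings: List[Mapping]) -> None:
    """Propagate rows and their provenance along the mappings until a fixpoint."""
    changed = True
    while changed:
        changed = False
        for m in mappings:
            src = db.setdefault(m.source, {})
            tgt = db.setdefault(m.target, {})
            for row, derivs in list(src.items()):
                bucket = tgt.setdefault(m.transform(row), set())
                for d in list(derivs):
                    extended = d | {m.name}  # record the mapping in the derivation
                    if extended not in bucket:
                        bucket.add(extended)
                        changed = True


def trusted(derivs: Set[Derivation], trust: Callable[[str], bool]) -> bool:
    """Boolean-semiring evaluation: OR across derivations, AND within each one."""
    return any(all(trust(tok) for tok in d) for d in derivs)


def accepted_view(db: Dict[str, Annotated], relation: str,
                  trust: Callable[[str], bool]) -> Set[Row]:
    """Rows of `relation` that have at least one fully trusted derivation."""
    return {row for row, derivs in db.get(relation, {}).items()
            if trusted(derivs, trust)}


# Three hypothetical peers; base tuples carry tokens naming their origin.
db: Dict[str, Annotated] = {
    "P1.Gene": {("BRCA1", "chr17"): {frozenset({"p1:t1"})}},
    "P2.Gene": {("TP53", "chr17"): {frozenset({"p2:t1"})}},
    "P3.Gene": {},
}
mappings = [
    Mapping("m12", "P1.Gene", "P2.Gene", lambda r: r),
    Mapping("m13", "P1.Gene", "P3.Gene", lambda r: r),
    Mapping("m23", "P2.Gene", "P3.Gene", lambda r: r),
]
update_exchange(db, mappings)


def p3_trust(tok: str) -> bool:
    """Trust condition for P3: reject anything derived through mapping m23."""
    return tok != "m23"


print(sorted(accepted_view(db, "P3.Gene", p3_trust)))  # prints [('BRCA1', 'chr17')]
```

BRCA1 survives at P3 because one of its two derivations (directly through `m13`) avoids `m23`, whereas TP53 arrives only through `m23` and is rejected. The fixpoint loop terminates here because rows and tokens come from finite sets; cyclic mappings that invent new values would need the kind of termination conditions studied in data exchange. Deletions and propagation back to source instances, both covered by the article, are not modeled in this sketch.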

[1] Christoph Koch, et al. Cooperative Update Exchange in the Youtopia System, 2009, Proc. VLDB Endow.

[2] Zachary G. Ives, et al. ORCHESTRA: Rapid, Collaborative Sharing of Dynamic Data, 2005, CIDR.

[3] Ronald Fagin, et al. Data exchange: semantics and query answering, 2003, Theor. Comput. Sci.

[4] Arthur M. Keller, et al. Algorithms for translating view updates to database updates for views involving selections, projections, and joins, 1985, PODS.

[5] Wang Chiew Tan, et al. Debugging schema mappings with routes, 2006, VLDB.

[6] Nicole Schweikardt, et al. CWA-solutions for data exchange settings with target dependencies, 2007, PODS.

[7] Val Tannen, et al. Provenance semirings, 2007, PODS.

[8] V. S. Subrahmanian, et al. Maintaining views incrementally, 1993, SIGMOD Conference.

[9] Maurizio Lenzerini, et al. Data integration: a theoretical perspective, 2002, PODS.

[10] Jennifer Widom, et al. ULDBs: databases with uncertainty and lineage, 2006, VLDB.

[11] Jennifer Widom, et al. Lineage tracing for general data warehouse transformations, 2003, The VLDB Journal.

[12] Jie Zhao, et al. Schema Mediation in Peer Data Management Systems, 2011, Int. J. Cooperative Inf. Syst.

[13] Michael R. Genesereth, et al. Answering recursive queries using views, 1997, PODS.

[14] Inderpal Singh Mumick, et al. The Stanford Data Warehousing Project, 1995.

[15] Michael J. Carey, et al. XPERANTO: Publishing Object-Relational Data as XML, 2000, WebDB.

[16] Val Tannen, et al. Update Exchange with Mappings and Provenance, 2007, VLDB.

[17] Susan B. Davidson, et al. BioGuideSRS: querying multiple sources with a user-centric perspective, 2007, Bioinform.

[18] Oded Shmueli, et al. Finiteness Properties of Database Queries, 1993, Australian Database Conference.

[19] Diego Calvanese, et al. Logical foundations of peer-to-peer data integration, 2004, PODS.

[20] Renée J. Miller, et al. Mapping data in peer-to-peer systems: semantics and algorithmic issues, 2003, SIGMOD Conference.

[21] Zachary G. Ives, et al. Bidirectional Mappings for Data and Update Exchange, 2008, WebDB.

[22] Rolf Apweiler, et al. The SWISS-PROT protein sequence data bank and its supplement TrEMBL, 1997, Nucleic Acids Res.

[23] Todd D. Millstein, et al. Navigational Plans For Data Integration, 1999, AAAI/IAAI.

[24] Joann J. Ordille, et al. Querying Heterogeneous Information Sources Using Source Descriptions, 1996, VLDB.

[25] Nicolas Spyratos, et al. Update semantics of relational views, 1981, TODS.

[26] Alin Deutsch, et al. Query reformulation with constraints, 2006, SIGMOD Record.

[27] Alin Deutsch, et al. XML queries and constraints, containment and reformulation, 2005, Theor. Comput. Sci.

[28] Alexandra Poulovassilis, et al. Using Schema Transformation Pathways for Data Lineage Tracing, 2005, BNCOD.

[29] Alon Y. Halevy, et al. PQL: a declarative query language over dynamic biological schemata, 2002, AMIA.

[30] Val Tannen, et al. Provenance in collaborative data sharing, 2009.

[31] Alexandra Poulovassilis, et al. P2P Query Reformulation over Both-As-View Data Transformation Rules, 2006, DBISP2P.

[32] Umeshwar Dayal, et al. On the correct translation of update operations on relational views, 1982, TODS.

[33] Phokion G. Kolaitis, et al. Peer data exchange, 2005, PODS.

[34] Fausto Giunchiglia, et al. Data Management for Peer-to-Peer Computing: A Vision, 2002, WebDB.

[35] Dan Suciu, et al. Schema mediation in peer data management systems, 2003, ICDE.

[36] Dan Suciu, et al. Data conflict resolution using trust mappings, 2010, SIGMOD Conference.

[37] Guido Moerkotte, et al. Efficient maintenance of materialized mediated views, 1995, SIGMOD Conference.

[38] Gerhard Weikum, et al. ACM Transactions on Database Systems, 2005.

[39] Zachary G. Ives, et al. Reliable storage and querying for collaborative data sharing systems, 2010, ICDE.

[40] Koby Crammer, et al. Learning to create data-integrating queries, 2008, Proc. VLDB Endow.

[41] Zachary G. Ives, et al. Reconciling while tolerating disagreement in collaborative data sharing, 2006, SIGMOD Conference.

[42] Benjamin C. Pierce, et al. Relational lenses: a language for updatable views, 2006, PODS.

[43] Ronald Fagin, et al. Translating Web Data, 2002, VLDB.

[44] James Cheney, et al. Provenance in Databases: Why, How, and Where, 2009, Found. Trends Databases.

[45] Jennifer Widom, et al. Lineage tracing in data warehouses, 2001.

[46] Sanjeev Khanna, et al. Why and Where: A Characterization of Data Provenance, 2001, ICDT.

[47] Val Tannen, et al. Querying data provenance, 2010, SIGMOD Conference.

[48] Philip S. Yu, et al. BLINKS: ranked keyword searches on graphs, 2007, SIGMOD Conference.

[49] Val Tannen, et al. ORCHESTRA: facilitating collaborative data sharing, 2007, SIGMOD Conference.

[50] Leonid Libkin, et al. Data exchange and incomplete information, 2006, PODS.

[51] Hamid Pirahesh, et al. The Magic of Duplicates and Aggregates, 1990, VLDB.