Representing and querying data transformations

Modern information systems often store data that has been transformed and integrated from a variety of sources. This integration may obscure the original source semantics of data items. For many tasks, it is important to determine not only where data items originated, but also why they appear in the integration as they do and through what transformations they were derived. This problem is known as data provenance. In this work, we consider data provenance at the schema and mapping level. In particular, we consider how to answer questions such as "What schema elements in the source(s) contributed to this value?" or "Through what transformations or mappings was this value derived?" To this end, we elevate schemas and mappings to first-class citizens that are stored in a repository and associated with the actual data values. We also develop an extended query language, called MXQL, that allows metadata to be queried as regular data, and we describe an implementation scenario for it.
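To make the idea of schema-level and mapping-level provenance concrete, the following is a minimal sketch in Python, not MXQL and not the system described here. It assumes a toy repository in which each mapping records the source schema elements it reads, and each data value carries the name of the mapping that produced it. All class names, method names, and schema element names (`Mapping`, `Value`, `Repository`, `S1.Employee.name`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Mapping:
    """Hypothetical mapping record stored in the repository."""
    name: str
    source_elements: Tuple[str, ...]  # source schema elements the mapping reads
    target_element: str               # target schema element the mapping populates


@dataclass(frozen=True)
class Value:
    """A data value annotated with the mapping that derived it."""
    data: str
    element: str   # target schema element holding this value
    mapping: str   # name of the mapping that produced the value


class Repository:
    """Toy repository that treats mappings as queryable data."""

    def __init__(self) -> None:
        self.mappings: Dict[str, Mapping] = {}

    def register(self, mapping: Mapping) -> None:
        self.mappings[mapping.name] = mapping

    def mapping_of(self, value: Value) -> Mapping:
        # "Through what mapping was this value derived?"
        return self.mappings[value.mapping]

    def source_elements_of(self, value: Value) -> Tuple[str, ...]:
        # "What schema elements in the source(s) contributed to this value?"
        return self.mapping_of(value).source_elements


repo = Repository()
repo.register(
    Mapping("m1", ("S1.Employee.name", "S1.Employee.dept"), "T.Staff.label")
)
v = Value("Ada (R&D)", "T.Staff.label", "m1")

print(repo.mapping_of(v).name)
print(repo.source_elements_of(v))
```

In this sketch both provenance questions reduce to lookups because each value stores a reference to its mapping. A query language over metadata would expose the same associations declaratively instead of through method calls.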
