Annotated XML: queries and provenance

We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, we decorate unordered XML with annotations from commutative semirings and show that these annotations suffice for a large positive fragment of XQuery applied to this data. In addition to tracking provenance metadata, the framework can be used to represent and process XML with repetitions, incomplete XML, and probabilistic XML, and provides a basis for enforcing access control policies in security applications. Each of these applications builds on our semantics for XQuery, which we present in several steps: we generalize the semantics of the Nested Relational Calculus (NRC) to handle semiring-annotated complex values, we extend it with a recursive type and structural recursion operator for trees, and we define a semantics for XQuery on annotated XML by translation into this calculus.
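The central idea, that a query combines annotations with the semiring operations, can be made concrete with a small sketch. It assumes that unordered XML children form a K-set, meaning each child carries an annotation from a commutative semiring K. Here K is taken to be the provenance polynomials N[X], one possible choice. An NRC-style comprehension ("for x in s return f(x)") multiplies the annotation of x by the annotation of each result, and union adds the annotations of equal values. All names below (`Poly`, `kset`, `big_union`, `descendants`, `label_filter`) are hypothetical illustrations, not the paper's calculus or its XQuery translation. `descendants` is a hand-written recursive function that stands in for the structural recursion operator, and the paper's exact treatment of XPath axes may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass(frozen=True)
class Poly:
    """Provenance polynomial in N[X]: maps monomials (sorted variable tuples) to coefficients."""
    terms: frozenset = frozenset()

    @staticmethod
    def var(name):
        return Poly(frozenset({((name,), 1)}))

    def __add__(self, other):  # semiring +: add coefficients of equal monomials
        acc = defaultdict(int)
        for m, c in list(self.terms) + list(other.terms):
            acc[m] += c
        return Poly(frozenset(acc.items()))

    def __mul__(self, other):  # semiring x: multiply out, merging variables into one monomial
        acc = defaultdict(int)
        for (m1, c1), (m2, c2) in product(self.terms, other.terms):
            acc[tuple(sorted(m1 + m2))] += c1 * c2
        return Poly(frozenset(acc.items()))

    def evaluate(self, valuation):
        """Map into the counting semiring (natural numbers) by substituting a number per variable."""
        return sum(c * prod(valuation[v] for v in m) for m, c in self.terms)

    def __str__(self):
        parts = []
        for m, c in sorted(self.terms):
            coeff = "" if c == 1 and m else str(c)
            parts.append("*".join(filter(None, [coeff, *m])))
        return " + ".join(parts) or "0"


ZERO = Poly()
ONE = Poly(frozenset({((), 1)}))


def kset(pairs):
    """Build a K-set as a frozenset of (value, annotation); duplicates have annotations added."""
    acc = {}
    for v, k in pairs:
        acc[v] = acc.get(v, ZERO) + k
    return frozenset((v, k) for v, k in acc.items() if k != ZERO)


def union(s1, s2):
    return kset(list(s1) + list(s2))


def big_union(s, f):
    """Comprehension 'for x in s return f(x)': y gets annotation sum over x of s(x) * f(x)(y)."""
    return kset((y, k * k2) for x, k in s for y, k2 in f(x))


@dataclass(frozen=True)
class Tree:
    """Unordered tree: a label plus a K-set of (child Tree, annotation) pairs."""
    label: str
    children: frozenset = frozenset()


def singleton(t):
    return kset([(t, ONE)])


def descendants(t):
    """Proper descendants: annotations multiply along a path and add across paths."""
    return big_union(t.children, lambda c: union(singleton(c), descendants(c)))


def label_filter(s, label):
    return big_union(s, lambda t: singleton(t) if t.label == label else frozenset())


# Hypothetical example document; x1, x2, y1, y2 are provenance tokens on child edges.
d = Tree("d")
b = Tree("b", kset([(d, Poly.var("y1"))]))
c = Tree("c", kset([(b, Poly.var("y2"))]))
root = Tree("a", kset([(b, Poly.var("x1")), (c, Poly.var("x2"))]))

result = label_filter(descendants(root), "b")  # roughly the path //b from root
ann = dict(result)[b]
print(ann)                                        # x1 + x2*y2
print(ann.evaluate({"x1": 1, "x2": 1, "y2": 2}))  # 3
print(ann.evaluate({"x1": 0, "x2": 1, "y2": 1}))  # 1
```

The subtree `b` is reachable along two paths, so its annotation is a sum of two monomials. Evaluating that polynomial under a numeric valuation yields a multiplicity, which corresponds to XML with repetitions. Setting a token to 0 models hiding the corresponding edge, loosely in the spirit of the access-control application.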
