Projecting XML Documents

XQuery is useful not only for querying XML in databases, but also for applications that must process XML documents as files or streams. These applications suffer from the limitations of current main-memory XQuery processors, which fail even on rather small documents. In this paper we propose techniques, based on a notion of projection for XML, that can drastically reduce memory requirements in XQuery processors. The main contribution of the paper is a static analysis technique that identifies, at compile time, which parts of the input document are needed to answer an arbitrary XQuery. We present a loading algorithm that uses the resulting information to build a projected document, which is smaller than the original document and on which the query yields the same result. We implemented projection in the Galax XQuery processor. Our experiments show that projection reduces memory requirements by a factor of 20 on average and is effective for a wide variety of queries. In addition, projection yields some speedup during query evaluation.
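
To make the idea of loading a projected document concrete, the following is a minimal sketch in Python, not the algorithm implemented in Galax. It assumes the static analysis has already produced a set of projection paths, represented here (hypothetically) as tuples of element names from the root, and that keeping a path means keeping its whole subtree. The names `is_needed` and `load_projected` are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_needed(path, projection_paths):
    """Return True if `path` lies on, or below, some projection path.

    A node is kept when it is an ancestor of a selected node (its path is
    a prefix of a projection path) or a descendant of one (a projection
    path is a prefix of its path).
    """
    for p in projection_paths:
        n = min(len(path), len(p))
        if path[:n] == p[:n]:
            return True
    return False


def load_projected(source, projection_paths):
    """Parse `source` incrementally and drop subtrees no path needs.

    Assumption: each unneeded subtree is discarded as soon as its end tag
    is seen, so it is held in memory only temporarily.
    """
    stack = []   # elements on the current root-to-node path
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            if root is None:
                root = elem
        else:
            path = tuple(e.tag for e in stack)
            stack.pop()
            # The root is always kept; other nodes are pruned from
            # their parent when no projection path requires them.
            if stack and not is_needed(path, projection_paths):
                stack[-1].remove(elem)
    return root
```

For example, with the single projection path `("site", "people", "person", "name")`, loading a document that also contains `age` and `items` elements leaves only the `site/people/person/name` spine in memory. A query that touches only those names would then see the same result on the smaller tree.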
