Query Processing: Fragmentation, Localization and Pruning

Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of XML data and its query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically. Based on this, we propose solutions to two of the problems encountered in distributed query processing and optimization on XML data, namely localization and pruning. Localization takes a fragmentation-unaware query plan and converts it to a distributed query plan that can be executed at the sites that hold XML data fragments in a distributed system. We then show how the resulting distributed query plan can be pruned so that only those sites are accessed that can contribute to the query result. We demonstrate that our techniques can be integrated into a real-life XML database system and that they significantly improve the performance of distributed query execution.
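
To make the two steps concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes a hypothetical collection of `book` elements that is horizontally fragmented by ranges of a `year` attribute; the `Fragment` class, the site names, and the range predicate are all assumptions introduced for this example. Localization is modeled as replacing one logical scan with one scan per fragment, and pruning as dropping fragments whose range cannot overlap the query's range.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Fragment:
    """A horizontal fragment (hypothetical): books with lo <= @year < hi."""
    site: str   # assumed site name, for illustration only
    lo: int     # inclusive lower bound of the fragment's year range
    hi: int     # exclusive upper bound of the fragment's year range
    xml: str    # the XML data stored at that site


def localize(fragments):
    """Localization: turn one scan of the logical collection into one scan per fragment."""
    return list(fragments)


def prune(plan, q_lo, q_hi):
    """Pruning: keep only fragments whose year range overlaps the query range [q_lo, q_hi)."""
    return [f for f in plan if f.lo < q_hi and q_lo < f.hi]


def execute(plan, q_lo, q_hi):
    """Evaluate the selection at each remaining fragment and union the results."""
    titles = []
    for f in plan:
        root = ET.fromstring(f.xml)
        for book in root.findall("book"):
            if q_lo <= int(book.get("year")) < q_hi:
                titles.append(book.findtext("title"))
    return titles


fragments = [
    Fragment("site1", 1990, 2000,
             '<books><book year="1995"><title>A</title></book></books>'),
    Fragment("site2", 2000, 2010,
             '<books><book year="2004"><title>B</title></book>'
             '<book year="2008"><title>C</title></book></books>'),
    Fragment("site3", 2010, 2020,
             '<books><book year="2015"><title>D</title></book></books>'),
]

# Query: titles of books with 2003 <= @year < 2009.
full_plan = localize(fragments)
pruned_plan = prune(full_plan, 2003, 2009)
sites = [f.site for f in pruned_plan]
result = execute(pruned_plan, 2003, 2009)
```

In this sketch the pruned plan touches a single site yet returns the same answer as the unpruned plan, which is the property pruning is meant to preserve. A real system would reason over path expressions and fragmentation predicates rather than a single numeric range.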
