Efficiently Processing XML Queries over Fragmented Repositories with PartiX

The data volume of XML repositories and the response time of query processing have become critical issues for many applications, especially those on the Web. An interesting alternative for improving query processing performance is to reduce the size of XML databases through fragmentation techniques. However, traditional fragmentation definitions do not directly apply to collections of XML documents. This work formalizes the definition of fragmentation for collections of XML documents and evaluates the performance of query processing over fragmented XML data. Our prototype, PartiX, exploits intra-query parallelism on top of XQuery-enabled sequential DBMS modules. We analyzed several experimental settings, and our results show a performance improvement of up to a scale-up factor of 72 over centralized databases.
