A Methodology for Query Processing over Distributed XML Databases

The constant increase in the volume of data stored as native XML documents makes fragmentation techniques an important alternative for addressing performance issues in query processing over these data. Fragmented databases are feasible only if there is a transparent way to query the distributed database, without requiring knowledge of the fragmentation details or of where each fragment is located. This paper presents our methodology for XQuery query processing over distributed XML databases, which consists of the following steps: query decomposition, including the query's representation in TLC algebra; data localization; global optimization; global query execution; and final result assembly. This methodology can be used both in an XML database that supports fragmentation and in a system that publishes an integrated view of semi-autonomous, homogeneous XML databases. We propose an architecture based on a Mediator with Adapters (wrappers) attached to the remote databases. The Mediator publishes a global XML view of the distributed data, which users can query transparently. Prototypes of a Mediator and two Adapters have been implemented, and we ran experiments to analyze the performance improvements and the impact of different queries over distributed XML databases.
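To make the data localization and final result assembly steps more concrete, the following is a minimal Python sketch under simplifying assumptions. Each fragment is assumed to be a horizontal fragment defined by a single equality predicate, a query is reduced to a path plus one equality selection rather than full XQuery, and the `Fragment`, `Adapter`, and `Mediator` classes are hypothetical stand-ins, not the interfaces of the implemented prototypes. Query decomposition into TLC algebra and global optimization are not modeled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    """Hypothetical horizontal fragment: the <item> elements whose child
    `attr` has text `value` (the fragment's defining selection predicate)."""
    name: str
    attr: str
    value: str
    xml: str  # local XML document standing in for a remote database


class Adapter:
    """Hypothetical wrapper that evaluates sub-queries on one fragment."""

    def __init__(self, fragment: Fragment) -> None:
        self.fragment = fragment
        self.root = ET.fromstring(fragment.xml)

    def execute(self, path: str, attr: str, value: Optional[str]) -> list:
        # Local evaluation: elements matching `path`, filtered on attr == value
        # (no filter when value is None).
        return [e for e in self.root.findall(path)
                if value is None or e.findtext(attr) == value]


class Mediator:
    """Publishes one global view over all fragments (simplified)."""

    def __init__(self, adapters: list) -> None:
        self.adapters = adapters

    def localize(self, attr: str, value: Optional[str]) -> list:
        # Data localization: skip fragments whose defining predicate
        # contradicts the query predicate attr == value.
        return [a for a in self.adapters
                if value is None
                or a.fragment.attr != attr
                or a.fragment.value == value]

    def query(self, path: str, attr: str, value: Optional[str] = None):
        relevant = self.localize(attr, value)
        # Global execution on the relevant adapters, then final result
        # assembly: for horizontal fragments, the union of partial answers.
        result = ET.Element("result")
        for adapter in relevant:
            result.extend(adapter.execute(path, attr, value))
        return [a.fragment.name for a in relevant], result


f1 = Fragment("F1", "section", "CD",
              "<items><item><section>CD</section><title>A</title></item>"
              "<item><section>CD</section><title>B</title></item></items>")
f2 = Fragment("F2", "section", "DVD",
              "<items><item><section>DVD</section><title>C</title></item></items>")
mediator = Mediator([Adapter(f1), Adapter(f2)])

used, result = mediator.query("item", "section", "CD")
print(used, [t.text for t in result.iter("title")])  # ['F1'] ['A', 'B']
```

Here the query on `section = "CD"` is sent only to F1, because the defining predicate of F2 (`section = "DVD"`) can never satisfy it; a query without a selection reaches every fragment. Assembling the answer by union is specific to horizontal fragmentation; vertical fragments would instead require reconstructing documents from their pieces.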