XML query evaluation

XML is now widely used, and the management of XML data has become important. To this end, there has been work on managing XML data natively in a database, so as to exploit the capabilities such a system offers, such as transaction management and index structures. At the heart of such a native XML database is the query evaluator, which provides access methods specifically tailored to manipulating XML data. The design of efficient access methods is the topic of this thesis. The most frequently used operation in an XML database is the structural join, and almost all XML queries contain at least one. A structural join returns the matches to a structural pattern in an XML document, for example all pairs of elements that stand in an ancestor-descendant or parent-child relationship. We introduce a new, efficient family of algorithms for this task. These algorithms use a stack that exploits the hierarchical nesting of XML to improve performance. We then develop variants that combine other operators, including projection, set difference, and universal quantification, with the structural join for greater efficiency. An important benefit of XML is that it represents text and structured data seamlessly. Querying text with regard to its structure yields fast and accurate results. However, standard database query paradigms are not suitable for querying text. We introduce the TIX algebra for this purpose and develop new access methods that efficiently compute and combine the scores associated with intermediate results. In such applications, one is typically interested in only the few results with the highest scores. We develop new access methods that find results scoring within a margin of error of the actual top results. These approximate access methods outperform computing the exact top results by at least an order of magnitude.
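
The abstract does not give the stack-based algorithms themselves. As a minimal sketch of the general idea, assume each element carries a region label (start, end, level), so that A is an ancestor of D exactly when A.start < D.start and D.end < A.end, and that both inputs are sorted by start position. The names `Region` and `stack_structural_join` are hypothetical. The stack holds a chain of nested candidate ancestors; because XML elements nest properly, an element can be popped as soon as the scan moves past its end position, so each ancestor is pushed and popped at most once and the cost is linear in the input plus output size.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """Hypothetical region label for one XML element."""
    start: int   # position of the start tag in document order
    end: int     # position of the end tag
    level: int   # depth in the tree (root = 1)
    tag: str


def stack_structural_join(ancestors: List[Region], descendants: List[Region],
                          parent_child: bool = False) -> List[Tuple[Region, Region]]:
    """Return matching (ancestor, descendant) pairs, ordered by descendant.

    Both inputs must be sorted by start. With parent_child=True, only
    pairs whose levels differ by exactly one are returned.
    """
    result: List[Tuple[Region, Region]] = []
    stack: List[Region] = []  # chain of nested candidate ancestors, innermost on top
    a = d = 0
    while d < len(descendants):
        # Advance through both inputs in document (start) order.
        take_anc = a < len(ancestors) and ancestors[a].start < descendants[d].start
        nxt = ancestors[a] if take_anc else descendants[d]
        # Elements that ended before nxt starts cannot contain nxt or anything later.
        while stack and stack[-1].end < nxt.start:
            stack.pop()
        if take_anc:
            stack.append(nxt)  # every remaining stack entry contains nxt
            a += 1
        else:
            if parent_child:
                # Only the innermost enclosing candidate can be the parent.
                if stack and stack[-1].level + 1 == nxt.level:
                    result.append((stack[-1], nxt))
            else:
                # Every element still on the stack is an ancestor of nxt.
                result.extend((anc, nxt) for anc in stack)
            d += 1
    return result


# <book><section><title></title><section><title></title></section></section></book>
# Tags are numbered 1..10 in document order.
sections = [Region(2, 9, 2, "section"), Region(5, 8, 3, "section")]
titles = [Region(3, 4, 3, "title"), Region(6, 7, 4, "title")]

print([(x.start, y.start) for x, y in stack_structural_join(sections, titles)])
# prints [(2, 3), (2, 6), (5, 6)]
print([(x.start, y.start) for x, y in stack_structural_join(sections, titles, True)])
# prints [(2, 3), (5, 6)]
```

Variants of the kind the abstract mentions, such as projecting onto distinct ancestors or keeping only ancestors with no matching descendant (set difference), would change what is emitted at the descendant step; they are not shown in this sketch.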

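The abstract does not detail how results "within a margin of error" of the top results are found. One standard way to make that notion precise, shown here only as an illustration and not necessarily the method of this thesis, is the θ-approximation of the Threshold Algorithm of Fagin, Lotem and Naor: scan several score-sorted lists in parallel and stop as soon as the k-th best grade seen, multiplied by θ ≥ 1, reaches the threshold, which bounds the grade of any object not yet seen. The sketch assumes in-memory lists with simulated random access, a score of 0 for an object missing from a list, and sum as the (monotone) scoring function; `ta_topk` and the sample data are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Scored = Tuple[str, float]  # (object id, score) in one ranked list


def ta_topk(lists: Sequence[Sequence[Scored]], k: int, theta: float = 1.0,
            agg: Callable[[List[float]], float] = sum):
    """Threshold Algorithm with theta-approximation (theta >= 1).

    Each list is sorted by descending score; an object missing from a list
    scores 0 there; agg must be monotone. Returns (top k as (grade, object)
    pairs, number of sorted-access rounds used).
    """
    lookup = [dict(lst) for lst in lists]   # simulated random access
    grades: Dict[str, float] = {}           # full grades of objects seen so far
    rounds = max((len(lst) for lst in lists), default=0)
    for depth in range(1, rounds + 1):
        frontier = []                       # last score seen in each list
        for lst in lists:
            if depth <= len(lst):
                obj, score = lst[depth - 1]  # sorted access
                frontier.append(score)
                if obj not in grades:
                    grades[obj] = agg([lk.get(obj, 0) for lk in lookup])
            else:
                frontier.append(0)           # exhausted list: unseen objects score 0 there
        top = heapq.nlargest(k, ((g, o) for o, g in grades.items()))
        tau = agg(frontier)                  # upper bound on any unseen object's grade
        if len(top) == k and theta * top[-1][0] >= tau:
            return top, depth
    return heapq.nlargest(k, ((g, o) for o, g in grades.items())), rounds


# Two hypothetical ranked lists, e.g. per-term scores of candidate elements.
lists = [
    [("a", 9), ("b", 8), ("c", 3), ("d", 1)],
    [("b", 9), ("c", 7), ("a", 2), ("d", 1)],
]
print(ta_topk(lists, k=2))             # ([(17, 'b'), (11, 'a')], 3)
print(ta_topk(lists, k=2, theta=1.5))  # ([(17, 'b'), (11, 'a')], 2)
```

With θ = 1 this is the exact Threshold Algorithm; with θ = 1.5 it stops one round earlier on this input and happens to return the same two objects. In general the guarantee is only that θ·grade(y) ≥ grade(z) for every returned object y and every object z not returned.
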