Estimating the Selectivity of XML Path Expression with Predicates by Histograms

Selectivity estimation of path expressions over XML data plays an important role in query optimization. A path expression may contain multiple branches with predicates, each of which affects the selectivity of the entire query. In this paper, we propose a novel method based on 2-dimensional value histograms to estimate the selectivity of path expressions with embedded predicates. The value histograms capture the correlation between structure and values in the XML data. We define a set of operations on the value histograms, as well as on the traditional histograms that capture the positional distribution of nodes. We then construct a cost tree from these operations. The selectivity of any node (or branch) in a path expression can be estimated by executing the cost tree. Compared with previous methods, which ignore value distribution, our method offers much better estimation accuracy.
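The abstract does not spell out the bucket layout or the exact histogram operations, so the following Python sketch shows only one way the idea could look. It assumes a hypothetical `ValueHistogram` whose rows are position buckets and whose columns are value buckets, a hypothetical `select` operation that turns a value predicate into a per-position count, and a hypothetical `estimate_step` that combines a parent's positional histogram with its predicate branches, roughly one node of a cost tree. The "at least one matching child" formula and the independence and uniformity assumptions are simplifications chosen for this sketch, not the operations defined in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ValueHistogram:
    """Hypothetical 2-D value histogram for one tag.

    counts[i][j] is the number of nodes of this tag whose position falls in
    position bucket i and whose value falls in
    [value_edges[j], value_edges[j + 1]).
    """
    value_edges: list[float]
    counts: list[list[float]]

    def totals(self) -> list[float]:
        # Summing out the value axis gives a plain positional histogram.
        return [float(sum(row)) for row in self.counts]

    def select(self, lo: float, hi: float) -> list[float]:
        """Estimated nodes per position bucket with lo <= value < hi,
        assuming values are spread uniformly inside each value bucket."""
        out = []
        for row in self.counts:
            est = 0.0
            for j, c in enumerate(row):
                b_lo, b_hi = self.value_edges[j], self.value_edges[j + 1]
                overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
                est += c * overlap / (b_hi - b_lo)
            out.append(est)
        return out


@dataclass
class Branch:
    """A predicate branch [child in [lo, hi)] attached to a parent step."""
    child: ValueHistogram
    lo: float
    hi: float


def estimate_step(parent: list[float], branches: list[Branch]) -> float:
    """Estimate how many parent nodes satisfy every predicate branch.

    Assumptions of this sketch: children share their parent's position
    bucket, are spread evenly over the parents in that bucket, match their
    predicate independently of one another, and branches are independent.
    """
    # Evaluate each branch's histograms once, bucket by bucket.
    per_branch = [(b.child.totals(), b.child.select(b.lo, b.hi)) for b in branches]
    total = 0.0
    for i, p in enumerate(parent):
        if p <= 0:
            continue
        frac = 1.0  # fraction of parents in bucket i passing all branches
        for child_total, child_match in per_branch:
            c, m = child_total[i], child_match[i]
            if c <= 0:
                frac = 0.0  # no children here, so no parent can match
                break
            fanout = c / p           # average children per parent
            q = min(1.0, m / c)      # chance that one child matches
            frac *= 1.0 - (1.0 - q) ** fanout  # P(at least one match)
        total += p * frac
    return total


if __name__ == "__main__":
    # Two position buckets: cheap, old books first; expensive, new ones later.
    book = [10.0, 10.0]
    price = ValueHistogram([0, 50, 100], [[10, 0], [0, 10]])
    year = ValueHistogram([1900, 2000, 2100], [[10, 0], [0, 10]])
    # Estimate for book[price >= 50][year >= 2000]
    print(estimate_step(book, [Branch(price, 50, 100), Branch(year, 2000, 2100)]))  # 10.0
```

On this toy data the sketch returns 10.0, because only the second position bucket holds books that are both expensive and recent. Multiplying position-independent selectivities instead (0.5 × 0.5 × 20 books) would give 5, which illustrates why keeping values and positions in one histogram can matter when they are correlated.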
