SF-Tree: An Efficient and Flexible Structure for Selectivity Estimation

Estimating the selectivity of a simple path expression (SPE) is essential for selecting the most efficient evaluation plans for XML queries. To estimate selectivity, we need an efficient and flexible structure that stores a summary of the path expressions present in an XML document collection. In this paper we propose a new structure, called SF-Tree, to address the selectivity estimation problem. SF-Tree lets users trade off among accuracy, space requirements, and selectivity retrieval speed. It uses signature files to store the SPEs in a tree form, which increases both the selectivity retrieval speed and the accuracy of the retrieved selectivity. Our analysis shows that the probability of a selectivity estimation error decreases exponentially with the size of the error.
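The abstract does not describe SF-Tree's actual layout, so the Python sketch below only illustrates the general signature-file idea it builds on, under stated assumptions. Each SPE is hashed into a fixed-width bit signature by superimposed coding, entries are grouped into blocks whose signature is the OR of their members, and a lookup skips any block whose signature lacks a bit of the query signature. The names `signature`, `TwoLevelSignatureFile`, `M_BITS`, `K_BITS`, and `BLOCK_SIZE`, the two-level grouping, and the sample paths and counts are all hypothetical, not the paper's design. Because only signatures are stored, two SPEs with identical signatures would be confused, which is one way an estimation error can arise.

```python
import hashlib

M_BITS = 256    # hypothetical signature width in bits
K_BITS = 4      # hypothetical number of bit positions set per SPE
BLOCK_SIZE = 2  # hypothetical number of entries per leaf block


def signature(spe: str) -> int:
    """Superimposed-coding signature: set K_BITS hashed positions in an M_BITS-wide integer."""
    sig = 0
    for i in range(K_BITS):
        digest = hashlib.sha256(f"{i}:{spe}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % M_BITS)
    return sig


class TwoLevelSignatureFile:
    """Hypothetical two-level signature file: blocks of (signature, selectivity) pairs,
    each block tagged with the OR of its entries' signatures."""

    def __init__(self) -> None:
        self.blocks: list[tuple[int, list[tuple[int, int]]]] = []

    def add(self, spe: str, selectivity: int) -> None:
        sig = signature(spe)
        # Start a new block when there is none yet or the last one is full.
        if not self.blocks or len(self.blocks[-1][1]) >= BLOCK_SIZE:
            self.blocks.append((0, []))
        block_sig, entries = self.blocks[-1]
        entries.append((sig, selectivity))
        # The block signature superimposes (ORs) every entry signature in the block.
        self.blocks[-1] = (block_sig | sig, entries)

    def estimate(self, spe: str) -> int:
        q = signature(spe)
        for block_sig, entries in self.blocks:
            # Prune blocks whose superimposed signature is missing any query bit.
            if block_sig & q != q:
                continue
            for sig, selectivity in entries:
                # Only signatures are stored, not the SPE text, so a different SPE
                # with an identical signature would be returned here by mistake.
                if sig == q:
                    return selectivity
        return 0  # no stored SPE has this signature
```

A usage example with made-up paths and counts: after `sf.add("/site/people/person", 2550)`, calling `sf.estimate("/site/people/person")` returns the stored count, while an SPE that was never added almost always falls through to `0`. Widening `M_BITS` or raising `K_BITS` makes collisions rarer at the cost of space, which loosely mirrors the accuracy/space trade-off the abstract mentions.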
