APEX: an adaptive path index for XML data

The emergence of the Web has increased interests in XML data. XML query languages such as XQuery and XPath use label paths to traverse the irregularly structured data. Without a structural summary and efficient indexes, query processing can be quite inefficient due to an exhaustive traversal on XML data. To overcome the inefficiency, several path indexes have been proposed in the research community. Traditional indexes generally record all label paths from the root element in XML data. Such path indexes may result in performance degradation due to large sizes and exhaustive navigations for partial matching path queries start with the self-or-descendent axis("//").In this paper, we propose APEX, an adaptive path index for XML data. APEX does not keep all paths starting from the root and utilizes frequently used paths to improve the query performance. APEX also has a nice property that it can be updated incrementally according to the changes of query workloads. Experimental results with synthetic and real-life data sets clearly confirm that APEX improves query processing cost typically 2 to 54 times better than the existing indexes, with the performance gap increasing with the irregularity of XML data.

[1]  Dan Suciu,et al.  Optimizing regular path expressions using graph schemas , 1998, Proceedings 14th International Conference on Data Engineering.

[2]  Roy Goldman,et al.  DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases , 1997, VLDB.

[3]  Steven J. DeRose,et al.  XML Path Language (XPath) Version 1.0 , 1999 .

[4]  Jennifer Widom,et al.  Object exchange across heterogeneous information sources , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[5]  Quanzhong Li,et al.  Indexing and Querying XML Data for Regular Path Expressions , 2001, VLDB.

[6]  Jennifer Widom,et al.  Compile-Time Path Expansion in Lore , 1998 .

[7]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.

[8]  Dan Suciu,et al.  Adding Structure to Unstructured Data , 1997, ICDT.

[9]  Guido Moerkotte,et al.  Access Support Relations: An Indexing Method for Object Bases , 1992, Inf. Syst..

[10]  Dan Suciu,et al.  A query language and optimization techniques for unstructured data , 1996, SIGMOD '96.

[11]  Jeffrey D. Ullman,et al.  Introduction to Automata Theory, Languages and Computation , 1979 .

[12]  David Maier,et al.  Review of "Introduction to automata theory, languages and computation" by John E. Hopcroft and Jeffrey D. Ullman. Addison-Wesley 1979. , 1980, SIGA.

[13]  Dan Suciu,et al.  A query language for a Web-site management system , 1997, SGMD.

[14]  David J. DeWitt,et al.  On supporting containment queries in relational database management systems , 2001, SIGMOD '01.

[15]  Jeffrey D. Ullman,et al.  Representative objects: concise representations of semistructured, hierarchical data , 1997, Proceedings 13th International Conference on Data Engineering.

[16]  Michael J. Franklin,et al.  A Fast Index for Semistructured Data , 2001, VLDB.

[17]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[18]  HanJiawei,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998 .

[19]  Jennifer Widom,et al.  The Lorel query language for semistructured data , 1997, International Journal on Digital Libraries.

[20]  Dan Suciu,et al.  Index Structures for Path Expressions , 1999, ICDT.

[21]  Serge Abiteboul,et al.  Querying Semi-Structured Data , 1997, Encyclopedia of Database Systems.

[22]  Kyuseok Shim,et al.  SPIRIT: Sequential Pattern Mining with Regular Expression Constraints , 1999, VLDB.