RRSi: indexing XML data for proximity twig queries

Twig pattern matching is a core operation in XML query processing, and indexing XML documents for twig queries is fundamental to effective information retrieval. In practice, many XML documents on the web are heterogeneous: each source has its own format, so documents describing related information can have different structures. As a result, documents that interest the user but whose structures are similar to, rather than exactly matching, the query are often missed. In this paper, we propose RRSi, a novel structural index designed for structure-based query lookup over heterogeneous XML sources that supports proximate query answers. The index avoids unnecessary processing of structurally irrelevant candidates, even those that show good content relevance. We also develop an optimized version, oRRSi, that further reduces both space requirements and computational complexity. To our knowledge, these are the first structural indexes to support proximity twig queries on XML documents. Preliminary experimental results show that query processing based on RRSi and oRRSi significantly outperforms previously proposed techniques on XML repositories with structural heterogeneity.
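
To make the problem concrete, the following minimal sketch contrasts exact twig matching with one possible structural relaxation. It is not the RRSi index or its algorithm. The query encoding, the sample documents, and the relaxation used here (letting a parent-child query edge match any ancestor-descendant path) are assumptions chosen for illustration, and the notion of proximity used by RRSi may differ.

```python
import xml.etree.ElementTree as ET

# A twig query is a small tree of element names, encoded as (tag, [child twigs]).
# Hypothetical query: book[title][author/name]
QUERY = ("book", [("title", []), ("author", [("name", [])])])

def matches(node, twig, relaxed=False):
    """Return True if `node` is the root of a match for `twig`.

    Exact mode: every query edge must map to a parent-child edge.
    Relaxed mode: a query edge may map to any ancestor-descendant path.
    """
    tag, child_twigs = twig
    if node.tag != tag:
        return False
    for child_twig in child_twigs:
        if relaxed:
            # All proper descendants of `node` (iter() yields `node` first).
            candidates = [d for d in node.iter() if d is not node]
        else:
            # Direct children only.
            candidates = list(node)
        if not any(matches(c, child_twig, relaxed) for c in candidates):
            return False
    return True

def find(doc_xml, twig, relaxed=False):
    """Return every element in the document that roots a match for `twig`."""
    root = ET.fromstring(doc_xml)
    return [n for n in root.iter() if matches(n, twig, relaxed)]

# Two hypothetical sources describing the same information with different structure.
DOC_A = "<lib><book><title/><author><name/></author></book></lib>"
DOC_B = "<lib><book><info><title/></info><author><name/></author></book></lib>"

print(len(find(DOC_A, QUERY)))                # exact match on DOC_A
print(len(find(DOC_B, QUERY)))                # exact matching misses DOC_B
print(len(find(DOC_B, QUERY, relaxed=True)))  # relaxed matching recovers DOC_B
```

Here `DOC_B` wraps `title` in an extra `info` element, so an exact twig match misses it even though it describes the same information. This brute-force scan visits every element of every document, which is the kind of cost a structural index is meant to avoid.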
