Efficient creation and incremental maintenance of the HOPI index for complex XML document collections

The HOPI index, a connection index for XML documents based on the concept of a 2-hop cover, provides space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines. This paper presents enhanced algorithms for building HOPI, shows how to augment the index with distance information, and discusses incremental index maintenance. Our experiments show substantial improvements over the existing divide-and-conquer algorithm for index creation, low space overhead for including distance information in the index, and efficient updates.
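
To make the 2-hop cover idea concrete, here is a minimal sketch of how such labels can answer reachability and distance queries. Each node stores a set of "hub" nodes it can reach (`l_out`) and a set of hubs that can reach it (`l_in`), each with a distance; `u` reaches `v` exactly when the two sets share a hub. The labels are built here with a simple pruned breadth-first search, which is a stand-in chosen for brevity and not the divide-and-conquer or enhanced construction algorithms of the paper. All names (`build_labels`, `dist`, `reachable`) and the toy graph are assumptions made for illustration.

```python
from collections import deque


def build_labels(graph):
    """Build 2-hop labels with distances for a directed graph.

    graph maps every node to a list of its successors (child or link edges).
    Uses pruned BFS labeling as a simple stand-in, not the HOPI algorithms.
    """
    nodes = sorted(graph)
    reverse = {v: [] for v in nodes}
    for u in nodes:
        for v in graph[u]:
            reverse[v].append(u)

    l_in = {v: {} for v in nodes}   # hub -> distance from hub to v
    l_out = {v: {} for v in nodes}  # hub -> distance from v to hub

    def dist(u, v):
        """Shortest distance from u to v via a common hub, or None."""
        common = l_out[u].keys() & l_in[v].keys()
        return min((l_out[u][w] + l_in[v][w] for w in common), default=None)

    for hub in nodes:
        # Forward pass fills l_in of descendants, backward pass l_out of ancestors.
        for adj, labels, forward in ((graph, l_in, True), (reverse, l_out, False)):
            seen = {hub}
            queue = deque([(hub, 0)])
            while queue:
                node, d = queue.popleft()
                known = dist(hub, node) if forward else dist(node, hub)
                if known is not None and known <= d:
                    continue  # already covered by an earlier hub: prune
                labels[node][hub] = d
                for nxt in adj[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, d + 1))
    return l_in, l_out, dist


# Toy collection: a has children b and c; c links to d in another document; d has child e.
graph = {"a": ["b", "c"], "b": [], "c": ["d"], "d": ["e"], "e": []}
l_in, l_out, dist = build_labels(graph)


def reachable(u, v):
    """Reachability test: true when the labels of u and v share a hub."""
    return dist(u, v) is not None
```

With these labels, `reachable("a", "e")` holds across the link edge, while `reachable("b", "e")` does not, and `dist("a", "e")` gives the length of the connecting path. A query only intersects two label sets, which is what makes the test cheap compared with traversing the graph or materializing the full transitive closure.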
