Improving Index Structures for Structured Document Retrieval

Structured document retrieval has established itself as a new research area at the intersection of Database Systems and Information Retrieval. This work proposes a filtering technique that can be added to the existing index structures of many structured document retrieval systems. The technique takes the contextual structure information of both the query and the document database into account and drastically reduces the occurrence sets returned by the original index structure, which improves the performance of query evaluation. A measure is introduced that quantifies the added value of the proposed index structure. Based on this measure, a heuristic is presented that includes only valuable context information in the index structure.
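The general idea of filtering occurrence sets by structural context can be illustrated with a minimal sketch. It is not the index structure proposed in this work. It assumes, purely for illustration, that the context of an occurrence is the path of element labels from the document root to the node containing the term. The toy corpus, the names `build_indexes` and `lookup_with_context`, and the suffix-matching rule are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus: (node id, label path from the root, text at the node).
NODES = [
    (1, ("book", "title"), "index structures"),
    (2, ("book", "chapter", "title"), "tree matching"),
    (3, ("book", "chapter", "para"), "index structures for tree matching"),
    (4, ("article", "title"), "tree index"),
]


def build_indexes(nodes):
    """Build a plain term index and a context-aware index (assumed layout)."""
    plain = defaultdict(set)       # term -> node ids
    by_context = defaultdict(set)  # (term, label path) -> node ids
    for node_id, path, text in nodes:
        for term in text.split():
            plain[term].add(node_id)
            by_context[(term, path)].add(node_id)
    return plain, by_context


def lookup_with_context(by_context, term, path_suffix):
    """Return occurrences of `term` whose label path ends with `path_suffix`."""
    suffix = tuple(path_suffix)
    result = set()
    for (indexed_term, path), node_ids in by_context.items():
        if indexed_term != term or len(suffix) > len(path):
            continue
        # Keep the occurrence set only if the stored context matches the query context.
        if path[len(path) - len(suffix):] == suffix:
            result |= node_ids
    return result


plain, by_context = build_indexes(NODES)
unfiltered = plain["index"]
filtered = lookup_with_context(by_context, "index", ("title",))
print(sorted(unfiltered), sorted(filtered))
```

In this sketch, a query for the term `index` inside a `title` element receives a smaller occurrence set from the context-aware lookup than from the plain term index, so later query evaluation steps have fewer candidates to examine. Storing every (term, path) pair enlarges the index, which is the kind of trade-off that the measure and heuristic mentioned above are meant to address.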
