On the Integration of Structure Indexes and Inverted Lists

There has recently been considerable interest in techniques for evaluating path expressions over collections of XML documents. In general, these path expressions contain both structural and keyword components. The methods proposed for processing them over graph- or tree-structured XML data fall into two broad classes. The first is graph traversal, in which the input query is evaluated by traversing the data graph or some compressed representation of it. The second is information-retrieval-style processing using inverted lists. In this framework, structure indexes have been proposed as a substitute for graph traversal. Here, we focus on a subclass of content-and-structure (CAS) queries consisting of simple path expressions. We study algorithmic issues in integrating structure indexes with inverted lists to evaluate these queries, ranking all documents that match the query and returning the top k documents in order of relevance.
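
To make the query setting concrete, the following is a minimal Python sketch of how a structure index and inverted lists might be combined to answer a simple path expression with one keyword and return the top k documents. It is an illustration under stated assumptions, not the algorithm studied here: the toy data, the names `STRUCTURE_INDEX`, `INVERTED_LISTS`, `matching_index_nodes`, and `top_k` are all hypothetical, and scoring a document by summed term frequency is a stand-in for whatever relevance function is actually used.

```python
import heapq
from collections import defaultdict

# Hypothetical structure index: each root-to-node label path is
# summarized by a single index node id.
STRUCTURE_INDEX = {
    ("book",): 0,
    ("book", "title"): 1,
    ("book", "section"): 2,
    ("book", "section", "title"): 3,
}

# Hypothetical inverted lists: keyword -> postings of
# (doc_id, index_node_id, term_frequency).
INVERTED_LISTS = {
    "xml": [(1, 1, 1), (1, 3, 2), (2, 3, 1), (3, 2, 4)],
    "index": [(2, 1, 1), (3, 3, 1)],
}


def matching_index_nodes(steps):
    """Return index nodes whose label path ends with `steps`.

    `steps` models a simple path expression such as //section/title
    and must contain at least one label.
    """
    if not steps:
        raise ValueError("path expression needs at least one step")
    suffix = tuple(steps)
    n = len(suffix)
    return {
        node
        for path, node in STRUCTURE_INDEX.items()
        if path[-n:] == suffix
    }


def top_k(steps, keyword, k):
    """Rank documents containing `keyword` under the given path.

    The structural part is answered from the structure index alone;
    the inverted list is then filtered by the matching index node ids,
    so the data graph is never traversed.
    """
    nodes = matching_index_nodes(steps)
    scores = defaultdict(int)
    for doc_id, node, tf in INVERTED_LISTS.get(keyword, []):
        if node in nodes:
            scores[doc_id] += tf  # assumed relevance: summed term frequency
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(top_k(["section", "title"], "xml", 2))
```

The design point the sketch tries to capture is that the structural component is resolved once against the small index, after which each posting needs only a set-membership test rather than a structural join or a traversal.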