Paper information - TopX & XXL at INEX 2005

TopX & XXL at INEX 2005

We participated in this year's INEX round with two different and independent search engines: the XXL Search Engine and the TopX engine. As this is the first participation for TopX, this paper focuses on the design principles, scoring, query evaluation, and results of TopX. We briefly discuss the results with XXL afterwards.

1 TopX - System overview

Our query processing methods are based on precomputed index lists that are sorted in descending order of appropriately defined scores for individual tag-term content conditions, and our algorithmic rationale for top-k queries follows that of the family of threshold algorithms (TA) (2,4,5). To find, score, and rank the top-k matches for multidimensional queries (e.g., with multiple content and structure conditions), TopX scans all relevant index lists in an interleaved manner. In each scan step, when the engine sees the score for a data item in one list, it combines this score with the scores previously seen for the same data item in other index lists into a global score, using a monotonic aggregation function such as weighted summation. For content-and-structure (CAS) queries, we perform in-memory structural joins using pre-/postorder labels between whole element blocks for each query condition, grouped by their document ids.
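The two mechanisms described above can be illustrated with a minimal Python sketch. It is not TopX's implementation. It assumes a sorted-access-only member of the threshold-algorithm family (often called NRA), non-negative local scores, and weighted summation as the monotonic aggregation function. The names `top_k_sorted_access` and `is_descendant`, the `(item id, score)` posting layout, and the `(pre, post)` label tuples are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Posting = Tuple[str, float]  # (item id, local score); hypothetical layout


def top_k_sorted_access(index_lists: Sequence[Sequence[Posting]],
                        weights: Sequence[float],
                        k: int) -> List[Tuple[str, float]]:
    """Interleaved scan over score-sorted lists with early termination.

    Each list must be sorted by descending score; scores and weights are
    assumed non-negative. Returns up to k (item, lower-bound score) pairs.
    The reported scores are lower bounds: the scan may stop before every
    list entry of a result item has been read.
    """
    if k <= 0:
        return []
    m = len(index_lists)
    seen: Dict[str, Dict[int, float]] = {}  # item -> {list index: score}
    high = [lst[0][1] if lst else 0.0 for lst in index_lists]
    worst: Dict[str, float] = {}
    top: List[str] = []
    depth = 0
    max_depth = max((len(lst) for lst in index_lists), default=0)

    while depth < max_depth:
        # One round-robin step: read the next entry of every list.
        for i, lst in enumerate(index_lists):
            if depth < len(lst):
                item, score = lst[depth]
                seen.setdefault(item, {})[i] = score
                high[i] = score   # no later entry in list i can exceed this
            else:
                high[i] = 0.0     # exhausted list contributes nothing more
        depth += 1

        # Lower bound: weighted sum of the scores seen so far.
        worst = {d: sum(weights[i] * s for i, s in part.items())
                 for d, part in seen.items()}
        top = sorted(worst, key=lambda d: (-worst[d], d))[:k]
        if len(top) < k:
            continue

        # Upper bounds for every candidate outside the current top-k
        # and for items that have not been seen in any list yet.
        min_k = worst[top[-1]]
        unseen_bound = sum(weights[i] * high[i] for i in range(m))
        rest_bound = max(
            (worst[d] + sum(weights[i] * high[i]
                            for i in range(m) if i not in seen[d])
             for d in seen if d not in top),
            default=0.0)
        if min_k >= max(unseen_bound, rest_bound):
            break  # no other item can still overtake the current top-k

    return [(d, worst[d]) for d in top]


def is_descendant(node: Tuple[int, int], ancestor: Tuple[int, int]) -> bool:
    """Pre-/postorder test for two elements of the same document.

    Labels are (preorder rank, postorder rank). A node lies below an
    ancestor iff it comes later in preorder and earlier in postorder.
    """
    return ancestor[0] < node[0] and node[1] < ancestor[1]


# Two hypothetical index lists, one per query condition.
lists = [
    [("d1", 1.0), ("d2", 0.75), ("d3", 0.25)],
    [("d2", 0.75), ("d3", 0.5), ("d1", 0.25)],
]
best = top_k_sorted_access(lists, [1.0, 1.0], k=1)
```

With these sample lists and `k=1`, the scan can stop after the second entry of each list, because the lower bound of `d2` already matches or exceeds every remaining upper bound. A structural join would apply `is_descendant` to pairs of elements that share a document id.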

[1] Stephen E. Robertson, et al. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval, 1994, SIGIR '94.

[2] Surya Nepal, et al. Query processing issues in image (multimedia) databases, 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[3] Werner Kießling, et al. Optimizing Multi-Feature Queries for Image Databases, 2000, VLDB.

[4] Gerhard Weikum, et al. Adding Relevance to XML, 2000, WebDB.

[5] M. Naor, et al. Optimal aggregation algorithms for middleware, 2001, PODS '01.

[6] Torsten Grust, et al. Accelerating XPath location steps, 2002, SIGMOD '02.

[7] Seung-won Hwang, et al. Minimal probing: supporting expensive predicates for top-k queries, 2002, SIGMOD '02.

[8] Gerhard Weikum, et al. XXL @ INEX 2003, 2003.

[9] Gerhard Weikum, et al. Top-k Query Evaluation with Probabilistic Guarantees, 2004, VLDB.

[10] Gerhard Weikum, et al. Semantic Similarity Search on Semistructured Data with the XXL Search Engine, 2005, Information Retrieval.

[11] Gerhard Weikum, et al. An Efficient and Versatile Query Engine for TopX Search, 2005, VLDB.