This paper describes the ISpace retrieval system’s involvement in TREC8. The main goal for this year’s work was to speed up document indexing and query processing compared to previous years. This goal was achieved, but retrieval performance was not as good as for TREC7. System details for the AdHoc task, small Web task, and large Web (VLC) task are presented. The AdHoc task emphasized query expansion, while the large Web track emphasized rapid indexing and retrieval. The paper describes an implementation of a multidimensional tree structure for retrieval from information space based on the kd-tree. The larger setting for ISpace, the TeraScale Retrieval project, is summarized. A concluding section describes plans for ISpace.

* School of Information and Library Science, Campus box 3360 Manning Hall, Chapel Hill, NC, 27599-3360. Email: gbnewby@ils.unc.edu

Introduction

Efforts for the 8th Text REtrieval Conference (TREC) included the following:

1. AdHoc task, fully automatic.
2. Small Web track.
3. Large Web track (VLC).

Throughout the work described here, the central question of interest is: How might information space techniques achieve high performance? The issue of performance is ambiguous, but was defined as emphasizing the following, in decreasing order of importance:

a. Performance means being able to quickly produce a ranked response set for a query topic.
b. Performance means being able to handle the full variety of queries and documents (i.e., without limitations on the number of unique terms or number of documents).
c. Performance means the response set has a large proportion of relevant documents.

In this hierarchy, the goal of high relevance is uncharacteristically last, but not forgotten. Because post-hoc analysis of last year’s non-judged TREC submissions (Newby, 1999) indicated reasonable recall-precision performance with exact precision of 0.14, the emphasis was on developing a more practical and usable system. While 0.14 is unremarkable compared to other groups’ TREC submissions, it represented an order of magnitude improvement from prior years (Newby, 1998).

A description of the information space technique, system design considerations for each phase of the work and outcomes follow. A concluding section summarizes this year’s TREC activities and lays out plans for the near future.
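To give a concrete picture of the kind of multidimensional tree structure mentioned in the abstract, the following is a minimal, generic kd-tree sketch in Python. It is not the ISpace implementation: the names `Node`, `build`, `sq_dist`, and `nearest`, the two-dimensional document coordinates, and the use of squared Euclidean distance are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """One kd-tree node holding a (hypothetical) document location."""
    point: Point
    doc_id: str
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(items: List[Tuple[Point, str]], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, doc_id = items[mid]
    return Node(point, doc_id, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (an assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Tuple[float, str]] = None
            ) -> Optional[Tuple[float, str]]:
    """Return (squared distance, doc_id) of the point closest to the query."""
    if node is None:
        return best
    d = sq_dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.doc_id)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical document coordinates in a 2-dimensional information space.
docs = [((2.0, 3.0), "d1"), ((5.0, 4.0), "d2"), ((9.0, 6.0), "d3"),
        ((4.0, 7.0), "d4"), ((8.0, 1.0), "d5"), ((7.0, 2.0), "d6")]
tree = build(docs)
print(nearest(tree, (9.0, 2.0)))  # (2.0, 'd5')
```

The pruning test in `nearest` is what lets a query skip whole subtrees, which is the property that makes this family of structures attractive when fast production of a response set is the first priority. A ranked response set would require extending the search to the k closest points, which this sketch does not do.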
[1] Gregory B. Newby, et al. Information Space Gets Normal. TREC, 1998.
[2] Ricardo Baeza-Yates, et al. Information Retrieval: Data Structures and Algorithms. 1992.
[3] Ian H. Witten, et al. Managing gigabytes. 1994.
[4] M. V. Wilkes, et al. The Art of Computer Programming, Volume 3, Sorting and Searching. 1974.
[5] Ken Kennedy, et al. Information Technology Research Investing in Our Future. 1999.
[6] Mark Allen Weiss, et al. Data structures and algorithm analysis in C. 1991.
[7] Gregory B. Newby. Context-Based Statistical Sub-Spaces. TREC, 1997.
[8] Venkata Subramaniam, et al. Information Retrieval: Data Structures & Algorithms. 1992.
[9] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis. J. Am. Soc. Inf. Sci., 1990.
[10] Jon Louis Bentley, et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time. TOMS, 1977.