Techniques for query expansion from top retrieved documents have recently been used by many groups at TREC, often on purely empirical grounds. In this paper we present a novel method for ranking and weighting expansion terms. The method is based on the concept of relative entropy, or Kullback-Leibler distance, developed in Information Theory, from which we derive a computationally simple and theoretically justified formula to assign scores to candidate expansion terms. This method has been incorporated into a comprehensive prototype ranking system, tested in the ad hoc track of TREC-7. The system’s overall performance was comparable to the median performance of TREC-7 participants, which is quite good considering that we are new to TREC and that we used unsophisticated indexing and weighting techniques. More focused experiments showed that the use of an information-theoretic component for query expansion significantly improved mean retrieval effectiveness over the unexpanded query, yielding performance gains as high as 14% (for non-interpolated average precision), while a per-query analysis suggested that queries that are neither too difficult nor too easy can be more easily improved upon.
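The abstract does not state the scoring formula itself, so the following is only a minimal sketch of one way relative entropy can be used to score candidate expansion terms. It assumes that each term is scored by its individual contribution to the Kullback-Leibler distance between the term distribution in the top retrieved documents and the term distribution in the whole collection, and that probabilities are plain relative frequencies. The function name `kl_term_scores` and the toy data are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def kl_term_scores(feedback_docs, collection_docs):
    """Score terms by p_R(t) * log(p_R(t) / p_C(t)).

    p_R is the relative frequency of a term in the top retrieved
    (feedback) documents, p_C its relative frequency in the collection.
    Assumption: this per-term KL contribution stands in for the paper's
    formula, which the abstract does not spell out.
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())

    scores = {}
    for term, freq in fb_counts.items():
        p_r = freq / fb_total
        p_c = coll_counts[term] / coll_total
        if p_c == 0:
            # Term absent from the collection statistics: skip rather than
            # divide by zero (cannot happen if feedback docs are a subset).
            continue
        scores[term] = p_r * math.log(p_r / p_c)
    return scores


# Toy tokenized collection; the first two documents play the role of the
# top retrieved set.
collection = [
    ["query", "expansion", "term"],
    ["term", "weight", "rank"],
    ["rank", "document", "term"],
    ["weather", "report", "term"],
]
feedback = collection[:2]

scores = kl_term_scores(feedback, collection)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under this assumption, a term that is as frequent in the feedback documents as in the collection (here `term`) scores zero, while a term that is relatively more frequent in the feedback documents (here `query`) scores higher and would be preferred for expansion.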