The aims of this paper are twofold. Our first aim
is to compare results of the earlier Terabyte tracks
to the Million Query track. We submitted a number
of runs using different document representations
(such as full text, title fields, or incoming
anchor texts) to increase pool diversity. The initial
results show broad agreement in system rankings
across various measures on topic sets judged at both
the Terabyte and Million Query tracks, with runs using
the full-text index giving superior results on
all measures, but also some noteworthy upsets.
Our second aim is to explore the use of parsimonious
language models for retrieval on terabyte-scale
collections. These models are smaller, and hence
more efficient at indexing time, than standard
language models, and they may also improve
retrieval performance. We conducted initial
experiments using parsimonious models in
combination with pseudo-relevance feedback, for
both the Terabyte and Million Query track topic
sets, and obtained promising results.
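
To make the second aim concrete, the following is a minimal sketch of the EM estimation underlying parsimonious language models (Hiemstra et al., SIGIR 2004). It is not the paper's actual implementation: the function name, the mixing weight lam = 0.1, the iteration count, and the pruning threshold are illustrative assumptions.

```python
def parsimonious_model(tf, p_collection, lam=0.1, n_iter=50, threshold=1e-4):
    """Estimate a parsimonious language model via EM.

    tf           : dict mapping term -> raw frequency in the document
                   (or in a set of feedback documents)
    p_collection : dict mapping term -> background probability P(t|C);
                   assumed to cover every term in `tf`
    lam          : mixing weight of the document model (assumed value)

    E-step: e_t = tf(t) * lam*P(t|D) / (lam*P(t|D) + (1-lam)*P(t|C))
    M-step: P(t|D) = e_t / sum over t' of e_t'
    Terms whose probability falls below `threshold` are pruned,
    which is what makes the resulting model small.
    """
    # Initialise with the maximum-likelihood estimate of the document.
    total = sum(tf.values())
    p_doc = {t: f / total for t, f in tf.items()}
    for _ in range(n_iter):
        # E-step: expected term counts attributable to the document
        # model rather than to the general (collection) model.
        e = {}
        for t, f in tf.items():
            if t not in p_doc:
                continue  # term was pruned in an earlier iteration
            num = lam * p_doc[t]
            e[t] = f * num / (num + (1 - lam) * p_collection[t])
        # M-step: renormalise, and prune terms with negligible mass.
        norm = sum(e.values())
        p_doc = {t: v / norm for t, v in e.items() if v / norm >= threshold}
    return p_doc
```

The pruning step is what underlies the efficiency claim above: terms that the background model explains well are driven to zero and dropped, so the stored per-document model is much smaller than a standard smoothed language model. Applied to the concatenated term frequencies of the top-ranked documents, the same routine yields a compact feedback model, which is one plausible way to combine parsimonious estimation with pseudo-relevance feedback.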