UMass at TREC 2004: Notebook

1 Terabyte

1.1 Model

The retrieval model implemented in the Indri search engine is an enhanced version of the model described in [30], which combines the language modeling [35] and inference network [38] approaches to information retrieval. The resulting model allows structured queries similar to those used in INQUERY [4] to be evaluated using language modeling estimates within the network, rather than tf.idf estimates. Figure 1.1 shows a graphical model representation of the network. As in the original inference network framework, documents are ranked according to P(I|D, α, β), the belief that the information need I is met given document D and hyperparameters α and β as evidence. Due to space limitations, a general understanding of the inference network framework is assumed; see [30] and [38] for details.
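The ranking idea described above can be illustrated with a small sketch. It is not the Indri implementation. It assumes, beyond what this excerpt states, that each query term's belief is a smoothed language modeling estimate of the form (tf + α) / (|D| + α + β), with α = μ·P(term|C) and β = μ·(1 − P(term|C)), and that term beliefs are merged by a geometric mean standing in for a structured-query combination operator. The function names, the parameter `mu`, and the collection probabilities are all hypothetical.

```python
import math
from collections import Counter


def term_belief(term, doc_tokens, coll_prob, mu=2500.0):
    """Smoothed belief that `term` is observed in the document.

    Assumption (not stated in the text): alpha = mu * P(term|C) and
    beta = mu * (1 - P(term|C)), giving (tf + alpha) / (|D| + alpha + beta).
    `coll_prob` must be > 0 so the belief stays strictly positive.
    """
    tf = Counter(doc_tokens)[term]
    alpha = mu * coll_prob
    beta = mu * (1.0 - coll_prob)
    return (tf + alpha) / (len(doc_tokens) + alpha + beta)


def combine(beliefs):
    """Geometric mean of child beliefs, computed in log space.

    Assumption: a stand-in for a belief-combination query operator.
    """
    return math.exp(sum(math.log(b) for b in beliefs) / len(beliefs))


def score(query_terms, doc_tokens, coll_probs, mu=2500.0):
    """Belief that the information need is met, used as the ranking score."""
    beliefs = [term_belief(t, doc_tokens, coll_probs[t], mu) for t in query_terms]
    return combine(beliefs)


if __name__ == "__main__":
    # Hypothetical collection probabilities and toy documents.
    coll_probs = {"indri": 0.001, "search": 0.01}
    docs = {
        "d1": "indri is a search engine built on inference networks".split(),
        "d2": "language models estimate term probabilities".split(),
    }
    ranked = sorted(
        docs, key=lambda d: score(["indri", "search"], docs[d], coll_probs), reverse=True
    )
    print(ranked)
```

Working in log space keeps the combination numerically stable when many small beliefs are multiplied together, which is why the geometric mean is computed from summed logarithms rather than a direct product.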

[1]  Howard R. Turtle,et al.  Query Evaluation: Strategies and Optimizations , 1995, Inf. Process. Manag..

[2]  Charles L. A. Clarke,et al.  Shortest Substring Ranking (MultiText Experiments for TREC-4) , 1995, TREC.

[3]  Rong Yan,et al.  On predicting rare classes with SVM ensembles in scene classification , 2003, ICASSP '03.

[4]  W. Bruce Croft,et al.  Time-based language models , 2003, CIKM '03.

[5]  Ramesh Nallapati,et al.  Discriminative models for information retrieval , 2004, SIGIR '04.

[6]  Jussi Karlgren,et al.  Recognizing Text Genres With Simple Metrics Using Discriminant Analysis , 1994, COLING.

[7]  Edward A. Fox,et al.  Digital libraries , 1995, CACM.

[8]  David Hawking,et al.  Overview of the TREC-2001 Web track , 2002 .

[9]  Justin Zobel,et al.  Passage retrieval revisited , 1997, SIGIR '97.

[10]  Thorsten Joachims,et al.  A statistical learning model of text classification for support vector machines , 2001, SIGIR '01.

[11]  ChengXiang Zhai,et al.  The Lemur toolkit for language modeling and information retrieval , 2003 .

[13]  James P. Callan,et al.  Experiments Using the Lemur Toolkit , 2001, TREC.

[14]  ChengXiang Zhai,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[15]  Ian Ruthven,et al.  Re-examining the potential effectiveness of interactive query expansion , 2003, SIGIR.

[16]  Carol Van Ess-Dykema,et al.  The Form is the Substance: Classification of Genres in Text , 2001, HTLKM@ACL.

[17]  Xiaoyan Li,et al.  An Answer Updating Approach to Novelty Detection , 2004 .

[18]  Alistair Moffat,et al.  Effective document presentation with a locality-based similarity heuristic , 1999, SIGIR '99.

[19]  Fernando Diaz,et al.  Using temporal profiles of queries for precision prediction , 2004, SIGIR '04.

[20]  James P. Callan,et al.  Combining document representations for known-item search , 2003, SIGIR.

[21]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[22]  Robert Krovetz,et al.  Viewing morphology as an inference process , 1993, Artif. Intell..

[23]  Paul Ogilvie,et al.  Acrophile: an automated acronym extractor and server , 2000, DL '00.

[24]  James P. Callan,et al.  Passage-level evidence in document retrieval , 1994, SIGIR '94.

[25]  Efstathios Stamatatos,et al.  Text Genre Detection Using Common Word Frequencies , 2000, COLING.

[27]  Richard M. Schwartz,et al.  An Algorithm that Learns What's in a Name , 1999, Machine Learning.

[28]  W. Bruce Croft,et al.  A language modeling approach to information retrieval , 1998, SIGIR '98.

[29]  W. Bruce Croft,et al.  Combining the language model and inference network approaches to retrieval , 2004, Inf. Process. Manag..

[30]  Aidan Finn,et al.  Learning to classify documents according to genre , 2006, J. Assoc. Inf. Sci. Technol..

[31]  W. Bruce Croft,et al.  Evaluation of an inference network-based retrieval model , 1991, TOIS.

[32]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[33]  James Allan,et al.  Text classification and named entities for new event detection , 2004, SIGIR '04.

[34]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.

[35]  W. Bruce Croft,et al.  The INQUERY Retrieval System , 1992, DEXA.

[36]  In-Ho Kang,et al.  Integration of multiple evidences based on a query type for web search , 2004, Inf. Process. Manag..

[37]  Hinrich Schütze,et al.  Automatic Detection of Text Genre , 1997, ACL.

[38]  Justin Zobel,et al.  Efficient single-pass index construction for text databases , 2003 .

[39]  W. Bruce Croft,et al.  Formal multiple-bernoulli models for language modeling , 2004, SIGIR '04.

[40]  David Hawking,et al.  Overview of the TREC-9 Web Track , 2000, TREC.

[41]  Justin Zobel,et al.  Efficient single-pass index construction for text databases , 2003, J. Assoc. Inf. Sci. Technol..