BBN at TREC7: Using Hidden Markov Models for Information Retrieval

We present a new method for information retrieval using hidden Markov models (HMMs) and relate our experience with this system on the TREC-7 ad hoc task. We develop a general framework for incorporating multiple word generation mechanisms within the same model. We then demonstrate that an extremely simple realization of this model substantially outperforms tf.idf ranking on both the TREC-6 and TREC-7 ad hoc retrieval tasks. We go on to present several algorithmic refinements, including a novel method for performing blind feedback in the HMM framework. Together, these methods form a state-of-the-art retrieval system that ranked among the best on the TREC-7 ad hoc retrieval task, and showed extraordinary performance in development experiments on TREC-6.
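The abstract does not spell out the model, so the following is only a minimal sketch of what a simple realization might look like. It assumes a two-state mixture in which each query word is generated either from the document's own word distribution or from a general-English distribution estimated over the whole collection. The function names, the toy collection, and the mixture weight `a_doc = 0.3` are illustrative assumptions, not values taken from the paper.

```python
import math
from collections import Counter


def hmm_score(query, doc, corpus_counts, corpus_len, a_doc=0.3):
    """Log-probability of the query under a two-state mixture (assumed form).

    Each query word is emitted by the document state with weight a_doc,
    or by the general-English state with weight 1 - a_doc.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    log_p = 0.0
    for word in query:
        p_doc = doc_counts[word] / doc_len if doc_len else 0.0
        p_general = corpus_counts[word] / corpus_len
        p = a_doc * p_doc + (1.0 - a_doc) * p_general
        if p == 0.0:
            # Word never seen anywhere in the collection.
            return float("-inf")
        log_p += math.log(p)
    return log_p


def rank(query, docs, a_doc=0.3):
    """Return document indices sorted by descending score."""
    corpus_counts = Counter(w for d in docs for w in d)
    corpus_len = sum(len(d) for d in docs)
    scores = [hmm_score(query, d, corpus_counts, corpus_len, a_doc) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy collection, for illustration only.
docs = [
    ["hidden", "markov", "models"],
    ["tf", "idf", "ranking"],
]
ranking = rank(["hidden", "markov"], docs)
```

Mixing in the collection-wide distribution keeps a document from scoring zero just because it lacks one query word, which is the usual motivation for this kind of smoothing.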
