Language Models for Topic Tracking

Generative unigram language models have proven to be a simple yet effective model for information retrieval tasks. In contrast to ad-hoc retrieval, topic tracking requires that matching scores be comparable across topics. Several ranking functions based on generative language models are evaluated in two orientations: straight likelihood, likelihood ratio, normalized likelihood ratio, and the related Kullback-Leibler divergence. The best performance is achieved by the models based on a normalized log-likelihood ratio. A key component of these models is the a-priori probability of a story with respect to a common reference distribution.

Wessel Kraaij | Martijn Spitters
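To make the ranking functions named in the abstract concrete, here is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes maximum-likelihood unigram estimates, a topic model smoothed by Jelinek-Mercer interpolation with the reference (background) distribution C using a hypothetical weight `lam`, and "normalized" taken to mean dividing the log-likelihood ratio by story length. The function names and toy data are hypothetical. The story's probability under the reference model, P(S|C), stands in for the a-priori probability of a story: every topic is scored against this same baseline, which is intuitively why ratio-based scores are easier to compare across topics than the straight likelihood.

```python
import math
from collections import Counter


def ml_model(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed(model, background, lam):
    """Jelinek-Mercer interpolation of a sparse model with the reference model.

    `lam` is a hypothetical smoothing weight. A word missing from the
    background model raises KeyError, so the background must cover the vocabulary.
    """
    return lambda w: lam * model.get(w, 0.0) + (1.0 - lam) * background[w]


def log_likelihood(story, p_model):
    """Straight log-likelihood, log P(S|M), summed over the story's tokens."""
    return sum(math.log(p_model(w)) for w in story)


def log_likelihood_ratio(story, p_topic, p_ref):
    """log P(S|T) - log P(S|C): likelihood relative to the story's prior P(S|C)."""
    return log_likelihood(story, p_topic) - log_likelihood(story, p_ref)


def nllr(story, p_topic, p_ref):
    """Log-likelihood ratio divided by story length (a per-token average)."""
    return log_likelihood_ratio(story, p_topic, p_ref) / len(story)


def kl(p, q):
    """KL(p || q) for a dict model p and a callable model q, over p's support."""
    return sum(pw * math.log(pw / q(w)) for w, pw in p.items())


# Toy data, made up for illustration: reference corpus C and topic training text T.
corpus = ("the market fell as oil prices rose the storm hit the coast "
          "oil spill near the coast").split()
background = ml_model(corpus)


def p_ref(w):
    """Reference (background) distribution P(w|C)."""
    return background[w]


p_topic = smoothed(ml_model("oil spill near the coast oil".split()), background, lam=0.5)

on_topic = "oil spill hit the coast".split()
off_topic = "the market fell as prices rose".split()

for name, story in [("on-topic", on_topic), ("off-topic", off_topic)]:
    print(name, round(nllr(story, p_topic, p_ref), 3))

# With an ML story model, the normalized ratio equals KL(S || C) - KL(S || T).
s_model = ml_model(on_topic)
print(math.isclose(nllr(on_topic, p_topic, p_ref),
                   kl(s_model, p_ref) - kl(s_model, p_topic)))
```

Running it prints a positive score for the on-topic story, a negative score for the off-topic story, and `True` for the identity check. That identity is one way to see why the normalized log-likelihood ratio is "related" to the Kullback-Leibler divergence: it measures how much closer the story's distribution is to the topic model than to the common reference distribution.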