Paper information - Explorations within topic tracking and detection

This chapter presents the system used by the Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts for its participation in four of the five TDT tasks: tracking, detection, first story detection, and story link detection. For each task, we discuss the parameter setting approach that we used and the results of our system on the test data.

For the task of link detection, we look more carefully at score normalization across different languages and media types. We find that we can improve results noticeably, though not substantially, by normalizing scores differently depending upon the source language. We also consider smoothing the vocabulary in stories using a "query expansion" technique from Information Retrieval to add additional words from the corpus to each story. This results in substantial improvements.

In addition, we use TDT evaluation approaches to show that the tracking performance that sites are achieving is what is expected from Information Retrieval technology. We further show that any first story detection system based on a tracking approach is unlikely to be sufficiently accurate for most purposes. Finally, we present an overview of an automatic timeline generation system that we developed using TDT data.