
Topic Detection and Tracking: Event Clustering as a Basis for First Story Detection

Topic Detection and Tracking (TDT) is a new research area that investigates the organization of information by event rather than by subject. In this paper, we provide an overview of the TDT research program from its inception to the third phase that is now underway. We also discuss in detail our approach to two of the TDT problems. For event clustering (Detection), we show that classic Information Retrieval clustering techniques can be modified slightly to provide effective solutions. For first story detection, we show that similar methods provide satisfactory results, although substantial work remains. In both cases, we explore solutions that model the temporal relationship between news stories. We also investigate the use of phrase extraction to capture the who, what, when, and where contained in news.
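The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of how event clustering can serve as a basis for first story detection. It assumes a single-pass design: each incoming story is compared with the existing clusters and either joins the most similar one or starts a new cluster. A story that starts a new cluster is flagged as a first story. The bag-of-words vectors, the cosine similarity, the `threshold` and `decay` values, the linear time penalty used to model the temporal relationship between stories, and the names `Story`, `Cluster`, `tf_vector`, and `single_pass` are all illustrative assumptions, not the authors' published method.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Story:
    text: str
    day: int  # hypothetical arrival time, in days


@dataclass
class Cluster:
    centroid: Counter  # summed term frequencies of member stories
    last_day: int      # arrival day of the most recent member
    members: list = field(default_factory=list)


def tf_vector(text: str) -> Counter:
    # Assumption: plain whitespace tokenization, no stemming or stopword removal.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(stories, threshold=0.3, decay=0.01):
    """Cluster stories in arrival order; return (clusters, first_story_indices)."""
    clusters, first_stories = [], []
    for idx, story in enumerate(stories):
        vec = tf_vector(story.text)
        best, best_score = None, 0.0
        for c in clusters:
            # Hypothetical temporal model: similarity shrinks linearly with the time gap.
            score = cosine(vec, c.centroid) - decay * (story.day - c.last_day)
            if score > best_score:
                best, best_score = c, score
        if best is not None and best_score >= threshold:
            best.centroid.update(vec)  # add counts; cosine ignores the scale of the sum
            best.last_day = story.day
            best.members.append(idx)
        else:
            # No sufficiently similar event: start a new cluster and flag a first story.
            clusters.append(Cluster(Counter(vec), story.day, [idx]))
            first_stories.append(idx)
    return clusters, first_stories


stories = [
    Story("earthquake strikes city hundreds injured", 0),
    Story("rescue teams search earthquake rubble city", 1),
    Story("election results announced today", 1),
    Story("earthquake aftershock hits city again", 3),
]
clusters, firsts = single_pass(stories)
print(firsts)  # [0, 2]
```

In this toy run, the two follow-up earthquake stories join the first cluster, while the election story shares no terms with it and is reported as a new event. Raising `decay` makes old clusters harder to join, which is one simple way to encode the intuition that stories about the same event tend to arrive close together in time.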

James Allan | Ron Papka