On-Line New Event Detection and Tracking

We define and describe the related problems of new event detection and event tracking within a stream of broadcast news stories. We focus on a strict on-line setting; that is, the system must make its decision about one story before looking at any subsequent stories. Our approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component. Our approach to tracking is similar to typical information filtering methods. We discuss the value of “surprising” features that have unusual occurrence characteristics, and briefly explore on-line adaptive filtering to handle evolving events in the news. New event detection and event tracking are part of the Topic Detection and Tracking (TDT) initiative.
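
The strict on-line constraint can be illustrated with a minimal sketch. This is not the paper's method: it assumes raw term-frequency vectors, cosine similarity, and a fixed threshold, whereas the abstract describes a thresholding model built on properties of events that is not reproduced here. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_new_events(stories, threshold=0.3):
    """Single pass over the stream: decide on each story before reading the next.

    A story is flagged as a new event when its best similarity to every
    earlier story falls below `threshold` (a fixed value, assumed here
    purely for illustration).
    """
    seen = []       # vectors of stories already processed
    decisions = []  # True where the story was judged to start a new event
    for text in stories:
        vec = Counter(text.lower().split())
        best = max((cosine(vec, old) for old in seen), default=0.0)
        decisions.append(best < threshold)
        seen.append(vec)  # the decision is final; only now store the story
    return decisions
```

Each decision depends only on stories that arrived earlier, which is what distinguishes the on-line setting from retrospective clustering over the whole collection.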
