Improving novelty detection for general topics using sentence level information patterns

Detecting new information in a document stream is an important component of many potential applications. This work proposes a new novelty detection approach based on identifying sentence-level information patterns. First, the information-pattern concept for novelty detection is presented, with emphasis on new information patterns for general topics (queries) that cannot simply be turned into specific questions whose answers are specific named entities (NEs). We then present a thorough analysis of sentence-level information patterns on data from the TREC novelty tracks, covering sentence lengths, named entities, and sentence-level opinion patterns. This analysis provides guidelines for applying those patterns to novelty detection, particularly for general topics. Finally, a unified pattern-based approach to novelty detection is presented for both general and specific topics, with the focus on the new method for handling general topics. Experimental results show that the proposed approach significantly improves novelty detection performance for general topics, as well as overall performance across all topics from the 2002-2004 TREC novelty tracks.
