Novelty detection based on sentence level patterns

The detection of new information in a document stream is an important component of many potential applications. In this paper, a new novelty detection approach based on the identification of sentence-level patterns is proposed. Given a user's information need, certain patterns in sentences, such as combinations of query words, named entities, and phrases, may carry more important and relevant information than single words. The proposed approach therefore focuses on identifying previously unseen query-related patterns in sentences. Specifically, a query is preprocessed and represented with patterns that include both query words and required answer types. These patterns are used to retrieve sentences, and each retrieved sentence is judged novel if it is likely to contain a new answer. An analysis of patterns in sentences was performed on data from the TREC 2002 novelty track, and novelty detection experiments were carried out on data from the TREC 2003 and 2004 novelty tracks. The experimental results show that the proposed pattern-based approach significantly outperforms all three baselines in terms of precision at top ranks.
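
The retrieve-then-judge idea described above can be illustrated with a toy sketch. This is not the paper's implementation: the regular expressions standing in for answer types, the `ANSWER_PATTERNS` table, and the `detect_novel` function are all hypothetical names chosen for illustration, and a real system would use a named-entity tagger rather than regexes. A sentence is treated as relevant if it shares a word with the query, and as novel if it contains an answer of the required type that no earlier relevant sentence contained.

```python
import re

# Hypothetical answer types mapped to crude regex stand-ins for a
# named-entity tagger (an assumption, not the paper's method).
ANSWER_PATTERNS = {
    "NUMBER": r"\b\d+(?:\.\d+)?\b",
    "YEAR": r"\b(?:19|20)\d{2}\b",
}


def tokenize(text):
    """Lowercased alphanumeric tokens of a sentence."""
    return {tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)}


def detect_novel(sentences, query_words, answer_type):
    """Return the sentences that match the query and carry an unseen answer.

    sentences   -- sentences in stream order
    query_words -- set of lowercased query terms
    answer_type -- key into ANSWER_PATTERNS (the required answer type)
    """
    pattern = ANSWER_PATTERNS[answer_type]
    seen_answers = set()
    novel = []
    for sentence in sentences:
        # Retrieval step: keep only sentences sharing a word with the query.
        if not tokenize(sentence) & query_words:
            continue
        # Novelty step: look for answers of the required type not seen before.
        answers = set(re.findall(pattern, sentence))
        if answers - seen_answers:
            novel.append(sentence)
        seen_answers |= answers
    return novel


stream = [
    "The earthquake killed 120 people.",
    "Officials said the earthquake killed 120 people in total.",
    "The death toll from the earthquake rose to 150.",
    "Rain is expected on Friday 13.",
]
novel_sentences = detect_novel(stream, {"earthquake", "killed"}, "NUMBER")
```

In this example the second sentence is relevant but repeats a known answer, and the fourth contains a number but does not match the query, so only the first and third sentences are kept.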
