Novelty Detection: The TREC Experience

A challenge for search systems is to detect not only when an item is relevant to the user's information need, but also when it contains something new that the user has not seen before. In the TREC novelty track, the task was to highlight sentences containing relevant and new information in a short, topical document stream. This is analogous to highlighting key parts of a document for another person to read, and such output can serve as input to a summarization system. Search topics covered both news events and reported opinions on hot-button subjects. When people performed this task, they tended to select small blocks of consecutive sentences, whereas current systems identified many relevant and novel passages. We also found that opinions are much harder to track than events.
