Utility-based information distillation over temporally sequenced documents

This paper examines a new approach to information distillation over temporally ordered documents and proposes a novel evaluation scheme for such a framework. The approach combines the strengths of, and extends beyond, conventional adaptive filtering, novelty detection, and non-redundant passage ranking with respect to long-lasting information needs ("tasks" with multiple queries). It supports fine-grained user feedback via highlighting of arbitrary spans of text and leverages this feedback for utility optimization in adaptive settings. For our experiments, we defined hypothetical tasks based on news events in the TDT4 corpus, with multiple queries per task. Answer keys (nuggets) were generated for each query, and a semi-automatic procedure was used to acquire rules for automatically matching nuggets against system responses. We also propose an extension of the NDCG metric that assesses the utility of ranked passages as a combination of relevance and novelty. Our results show encouraging utility enhancements for the new approach over baseline systems that lack the incremental learning or novelty detection components.
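To make the idea of scoring ranked passages by both relevance and novelty concrete, the following is a minimal sketch and not the metric defined in the paper. It assumes each passage has already been matched to a set of nugget identifiers, that each nugget carries a weight, and that every repeat of an already-seen nugget earns only a fraction `gamma` of its previous credit. The function name, the `gamma` parameter, and the geometric dampening are all illustrative assumptions; the logarithmic rank discount follows standard DCG.

```python
import math


def novelty_discounted_gain(ranked_nuggets, weights, gamma=0.5):
    """DCG-style utility of a ranked passage list (illustrative sketch).

    ranked_nuggets: list of passages in rank order; each passage is a list
        of nugget ids that it was judged to contain (assumed input format).
    weights: dict mapping nugget id -> importance weight (assumed).
    gamma: hypothetical dampening factor; the k-th repeat of a nugget
        earns weight * gamma**k, so gamma=1.0 ignores redundancy.
    """
    seen = {}  # nugget id -> number of times already credited
    total = 0.0
    for rank, nuggets in enumerate(ranked_nuggets, start=1):
        gain = 0.0
        for nugget in nuggets:
            repeats = seen.get(nugget, 0)
            gain += weights.get(nugget, 0.0) * (gamma ** repeats)
            seen[nugget] = repeats + 1
        # Standard DCG rank discount: log2(rank + 1).
        total += gain / math.log2(rank + 1)
    return total


weights = {"a": 1.0, "b": 1.0}
redundant = [["a"], ["a"], ["b"]]  # repeats nugget "a" before showing "b"
diverse = [["a"], ["b"], ["a"]]    # shows the new nugget "b" earlier

score_redundant = novelty_discounted_gain(redundant, weights)
score_diverse = novelty_discounted_gain(diverse, weights)
```

Under these assumptions the ranking that surfaces the new nugget earlier scores higher, whereas with `gamma=1.0` both rankings receive the same plain DCG value. A normalized variant would divide by the score of an ideal ranking, which this sketch does not attempt to compute.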
