Performance Confidence Estimation for Automatic Summarization

We address the task of automatically predicting whether a summarization system will perform well or poorly, using features derived directly from single- or multi-document inputs. Our labelled corpus for the task is composed of data from large-scale evaluations conducted over several years. The variation in data between years allows for a comprehensive analysis of feature robustness, but it poses a challenge for building a combined corpus that can be used for training and testing. Still, we find that the problem can be mitigated by appropriately normalizing for differences within each year. We examine different formulations of the classification task and find that the choice of formulation considerably influences performance. The best results are 84% prediction accuracy for single-document and 74% for multi-document summarization.
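The abstract does not say which normalization is used. As a minimal sketch of one plausible reading, the snippet below standardizes a feature to a z-score within each evaluation year, so that values from different years become comparable before being pooled into one corpus. The function name `normalize_within_year`, the choice of z-scoring, and the sample years and values are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_year(rows):
    """Return (year, z_score) pairs, standardizing each value within its own year.

    `rows` is a list of (year, feature_value) pairs. Z-scoring is an assumed
    normalization; the paper's abstract does not specify the method.
    """
    # Group feature values by evaluation year.
    by_year = defaultdict(list)
    for year, value in rows:
        by_year[year].append(value)

    # Per-year mean and population standard deviation.
    stats = {year: (mean(vals), pstdev(vals)) for year, vals in by_year.items()}

    normalized = []
    for year, value in rows:
        mu, sigma = stats[year]
        # A year with no variation carries no signal for this feature; map to 0.
        z = 0.0 if sigma == 0 else (value - mu) / sigma
        normalized.append((year, z))
    return normalized


# Hypothetical feature values on very different scales in two years.
sample = [(2001, 1.0), (2001, 3.0), (2002, 10.0), (2002, 30.0)]
print(normalize_within_year(sample))
```

After this step both years occupy the same scale, so a single classifier could in principle be trained on the combined data without the year-to-year offset dominating the feature.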
