Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

To date, few attempts have been made to develop and validate methods for automatic evaluation of linguistic quality in text summarization. We present the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text. We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. For grammaticality, the best results come from a set of syntactic features. Focus, coherence, and referential clarity are best evaluated by a class of features measuring local coherence on the basis of cosine similarity between sentences, coreference information, and summarization-specific features. Our best results are 90% accuracy for pairwise comparisons of competing systems over a test set of several inputs and 70% for ranking summaries of a specific input.
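As a rough illustration of the local-coherence idea mentioned above, the sketch below computes cosine similarity between adjacent sentences of a summary and aggregates the values into a few summary-level features. The abstract does not specify tokenization, term weighting, or which aggregates are used, so the raw word counts, the regex tokenizer, and the function names (`bag_of_words`, `cosine`, `continuity_features`) are all assumptions made for this example, not the authors' implementation.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    # Assumption: lowercase alphanumeric tokens with raw counts (no stemming, no tf-idf).
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def adjacent_similarities(sentences):
    # One similarity value per pair of consecutive sentences.
    vecs = [bag_of_words(s) for s in sentences]
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]


def continuity_features(sentences):
    # Hypothetical aggregates; a summary with fewer than two sentences gets zeros.
    sims = adjacent_similarities(sentences)
    if not sims:
        return {"min": 0.0, "max": 0.0, "avg": 0.0}
    return {"min": min(sims), "max": max(sims), "avg": sum(sims) / len(sims)}
```

Features of this kind could then be fed, together with others, to a ranking or classification model; a very low minimum similarity would hint at an abrupt topic shift between two neighboring sentences.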
