Summary Evaluation with and without References

We study a new content-based method for evaluating text summarization systems without human models, which is used to produce system rankings. The research is carried out using a new content-based evaluation framework, called Fresa, that computes a variety of divergences among probability distributions. We apply our comparison framework to several well-established content-based evaluation measures in text summarization, such as COVERAGE, RESPONSIVENESS, PYRAMIDS and ROUGE, studying their associations across various text summarization tasks, including generic multi-document summarization in English and French, focus-based multi-document summarization in English, and generic single-document summarization in French and Spanish.
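
As a minimal sketch of the underlying idea, the example below compares the unigram distribution of a candidate summary against that of the source text using the Jensen-Shannon divergence, one of the divergences commonly used for model-free, content-based evaluation. The helper names and the choice of unigram distributions are illustrative assumptions, not the exact Fresa implementation.

```python
import math
from collections import Counter

def unigram_distribution(text):
    """Unigram probability distribution over whitespace tokens (illustrative helper)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two word distributions (base-2 logs)."""
    vocab = set(p) | set(q)
    js = 0.0
    for word in vocab:
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)
        if pw > 0:
            js += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            js += 0.5 * qw * math.log2(qw / mw)
    return js

# Reference-free evaluation: the candidate summary is compared against the
# source documents rather than against human-written model summaries.
source = "the cat sat on the mat and the cat slept"
summary = "the cat slept on the mat"
print(jensen_shannon_divergence(unigram_distribution(source),
                                unigram_distribution(summary)))
```

A lower divergence indicates that the summary's word distribution is closer to that of the source; rankings of systems can then be derived by averaging such scores over a document collection.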
