Multilingual Summarization Evaluation without Human Models

We study the correlation between rankings of text summarization systems produced by evaluation methods with and without human models. We apply our comparison framework to several well-established content-based evaluation measures in text summarization, such as Coverage, Responsiveness, Pyramid and ROUGE, studying their associations across summarization tasks that include generic and focus-based multi-document summarization in English and generic single-document summarization in French and Spanish. The study is carried out using a new content-based evaluation framework, FRESA, which computes a variety of divergences between probability distributions.
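The abstract does not spell out which divergences FRESA computes, but a standard choice for model-free summary evaluation is the Jensen-Shannon divergence between the unigram distribution of the source text and that of the candidate summary (a lower divergence suggests the summary better preserves the source's content). The sketch below is an illustrative implementation under that assumption, with a small smoothing constant to handle words absent from one side; it is not FRESA's exact formulation.

```python
from collections import Counter
import math

def jensen_shannon(text_tokens, summary_tokens, smoothing=1e-10):
    """Jensen-Shannon divergence (base-2, in [0, 1]) between the
    unigram distributions of a source text and a candidate summary.
    Smoothing keeps probabilities strictly positive for words that
    appear on only one side. Illustrative sketch, not FRESA itself."""
    p_counts = Counter(text_tokens)
    q_counts = Counter(summary_tokens)
    vocab = set(p_counts) | set(q_counts)
    n_p = sum(p_counts.values())
    n_q = sum(q_counts.values())
    js = 0.0
    for w in vocab:
        p = (p_counts[w] + smoothing) / (n_p + smoothing * len(vocab))
        q = (q_counts[w] + smoothing) / (n_q + smoothing * len(vocab))
        m = 0.5 * (p + q)
        js += 0.5 * (p * math.log2(p / m) + q * math.log2(q / m))
    return js
```

Because the measure compares the summary directly against the source text rather than against human-written reference summaries, it needs no human model, which is what makes rank correlation with model-based measures such as Pyramid or Responsiveness an interesting question.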