Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics
[1] Kilian Q. Weinberger et al. BERTScore: Evaluating Text Generation with BERT. 2019, ICLR.
[2] Mirella Lapata et al. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. 2018, EMNLP.
[3] Maxime Peyrard et al. Studying Summarization Evaluation Metrics in the Appropriate Scoring Range. 2019, ACL.
[4] Ani Nenkova et al. Automatically Assessing Machine Summary Content Without a Gold Standard. 2013, CL.
[5] Iryna Gurevych et al. Learning to Score System Summaries for Better Content Selection Evaluation. 2017, NFiS@EMNLP.
[6] Bowen Zhou et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond. 2016, CoNLL.
[7] Jun-Ping Ng et al. Better Summarization Evaluation with Word Embeddings for ROUGE. 2015, EMNLP.
[8] Fei Liu et al. MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance. 2019, EMNLP.
[9] Mor Naaman et al. Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies. 2018, NAACL.
[10] Chin-Yew Lin et al. ROUGE: A Package for Automatic Evaluation of Summaries. 2004, ACL.
[11] Phil Blunsom et al. Teaching Machines to Read and Comprehend. 2015, NIPS.
[12] Graham Neubig et al. Re-evaluating Evaluation in Text Summarization. 2020, EMNLP.
[13] Judith Eckle-Kohler et al. A General Optimization Framework for Multi-Document Summarization Using Genetic Algorithms and Swarm Intelligence. 2016, COLING.
[14] Jianfeng Gao et al. An Information-Theoretic Approach to Automatic Evaluation of Summaries. 2006, NAACL.
[15] Hoa Trang Dang et al. Overview of the TAC 2008 Update Summarization Task. 2008, TAC.