UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation
[1] Qiaozhu Mei, et al. Judge the Judges: A Large-Scale Evaluation Study of Neural Language Models for Online Review Generation, 2019, EMNLP.
[2] Dongyan Zhao, et al. Plan-And-Write: Towards Better Automatic Storytelling, 2018, AAAI.
[3] Ke Xu, et al. Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models, 2020, AAAI.
[4] Stanislau Semeniuta, et al. On Accurate Evaluation of GANs for Language Generation, 2018, ArXiv.
[5] Joelle Pineau, et al. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation, 2016, EMNLP.
[6] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[7] Catherine Havasi, et al. Representing General Relational Knowledge in ConceptNet 5, 2012, LREC.
[8] Mamoru Komachi, et al. RUSE: Regressor Using Sentence Embeddings for Automatic Machine Translation Evaluation, 2018, WMT.
[9] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL.
[10] Fei Liu, et al. MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance, 2019, EMNLP.
[11] Nathanael Chambers, et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories, 2016, NAACL.
[12] Thibault Sellam, et al. BLEURT: Learning Robust Metrics for Text Generation, 2020, ACL.
[13] Kilian Q. Weinberger, et al. BERTScore: Evaluating Text Generation with BERT, 2019, ICLR.
[14] Nanyun Peng, et al. Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings, 2019, Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation.
[15] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[16] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.
[17] Dongyan Zhao, et al. RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems, 2017, AAAI.
[18] Minlie Huang, et al. A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation, 2020, TACL.
[19] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[20] Joelle Pineau, et al. Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses, 2017, ACL.
[21] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[22] Maxine Eskénazi, et al. Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders, 2017, ACL.
[23] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[24] Percy Liang, et al. Unifying Human and Statistical Evaluation for Natural Language Generation, 2019, NAACL.
[25] Oriol Vinyals, et al. Adversarial Evaluation of Dialogue Models, 2017, ArXiv.
[26] Mitesh M. Khapra, et al. Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses, 2019, AAAI.
[27] Yann Dauphin, et al. Hierarchical Neural Story Generation, 2018, ACL.
[28] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.