[1] Sivaraman Balakrishnan,et al. Hypothesis Testing for High-Dimensional Multinomials: A Selective Review , 2017, ArXiv.
[2] Melissa Roemmele,et al. Writing Stories with Help from Recurrent Neural Networks , 2016, AAAI.
[3] Elia Bruni,et al. Adversarial evaluation for open-domain dialogue generation , 2017, SIGDIAL Conference.
[4] Lyle H. Ungar,et al. The Good Judgment Project: A Large Scale Test of Different Methods of Combining Expert Predictions , 2012, AAAI Fall Symposium: Machine Aggregation of Human Judgment.
[5] Joelle Pineau,et al. Language GANs Falling Short , 2018, ICLR.
[6] Joelle Pineau,et al. Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses , 2017, ACL.
[7] Ani Nenkova,et al. The Pyramid Method: Incorporating human content selection variation in summarization evaluation , 2007, TSLP.
[8] Ian J. Goodfellow,et al. Skill Rating for Generative Models , 2018, ArXiv.
[9] A. M. Turing. Computing Machinery and Intelligence , 1950, The Philosophy of Artificial Intelligence.
[10] Jianfeng Gao,et al. deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets , 2015, ACL.
[11] Bowen Zhou,et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.
[12] Nathanael Chambers,et al. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories , 2016, NAACL.
[13] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[14] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[15] Joelle Pineau,et al. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation , 2016, EMNLP.
[16] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[17] Teruko Mitamura,et al. Diversity-aware Evaluation for Paraphrase Patterns , 2011, TextInfer@EMNLP.
[18] Graham Neubig,et al. Retrieval-Based Neural Code Generation , 2018, EMNLP.
[19] Alexander M. Rush,et al. OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.
[20] Chin-Yew Lin,et al. Looking for a Few Good Metrics: ROUGE and its Evaluation , 2004 .
[21] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[22] Chun-Liang Li,et al. Nonparametric Density Estimation under Adversarial Losses , 2018, NeurIPS.
[23] Neri Merhav,et al. Relations between entropy and error probability , 1994, IEEE Trans. Inf. Theory.
[24] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[25] Alon Lavie,et al. The Meteor metric for automatic evaluation of machine translation , 2009, Machine Translation.
[26] Olivier Bachem,et al. Assessing Generative Models via Precision and Recall , 2018, NeurIPS.
[27] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[28] Edmund A. Mennis. The Wisdom of Crowds: Why the Many Are Smarter than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations , 2006 .
[29] Verena Rieser,et al. Why We Need New Evaluation Metrics for NLG , 2017, EMNLP.
[30] Percy Liang,et al. Delete, Retrieve, Generate: a Simple Approach to Sentiment and Style Transfer , 2018, NAACL.
[31] Hong Sun,et al. Joint Learning of a Dual SMT System for Paraphrase Generation , 2012, ACL.
[32] Percy Liang,et al. A Retrieve-and-Edit Framework for Predicting Structured Outputs , 2018, NeurIPS.
[33] Samy Bengio,et al. Generating Sentences from a Continuous Space , 2015, CoNLL.
[34] Alan Ritter,et al. Adversarial Learning for Neural Dialogue Generation , 2017, EMNLP.
[35] Bernhard Schölkopf,et al. Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels , 2016, NIPS.
[36] Jianfeng Gao,et al. A Diversity-Promoting Objective Function for Neural Conversation Models , 2015, NAACL.
[37] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.
[38] Oriol Vinyals,et al. Adversarial Evaluation of Dialogue Models , 2017, ArXiv.
[39] Charles L. A. Clarke,et al. Novelty and diversity in information retrieval evaluation , 2008, SIGIR '08.
[40] Matthias Bethge,et al. A note on the evaluation of generative models , 2015, ICLR.
[41] Jianfeng Gao,et al. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses , 2015, NAACL.
[42] Percy Liang,et al. The price of debiasing automatic metrics in natural language evaluation , 2018, ACL.