Findings of the E2E NLG Challenge

This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems. Recent end-to-end generation systems are promising since they reduce the need for data annotation. However, they are currently limited to small, delexicalised datasets. The E2E NLG shared task aims to assess whether these novel approaches can generate better-quality output by learning from a dataset containing higher lexical richness, syntactic complexity and diverse discourse phenomena. We compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures -- with the majority implementing sequence-to-sequence models (seq2seq) -- as well as systems based on grammatical rules and templates.
