Data-driven Natural Language Generation: Paving the Road to Success

We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, more reliable metric. The second problem is addressed by presenting a novel framework for developing and evaluating a high quality corpus for NLG training.

[1]  Steve J. Young,et al.  Partially observable Markov decision processes for spoken dialog systems , 2007, Comput. Speech Lang..

[2]  Maxine Eskénazi,et al.  Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results , 2011, SIGDIAL Conference.

[3]  Ondrej Dusek,et al.  Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings , 2016, ACL.

[4]  Lucia Specia,et al.  Machine translation evaluation versus quality estimation , 2010, Machine Translation.

[5]  Xiaofei Lu The Relationship of Lexical Richness to the Quality of ESL Learners' Oral Narratives. , 2012 .

[6]  Ondrej Dusek,et al.  Training a Natural Language Generator From Unaligned Data , 2015, ACL.

[7]  Oliver Lemon,et al.  Crowd-sourcing NLG Data: Pictures Elicit Better Data. , 2016, INLG.

[8]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[9]  Jonathan Weese,et al.  UMBC_EBIQUITY-CORE: Semantic Textual Similarity Systems , 2013, *SEMEVAL.

[10]  Xiaofei Lu,et al.  Automatic measurement of syntactic complexity in child language acquisition , 2009 .

[11]  David Vandyke,et al.  Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems , 2015, EMNLP.

[12]  Matthew R. Walter,et al.  What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment , 2015, NAACL.

[13]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[14]  David Vandyke,et al.  Multi-domain Neural Network Language Generation for Spoken Dialogue Systems , 2016, NAACL.

[15]  Milica Gasic,et al.  Phrase-Based Statistical Language Generation Using Graphical Models and Active Learning , 2010, ACL.

[16]  Dimitra Gkatzia,et al.  A Snapshot of NLG Evaluation Practices 2005 - 2014 , 2015, ENLG.

[17]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[18]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[19]  Andreas Vlachos,et al.  Imitation learning for language generation from unaligned data , 2016, COLING.