Data-driven Natural Language Generation: Making Machines Talk Like Humans Using Natural Corpora

With the significant improvements seen in speech applications, the long-held goal of building machines that can hold human-like conversations has begun to seem more reachable; spoken dialog systems now exist that much of the general public can use effectively. Despite these improvements, however, applications are still frequently limited by their unnatural spoken language generation. This thesis addresses the problem of human-like spoken language generation: how to make machine-generated speech more like natural human speech. The scope of this problem is large, with issues in speech synthesis, natural language generation, and spoken dialog, among other areas. The work in this thesis focuses primarily on natural language generation, with some discussion of issues related to speech synthesis and of the intersection between synthesis and language generation. In particular, we discuss a method that applies signal modifications to synthesized waveforms to emulate what humans do when trying to be better understood while speaking in noisy conditions. One of the main differences between human- and machine-produced speech is understandability; natural human speech is typically easier to understand. We describe a general framework, which we call uGloss, designed to improve the understandability of spoken generation of complex information. The uGloss framework employs a set of tactical generation strategies that attempt to take the expected capabilities of the human listener into account; by staying within those capabilities, the resulting spoken output is typically more easily understood. Though uGloss can improve understandability, it is not a complete solution to machine-generated human-like speech. In many other fields, from speech recognition and synthesis to parsing and understanding, using corpus-based statistical knowledge has led to improved systems.
We propose a similar data-driven approach intended to improve language generation systems, specifically for speech and dialog applications. Our proposed approach, the MOUNTAIN language generation system, is fully automatic and uses machine translation techniques to generate novel examples from a natural corpus. The system is designed to be a domain-independent method for training a generator that can produce human-like language in speech applications, by translating the machine’s internal representation of
