Better Surface Realization through Psycholinguistics

In this survey, we review recent progress on surface realization in natural language generation (NLG), highlighting how machine learning approaches have moved beyond n-gram models to successfully incorporate linguistic insights into increasingly rich models. We also advance the view that NLG still has much to gain from psycholinguistic studies, not only of human language production but also of comprehension. We highlight how realization ranking models can be improved by modeling the role of memory in human language comprehension, and we discuss how surface realizers might transition to grammars developed for incremental parsing in computational psycholinguistics, which would make them more suitable for integration into real-time incremental dialog systems. From a production standpoint, we suggest that the principle of uniform information density could strengthen the theoretical basis for choice-making in NLG, and we discuss two initial steps in this direction. Finally, we conclude with a discussion of prospects for community-based evaluation of surface realization systems.
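To make the two psycholinguistic ideas above more concrete, here is a minimal, hypothetical sketch of how they might be turned into features for ranking candidate realizations. It assumes two simple operationalizations that the abstract itself does not specify: memory load is approximated by total dependency length (the sum of distances between each word and its head), and uniform information density (UID) is approximated by the variance of per-word surprisal under a language model. The bigram probabilities, the dependency heads, and the optional-"that" example are illustrative toy values, not data from the survey.

```python
import math
from statistics import pvariance

# Hypothetical toy bigram model P(word | previous word); values are illustrative only.
BIGRAM = {
    ("<s>", "I"): 0.5,
    ("I", "think"): 0.25,
    ("think", "that"): 0.5,
    ("that", "he"): 0.5,
    ("think", "he"): 0.125,
    ("he", "left"): 0.25,
}


def surprisals(words):
    """Per-word surprisal in bits, -log2 P(w_i | w_{i-1}), under the toy bigram model."""
    prev = "<s>"
    out = []
    for w in words:
        out.append(-math.log2(BIGRAM[(prev, w)]))
        prev = w
    return out


def uid_cost(words):
    """Assumed UID proxy: population variance of surprisal (lower = more uniform)."""
    return pvariance(surprisals(words))


def dependency_length(heads):
    """Assumed memory proxy: sum of |i - head(i)| over non-root words (head -1 = root)."""
    return sum(abs(i - h) for i, h in enumerate(heads) if h != -1)


# Two hypothetical candidate realizations of the same input, with hand-assigned heads.
candidates = {
    "with that": (["I", "think", "that", "he", "left"], [1, -1, 4, 4, 1]),
    "without that": (["I", "think", "he", "left"], [1, -1, 3, 1]),
}

for name, (words, heads) in candidates.items():
    print(f"{name}: UID cost={uid_cost(words):.2f}, "
          f"dependency length={dependency_length(heads)}")

# Each criterion on its own picks the candidate with the lowest cost.
best_uid = min(candidates, key=lambda n: uid_cost(candidates[n][0]))
best_dl = min(candidates, key=lambda n: dependency_length(candidates[n][1]))
print("UID prefers:", best_uid)
print("dependency length prefers:", best_dl)
```

With these toy values the two criteria disagree: including "that" spreads the surprisal of the unpredictable "he" more evenly (variance 0.24 vs. 0.50 bits squared), while omitting it shortens the total dependency length (4 vs. 7). In a realistic ranker such quantities would more plausibly enter as weighted features alongside many others rather than as standalone decision rules.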
