Avoiding Repetition in Generated Text

We investigate two methods for enhancing variation in the output of a stochastic surface realiser: choosing from among the highest-scoring realisation candidates instead of taking the single highest-scoring result (ε-best sampling), and penalising the words from earlier sentences in a discourse when generating later ones (anti-repetition scoring). In a human evaluation study, subjects were asked to compare texts generated with and without the variation enhancements. Strikingly, subjects judged the texts generated using these two methods to be better written and less repetitive than the texts generated with optimal n-gram scoring; at the same time, no significant difference in understandability was found between the two versions. In analysing the two methods, we show that the simpler ε-best sampling method is considerably more prone to introducing dispreferred variants into the output, indicating that the best results can be obtained using anti-repetition scoring with strict or no ε-best sampling.
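
To make the two methods described above concrete, the following is a minimal Python sketch, not the realiser's actual implementation. It assumes each candidate realisation arrives with a base n-gram score, that the anti-repetition penalty is subtracted once per earlier occurrence of each word, and that the sampling step draws uniformly from the candidates scoring within a threshold `epsilon` of the best one. The function names, the subtractive form of the penalty, and the default `penalty` value are illustrative assumptions.

```python
import random
from collections import Counter


def anti_repetition_score(base_score, words, history, penalty=0.1):
    """Lower a candidate's score for words already used in earlier sentences.

    `history` is a Counter of words from previously generated sentences.
    The subtractive penalty is an assumption made for illustration.
    """
    repeats = sum(history[w] for w in words)
    return base_score - penalty * repeats


def epsilon_best_sample(scored, epsilon, rng=random):
    """Choose uniformly among candidates scoring within `epsilon` of the best.

    With epsilon == 0 this reduces to taking a top-scoring candidate.
    """
    best = max(score for _, score in scored)
    pool = [cand for cand, score in scored if score >= best - epsilon]
    return rng.choice(pool)


def realise(candidates, history, epsilon=0.0, penalty=0.1, rng=random):
    """Pick one realisation from (text, base_score) pairs and update history."""
    scored = [
        (text, anti_repetition_score(score, text.lower().split(), history, penalty))
        for text, score in candidates
    ]
    choice = epsilon_best_sample(scored, epsilon, rng)
    history.update(choice.lower().split())  # later sentences are penalised for these words
    return choice


if __name__ == "__main__":
    history = Counter("this design is modern".split())
    candidates = [
        ("this design is classic", 0.9),
        ("here we have a classic design", 0.85),
    ]
    print(realise(candidates, history, epsilon=0.0))
```

In this sketch, setting `epsilon=0.0` with a non-zero `penalty` corresponds to anti-repetition scoring with no sampling: the second candidate in the demo wins because it reuses fewer words from the earlier sentence. Raising `epsilon` widens the pool, which is also how lower-scoring, possibly dispreferred variants can enter the output.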
