Learning to Order Facts for Discourse Planning in Natural Language Generation

This paper presents a machine learning approach to discourse planning in natural language generation. More specifically, we address the problem of learning the most natural ordering of facts in discourse plans for a specific domain. We discuss our methodology and how it was instantiated using two different machine learning algorithms. A quantitative evaluation performed in the domain of museum exhibit descriptions indicates that our approach performs significantly better than manually constructed ordering rules. Because the resulting planners are retrainable, they can be ported easily to similar domains without requiring language technology expertise.
