Corpus-trained Text Generation for Summarization

We explore how machine learning can be used to acquire rulesets for the traditional modules of content planning and surface realization. Our approach uses semantically annotated corpora to induce preferences for content planning and constraints on the realization of these plans. We apply this methodology to an annotated corpus of indicative summaries to derive constraint rules that can assist in generating summaries for new, unseen material.
