Title Generation with Quasi-Synchronous Grammar

The task of selecting information and rendering it appropriately appears in multiple contexts in summarization. In this paper we present a model that simultaneously optimizes selection and rendering preferences. The model operates over a phrase-based representation of the source document which we obtain by merging PCFG parse trees and dependency graphs. Selection preferences for individual phrases are learned discriminatively, while a quasi-synchronous grammar (Smith and Eisner, 2006) captures rendering preferences such as paraphrases and compressions. Based on an integer linear programming formulation, the model learns to generate summaries that satisfy both types of preferences, while ensuring that length, topic coverage and grammar constraints are met. Experiments on headline and image caption generation show that our method obtains state-of-the-art performance using essentially the same model for both tasks without any major modifications.
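
The joint selection formulation described above can be illustrated with a small sketch: choose a subset of phrases that maximizes total salience subject to a length budget, which is the core of the integer linear program before grammar and paraphrase constraints are added. The phrase texts, salience scores, and word budget below are invented for illustration only; the paper's actual model also enforces topic-coverage and grammaticality constraints from the quasi-synchronous grammar, which this brute-force sketch omits.

```python
from itertools import combinations

def select_phrases(phrases, max_words):
    """Brute-force solver for a toy version of the selection ILP:
    pick the subset of phrases with maximum total salience whose
    combined word count stays within `max_words`.
    `phrases` is a list of (text, salience) tuples."""
    best, best_score = [], float("-inf")
    for r in range(len(phrases) + 1):
        for subset in combinations(phrases, r):
            length = sum(len(text.split()) for text, _ in subset)
            score = sum(salience for _, salience in subset)
            if length <= max_words and score > best_score:
                best, best_score = list(subset), score
    return best, best_score

# Hypothetical phrases and salience scores (not from the paper):
phrases = [
    ("government announces reform", 3.0),   # 3 words
    ("after lengthy debate", 1.0),          # 3 words
    ("opposition protests", 2.0),           # 2 words
]
chosen, score = select_phrases(phrases, max_words=6)
# Picks the two high-salience phrases (5 words total, score 5.0);
# adding all three would exceed the 6-word budget.
```

A real system would hand the same objective and constraints to an ILP solver rather than enumerate subsets, since the number of candidate phrases makes brute force infeasible; the sketch only makes the optimization target concrete.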

[1] Francine Chen, et al. A trainable document summarizer, SIGIR '95, 1995.

[2] Mark Dras, et al. Tree adjoining grammar and the reluctant paraphrasing of text, 1999.

[3] Hongyan Jing, et al. Sentence Reduction for Automatic Text Summarization, ANLP, 2000.

[4] Michele Banko, et al. Headline Generation Based on Statistical Translation, ACL, 2000.

[5] Kathleen McKeown, et al. Cut and Paste Based Text Summarization, ANLP, 2000.

[6] Hongyan Jing, et al. Using Hidden Markov Modeling to Decompose Human-Written Summaries, Computational Linguistics, 2002.

[7] Daniel Marcu, et al. A Noisy-Channel Model for Document Compression, ACL, 2002.

[8] Chin-Yew Lin. Improving summarization performance by sentence compression: a pilot study, IRAL, 2003.

[9] Dan Klein, et al. Accurate Unlexicalized Parsing, ACL, 2003.

[10] Eduard H. Hovy, et al. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics, NAACL, 2003.

[11] Richard M. Schwartz, et al. Hedge Trimmer: A Parse-and-Trim Approach to Headline Generation, HLT-NAACL, 2003.

[12] Richard M. Schwartz, et al. BBN/UMD at DUC-2004: Topiary, 2004.

[13] David A. Smith, et al. Quasi-Synchronous Grammars: Alignment by Soft Projection of Syntactic Dependencies, WMT@HLT-NAACL, 2006.

[14] Daniel Marcu, et al. Practical structured learning techniques for natural language processing, 2006.

[15] Tobias Achterberg, et al. Constraint integer programming, 2007.

[16] Daniel Marcu, et al. Abstractive headline generation using WIDL-expressions, Inf. Process. Manag., 2007.

[17] Noah A. Smith, et al. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA, EMNLP, 2007.

[18] Jimmy J. Lin, et al. Multi-candidate reduction: Sentence compression as a tool for document summarization tasks, Inf. Process. Manag., 2007.

[19] J. Clarke, et al. Global inference for sentence compression: an integer linear programming approach, J. Artif. Intell. Res., 2008.

[20] Mirella Lapata, et al. Sentence Compression Beyond Word Deletion, COLING, 2008.

[21] Noah A. Smith, et al. Paraphrase Identification as Probabilistic Quasi-Synchronous Recognition, ACL, 2009.

[22] David A. Smith, et al. Parser Adaptation and Projection with Quasi-Synchronous Grammar Features, EMNLP, 2009.

[23] Martin Corley, et al. Timing accuracy of Web experiments: A case study using the WebExp software package, Behavior Research Methods, 2009.

[24] Ting Liu, et al. Application-driven Statistical Paraphrase Generation, ACL, 2009.

[25] Noah A. Smith, et al. Summarization with a Joint Model for Sentence Extraction and Compression, ILP 2009.

[26] Yansong Feng, et al. Topic Models for Image Annotation and Text Illustration, HLT-NAACL, 2010.

[27] Mirella Lapata, et al. Automatic Generation of Story Highlights, ACL, 2010.

[28] Yansong Feng, et al. How Many Words Is a Picture Worth? Automatic Caption Generation for News Images, ACL, 2010.

[29] Jacek Gondzio, et al. Exploiting separability in large-scale linear support vector machine training, Comput. Optim. Appl., 2011.