Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most previous work simplifies sentences using handcrafted rules aimed at splitting long sentences, or substitutes difficult words using a predefined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally capture structural mismatches and complex rewrite operations. We describe how such a grammar can be induced from Wikipedia and propose an integer linear programming model for selecting the most appropriate simplification from the space of possible rewrites generated by the grammar. We show experimentally that our method creates simplifications that significantly reduce the reading difficulty of the input, while maintaining grammaticality and preserving its meaning.
