Book Review: Automatic Text Simplification by Horacio Saggion

Abstract: Thanks to the availability of texts on the Web in recent years, increased knowledge and information have been made available to broader audiences. However, the way in which a text is written (its vocabulary, its syntax) can make it difficult to read and understand for many people, especially those with poor literacy, cognitive or linguistic impairments, or limited knowledge of the language of the text. Texts containing uncommon words or long and complicated sentences can be difficult for people to read and understand, as well as difficult for machines to analyze. Automatic text simplification is the process of transforming a text into another text that, ideally conveying the same message, will be easier to read and understand by a broader audience. The process usually involves the replacement of difficult or unknown phrases with simpler equivalents and the transformation of long and syntactically complex sentences into shorter and less complex ones. Automatic text simplification, a research topi...
