One Step Closer to Automatic Evaluation of Text Simplification Systems

This study explores whether the costly and time-consuming human evaluation of the grammaticality and meaning preservation of text simplification (TS) output can be replaced with automatic measures. We focus on six widely used machine translation (MT) evaluation metrics and their correlation with human judgements of grammaticality and meaning preservation in text snippets. Because the results show a significant correlation between the metric scores and the human judgements, we go further and try to classify simplified sentences into three groups: (1) those which are acceptable; (2) those which need minimal post-editing; and (3) those which should be discarded. The preliminary results reported in this paper are promising.
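As a rough illustration of the two steps described above, the sketch below computes a Pearson correlation between automatic metric scores and human ratings, then sorts sentences into the three groups. It is a minimal sketch under stated assumptions, not the method used in the study: the function names `pearson` and `triage`, the threshold values, and all numbers are hypothetical, and the fixed-threshold rule merely stands in for a trained classifier.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def triage(score, accept_at=0.8, discard_below=0.4):
    """Hypothetical threshold rule (an assumption, not the study's classifier).

    Maps one automatic metric score in [0, 1] to one of the three groups.
    """
    if score >= accept_at:
        return "acceptable"
    if score >= discard_below:
        return "post-edit"
    return "discard"


# Toy values invented purely for illustration; they are not data from the study.
metric_scores = [0.91, 0.75, 0.32, 0.58, 0.85]  # e.g. one MT metric per sentence
human_ratings = [5, 4, 1, 3, 5]                 # e.g. a 1-5 human judgement

r = pearson(metric_scores, human_ratings)
labels = [triage(s) for s in metric_scores]
print(f"correlation: {r:.3f}")
print(labels)
```

In practice the thresholds would have to be learned from annotated data rather than fixed by hand, which is why a classifier trained on several metric scores at once is the more natural choice.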
