An Empirical Study on Explainable Prediction of Text Complexity: Preliminaries for Text Simplification

Text simplification aims to reduce the language complexity and improve the readability of professional content so that it is accessible to readers of different ages and educational levels. Although it is a promising practice for improving the fairness and transparency of text information systems, the notion of text simplification is used inconsistently in the existing literature, covering everything from assessing the complexity of single words to automatically generating simplified documents. We show that the general problem of text simplification can be formally decomposed into a compact pipeline of tasks that ensures the transparency and explainability of the process. In this paper, we present a systematic analysis of the first two steps in this pipeline: 1) predicting the complexity of a given piece of text, and 2) identifying the complex components of text that is considered complex. We show that these two tasks can be solved separately, using either lexical approaches or state-of-the-art deep learning methods, or jointly through an end-to-end, explainable machine learning predictor. We propose formal evaluation metrics for both tasks, which allow us to compare the performance of the candidate approaches on multiple datasets from diverse domains.
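As a rough illustration of how the first two pipeline steps relate, the following is a minimal sketch of a purely lexical baseline. It is not the method evaluated in the paper: the word list `SIMPLE_WORDS`, the function names, the length cutoff `min_len`, and the `threshold` value are all hypothetical choices made for this example.

```python
import re

# Hypothetical toy lexicon of "simple" words (an assumption for illustration;
# a real lexical approach would rely on a large frequency list or lexicon).
SIMPLE_WORDS = {"the", "a", "of", "in", "is", "he", "had",
                "patient", "heart", "attack"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def complex_tokens(text, simple_words=SIMPLE_WORDS, min_len=7):
    """Step 2 (sketch): flag tokens that are long and not in the lexicon."""
    return [t for t in tokenize(text)
            if t not in simple_words and len(t) >= min_len]


def is_complex(text, threshold=0.2):
    """Step 1 (sketch): call the text complex if the share of flagged
    tokens exceeds an assumed threshold."""
    tokens = tokenize(text)
    if not tokens:
        return False
    return len(complex_tokens(text)) / len(tokens) > threshold


if __name__ == "__main__":
    sentence = "The patient had a myocardial infarction."
    print(is_complex(sentence), complex_tokens(sentence))
```

In this sketch the flagged tokens double as the explanation for the complexity label, which mirrors the idea of solving both tasks with one explainable predictor, although a learned model would replace the fixed word list and cutoff.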
