A Word-Complexity Lexicon and A Neural Readability Ranking Model for Lexical Simplification

Current lexical simplification approaches rely heavily on heuristics and corpus-level features that do not always align with human judgment. We create a human-rated word-complexity lexicon of 15,000 English words and propose a novel neural readability ranking model with a Gaussian-based feature vectorization layer that uses these human ratings to measure the complexity of any given word or phrase. Our model outperforms state-of-the-art systems across different lexical simplification tasks and evaluation datasets. We also produce SimplePPDB++, a lexical resource of over 10 million simplifying paraphrase rules, by applying our model to the Paraphrase Database (PPDB).
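The abstract names a Gaussian-based feature vectorization layer without describing it. As a rough illustration only, the sketch below shows one common way such a layer can be built: a scalar feature (for example, a normalized complexity score) is projected onto evenly spaced bins with Gaussian radial basis functions and then normalized. The function name `gaussian_vectorize`, the bin layout, and the parameters `lo`, `hi`, `n_bins`, and `gamma` are assumptions made for this example, not details taken from the abstract.

```python
import math


def gaussian_vectorize(value, lo=0.0, hi=1.0, n_bins=5, gamma=0.2):
    """Map a scalar feature to a normalized soft bin-membership vector.

    Hypothetical sketch: bin centers are evenly spaced over [lo, hi] and
    the Gaussian width is gamma times the bin width (both are assumptions).
    """
    if n_bins < 1 or hi <= lo or gamma <= 0:
        raise ValueError("need n_bins >= 1, hi > lo, and gamma > 0")

    width = (hi - lo) / n_bins
    sigma = gamma * width
    centers = [lo + (j + 0.5) * width for j in range(n_bins)]

    # Clamp so out-of-range inputs fall into the nearest edge bin.
    value = min(max(value, lo), hi)

    # Unnormalized Gaussian response of each bin to the input value.
    weights = [math.exp(-((value - c) ** 2) / (2 * sigma ** 2)) for c in centers]
    total = sum(weights)

    if total == 0.0:
        # Numerical underflow for very small gamma: fall back to a one-hot vector.
        nearest = min(range(n_bins), key=lambda j: abs(value - centers[j]))
        return [1.0 if j == nearest else 0.0 for j in range(n_bins)]

    return [w / total for w in weights]


if __name__ == "__main__":
    # A score in the middle of the range puts most of its mass on the middle bin.
    print(gaussian_vectorize(0.5))
```

Compared with hard binning, a soft vector of this kind lets nearby feature values produce similar inputs to the downstream network while still giving it a nonlinear view of the feature.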
