Lexical Simplification with Neural Ranking

We present a new Lexical Simplification approach that uses neural networks to learn substitutions from the Newsela corpus, a large set of professionally produced simplifications. We extract candidate substitutions by combining the Newsela corpus with a retrofitted context-aware word embedding model, and rank them using a new neural regression model that learns rankings from annotated data. This strategy achieves the highest Accuracy, Precision and F1 scores to date on standard datasets for the task.
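As a rough illustration of the ranking step only, the sketch below orders candidate substitutions by summing pairwise scores. It assumes, without confirmation from the abstract, that the ranker scores pairs of candidates; `pair_score`, `rank_candidates`, and the `TOY_FREQ` table are hypothetical names and data, and the frequency difference is merely a stand-in for a learned neural regressor.

```python
from itertools import permutations

# Hypothetical toy frequencies; a real system would use corpus statistics.
TOY_FREQ = {"help": 900, "assist": 120, "facilitate": 15, "aid": 300}


def pair_score(a: str, b: str) -> float:
    """Stand-in for a learned regressor: positive when `a` looks simpler than `b`."""
    return float(TOY_FREQ.get(a, 0) - TOY_FREQ.get(b, 0))


def rank_candidates(candidates, score=pair_score):
    """Order candidates by the sum of their pairwise scores against all others."""
    totals = {c: 0.0 for c in candidates}
    for a, b in permutations(candidates, 2):
        totals[a] += score(a, b)
    # Highest total first, i.e. the candidate judged simplest overall.
    return sorted(candidates, key=lambda c: totals[c], reverse=True)


ranking = rank_candidates(["facilitate", "assist", "help", "aid"])
print(ranking)
```

Swapping `pair_score` for a trained model would leave the ranking loop unchanged, which is the main reason for keeping the scorer as a separate function in this sketch.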
