Incorporating Multiword Expressions in Phrase Complexity Estimation

Multiword expressions (MWEs) have been shown to be useful in a number of NLP tasks. However, the use of MWEs in lexical complexity assessment and simplification remains under-explored. In this paper, we propose a text complexity assessment system for English that incorporates MWE identification. We show that detecting MWEs with state-of-the-art systems improves complexity prediction on an established lexical complexity dataset.
