Manchester Metropolitan at SemEval-2021 Task 1: Convolutional Networks for Complex Word Identification

We present two convolutional neural networks for predicting the complexity of words and phrases in context on a continuous scale. Both models use word and character embeddings alongside lexical features as inputs. Our system achieves a Pearson correlation of 0.7754 on the task as a whole. We highlight the limitations of this method in properly assessing the context of the target text, and explore the effectiveness of both systems across a range of genres. Both models were submitted to LCP 2021, which frames the identification of complex words and phrases as a context-dependent, regression-based task.
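
For readers unfamiliar with the metric, the Pearson correlation between predicted and gold complexity scores can be computed as in the minimal sketch below. The function name `pearson` and the sample scores are illustrative assumptions; this is not the official task scorer or the authors' evaluation code.

```python
import math


def pearson(predicted, gold):
    """Pearson correlation between two equal-length lists of scores."""
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_g)


# Hypothetical complexity scores on a continuous 0-1 scale (made-up values).
example_predicted = [0.10, 0.25, 0.40, 0.55]
example_gold = [0.15, 0.20, 0.45, 0.60]
example_r = pearson(example_predicted, example_gold)
```

A value near 1 means the predictions rise and fall with the gold scores; the metric ignores any constant offset or scale between the two.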
