Paper Information - Manchester Metropolitan at SemEval-2021 Task 1: Convolutional Networks for Complex Word Identification

We present two convolutional neural networks for predicting the complexity of words and phrases in context on a continuous scale. Both models use word and character embeddings alongside lexical features as inputs. Our system achieves reasonable results, with a Pearson correlation of 0.7754 on the task as a whole. We highlight the limitations of this method in properly assessing the context of the target text, and explore the effectiveness of both systems across a range of genres. Both models were submitted to the Lexical Complexity Prediction (LCP) 2021 shared task, which frames the identification of complex words and phrases as a context-dependent, regression-based task.
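
As a reading aid for the reported metric, the following is a minimal sketch of how a Pearson correlation between gold and predicted complexity scores could be computed. The function name `pearson` and the sample values are assumptions made for illustration only; they are not the authors' evaluation code or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical complexity scores in [0, 1]; NOT data from the paper.
gold = [0.10, 0.25, 0.40, 0.55, 0.80]
predicted = [0.15, 0.20, 0.45, 0.50, 0.70]

r = pearson(gold, predicted)
print(f"Pearson r = {r:.4f}")
```

A value near 1 means the predictions rise and fall with the gold scores. Because the coefficient is invariant to linear rescaling, it rewards correct relative ordering and spacing rather than exact score values.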

Matthew Shardlow | Robert Flynn
