RS_GV at SemEval-2021 Task 1: Sense Relative Lexical Complexity Prediction

We present the technical report for RS_GV, our system for SemEval-2021 Task 1 on lexical complexity prediction of English words. RS_GV is a neural network that combines hand-crafted linguistic features with character and word embeddings to predict the complexity of target words. To generate the hand-crafted features, we set the target words in relation to their senses. RS_GV predicts the complexity of biomedical terms well, but it struggles with very complex and very simple target words.

Regina Stodden | Gayatri Venugopal
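
As a rough illustration of the idea in the abstract (hand-crafted features concatenated with character and word embeddings, then fed to a predictor), the following minimal sketch may help. It is not the authors' implementation: the feature set, the `n_senses` argument, the pseudo-random "embeddings", the dimensions, and the single linear layer with a sigmoid are all assumptions made for illustration only.

```python
import math
import random


def handcrafted_features(word, n_senses):
    """Toy hand-crafted features (hypothetical, not the paper's feature set)."""
    vowels = sum(ch in "aeiou" for ch in word.lower())
    return [
        len(word) / 20.0,              # scaled word length
        vowels / max(len(word), 1),    # vowel ratio
        1.0 / (1 + n_senses),          # assumed sense-related feature
    ]


def char_embedding(word, dim=4):
    """Stand-in character embedding: mean of a fixed pseudo-random vector per character."""
    vec = [0.0] * dim
    for ch in word.lower():
        rng = random.Random(ord(ch))   # same character always gives the same vector
        for i in range(dim):
            vec[i] += rng.uniform(-1, 1)
    return [v / max(len(word), 1) for v in vec]


def word_embedding(word, dim=4):
    """Stand-in word embedding: a fixed pseudo-random vector per word."""
    rng = random.Random(word.lower())
    return [rng.uniform(-1, 1) for _ in range(dim)]


def build_input(word, n_senses):
    """Concatenate hand-crafted features with both embedding types."""
    return handcrafted_features(word, n_senses) + char_embedding(word) + word_embedding(word)


def predict_complexity(word, n_senses, weights, bias=0.0):
    """Single linear layer plus sigmoid, giving a score strictly between 0 and 1."""
    x = build_input(word, n_senses)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Untrained, randomly initialised weights; a real system would learn them from data.
_rng = random.Random(0)
weights = [_rng.uniform(-0.5, 0.5) for _ in range(11)]
score = predict_complexity("thrombosis", n_senses=1, weights=weights)
```

In a real system the stand-in embeddings would be replaced by pretrained ones and the weights would be trained on annotated complexity scores; the sketch only shows how the three input types can be joined into one vector.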
