SemEval-2021 Task 1: Lexical Complexity Prediction

This paper presents the results and main findings of SemEval-2021 Task 1: Lexical Complexity Prediction. We provided participants with an augmented version of the CompLex Corpus (Shardlow et al., 2020). CompLex is an English multi-domain corpus in which words and multi-word expressions (MWEs) were annotated for complexity on a five-point Likert scale. SemEval-2021 Task 1 featured two Sub-tasks: Sub-task 1 focused on single words and Sub-task 2 on MWEs. The competition attracted 198 teams in total, of which 54 submitted official runs on the test data to Sub-task 1 and 37 to Sub-task 2.

[1]  Emmanuele Chersoni,et al.  PolyU CBS-Comp at SemEval-2021 Task 1: Lexical Complexity Prediction (LCP) , 2021, SEMEVAL.

[2]  Gustavo Henrique Paetzold.  UTFPR at SemEval-2021 Task 1: Complexity Prediction by Combining BERT Vectors and Classic Features , 2021, SEMEVAL.

[3]  Armand Rotaru.  ANDI at SemEval-2021 Task 1: Predicting complexity in context using distributional models, behavioural norms, and lexical resources , 2021, SEMEVAL.

[4]  Malak Abdullah,et al.  JUST-BLUE at SemEval-2021 Task 1: Predicting Lexical Complexity using BERT and RoBERTa Pre-trained Language Models , 2021, SEMEVAL.

[5]  Kyle Gorman,et al.  We Need to Talk about Standard Splits , 2019, ACL.

[6]  Ichiro Kobayashi,et al.  OCHADAI-KYOTO at SemEval-2021 Task 1: Enhancing Model Generalization and Robustness for Lexical Complexity Prediction , 2021, SEMEVAL.

[7]  Julie Medero,et al.  HMC at SemEval-2016 Task 11: Identifying Complex Words Using Depth-limited Decision Trees , 2016, *SEMEVAL.

[8]  Prafulla Kumar Choubey,et al.  Garuda & Bhasha at SemEval-2016 Task 11: Complex Word Identification Using Aggregated Learning Models , 2016, *SEMEVAL.

[9]  Zheng Yuan,et al.  Cambridge at SemEval-2021 Task 1: An Ensemble of Feature-Based and Neural Models for Lexical Complexity Prediction , 2021, SEMEVAL.

[10]  Zhipeng Luo,et al.  DeepBlueAI at SemEval-2021 Task 1: Lexical Complexity Prediction with A Deep Ensemble Approach , 2021, SEMEVAL.

[11]  David Alfter,et al.  SB@GU at the Complex Word Identification 2018 Shared Task , 2018, BEA@NAACL-HLT.

[12]  Christian Biemann,et al.  CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups , 2017, IJCNLP.

[13]  Mamoru Komachi,et al.  Complex Word Identification Based on Frequency in a Learner Corpus , 2018, BEA@NAACL-HLT.

[14]  Lucia Specia,et al.  Complex Word Identification: Challenges in Data Annotation and System Performance , 2017, NLP-TEA@IJCNLP.

[15]  Anders Søgaard,et al.  We Need To Talk About Random Splits , 2020, EACL.

[16]  Shervin Malmasi,et al.  LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles , 2016, *SEMEVAL.

[17]  Mark Steedman,et al.  A massively parallel corpus: the Bible in 100 languages , 2014, Lang. Resour. Evaluation.

[18]  Ethan A. Chi,et al.  Stanford MLab at SemEval-2021 Task 1: Tree-Based Modelling of Lexical Complexity using Word Embeddings , 2021, SEMEVAL.

[19]  Philipp Koehn,et al.  Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.

[20]  J. Chall,et al.  A Formula for Predicting Readability , 1948.

[21]  Marcos Zampieri,et al.  CompLex - A New Corpus for Lexical Complexity Predicition from Likert Scale Data , 2020, READI.

[22]  Ashutosh Modi,et al.  IITK@LCP at SemEval-2021 Task 1: Classification for Lexical Complexity Regression Task , 2021, SEMEVAL.

[23]  Nat Gillin.  Sensible at SemEval-2016 Task 11: Neural Nonsense Mangled in Ensemble Mess , 2016, SemEval@NAACL-HLT.

[24]  Yves Bestgen.  LAST at SemEval-2021 Task 1: Improving Multi-Word Complexity Prediction Using Bigram Association Measures , 2021, SEMEVAL.

[25]  Ekaterina Kochmar,et al.  Recursive Context-Aware Lexical Simplification , 2019, EMNLP.

[26]  Lucia Specia,et al.  SemEval 2016 Task 11: Complex Word Identification , 2016, *SEMEVAL.

[27]  Josef van Genabith,et al.  MacSaar at SemEval-2016 Task 11: Zipfian and Character Features for Complex Word Identification , 2016, *SEMEVAL.

[28]  Caroline Gasperin,et al.  Fostering Digital Inclusion and Accessibility: The PorSimples project for Simplification of Portuguese Texts , 2010, NAACL.

[29]  Krzysztof Wrobel.  PLUJAGH at SemEval-2016 Task 11: Simple System for Complex Word Identification , 2016, SemEval@NAACL-HLT.

[30]  Lucia Specia,et al.  SV000gg at SemEval-2016 Task 11: Heavy Gauge Complex Word Identification with System Voting , 2016, SemEval@NAACL-HLT.

[31]  Chaya Liebeskind,et al.  JCT at SemEval-2021 Task 1: Context-aware Representation for Lexical Complexity Prediction , 2021, SEMEVAL.

[32]  Nathan Hartmann,et al.  NILC at CWI 2018: Exploring Feature Engineering and Feature Learning , 2018, BEA@NAACL-HLT.

[33]  Matthew Shardlow,et al.  Manchester Metropolitan at SemEval-2021 Task 1: Convolutional Networks for Complex Word Identification , 2021, SEMEVAL.

[34]  Marcos Zampieri,et al.  LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction , 2021, SEMEVAL.

[35]  Maja Popovic.  Complex Word Identification Using Character n-grams , 2018, BEA@NAACL-HLT.

[36]  Fernando Alva-Manchego,et al.  IAPUCP at SemEval-2021 Task 1: Stacking Fine-Tuned Transformers is Almost All You Need for Lexical Complexity Prediction , 2021, SEMEVAL.

[37]  Abu Nowshed Chy,et al.  CSECU-DSG at SemEval-2021 Task 1: Fusion of Transformer Models for Lexical Complexity Prediction , 2021, SEMEVAL.

[38]  Radu Tudor Ionescu,et al.  UnibucKernel: A kernel-based learning method for complex word identification , 2018, BEA@NAACL-HLT.

[39]  Onur Kuru,et al.  AI-KU at SemEval-2016 Task 11: Word Embeddings and Substring Features for Complex Word Identification , 2016, *SEMEVAL.

[40]  Horacio Saggion,et al.  TALN at SemEval-2016 Task 11: Modelling Complex Words by Contextual, Lexical and Semantic Features , 2016, *SEMEVAL.

[41]  K. P. Soman,et al.  AmritaCEN at SemEval-2016 Task 11: Complex Word Identification using Word Embedding , 2016, SemEval@NAACL-HLT.

[42]  Matthew Shardlow,et al.  A Comparison of Techniques to Automatically Identify Complex Words. , 2013, ACL.

[43]  Shiva Taslimipoor,et al.  SeCoDa: Sense Complexity Dataset , 2020, LREC.

[44]  Dumitru-Clementin Cercel,et al.  UPB at SemEval-2021 Task 1: Combining Deep Learning and Hand-Crafted Features for Lexical Complexity Prediction , 2021, SEMEVAL.

[45]  Braja Gopal Patra,et al.  JU_NLP at SemEval-2016 Task 11: Identifying Complex Words in a Sentence , 2016, SemEval@NAACL-HLT.

[46]  Regina Stodden,et al.  RS_GV at SemEval-2021 Task 1: Sense Relative Lexical Complexity Prediction , 2021, SEMEVAL.

[47]  Horacio Saggion,et al.  LaSTUS/TALN at Complex Word Identification (CWI) 2018 Shared Task , 2018, BEA@NAACL-HLT.

[48]  Sabine Bartsch,et al.  TUDA-CCL at SemEval-2021 Task 1: Using Gradient-boosted Regression Tree Ensembles Trained on a Heterogeneous Feature Set for Predicting Lexical Complexity , 2021, SEMEVAL.

[49]  Alexander F. Gelbukh,et al.  Complex Word Identification: Convolutional Neural Network vs. Feature Engineering , 2018, BEA@NAACL-HLT.

[50]  Ali Hakimi Parizi,et al.  UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders , 2021, SEMEVAL.

[51]  Xiaobing Zhou,et al.  hub at SemEval-2021 Task 1: Fusion of Sentence and Word Frequency to Predict Lexical Complexity , 2021, SEMEVAL.

[52]  Lian-Xin Jiang,et al.  RG PA at SemEval-2021 Task 1: A Contextual Attention-based Model with RoBERTa for Lexical Complexity Prediction , 2021, SEMEVAL.

[53]  Soroush Vosoughi,et al.  BigGreen at SemEval-2021 Task 1: Lexical Complexity Prediction with Assembly Models , 2021, SEMEVAL.

[54]  Katja Voskoboinik.  katildakat at SemEval-2021 Task 1: Lexical Complexity Prediction of Single Words and Multi-Word Expressions in English , 2021, SEMEVAL.

[55]  Lucia Specia,et al.  Lexical Simplification with Neural Ranking , 2017, EACL.

[56]  Irene Russo.  archer at SemEval-2021 Task 1: Contextualising Lexical Complexity , 2021, SEMEVAL.

[57]  Wei Xu,et al.  A Word-Complexity Lexicon and A Neural Readability Ranking Model for Lexical Simplification , 2018, EMNLP.

[58]  Ekaterina Kochmar,et al.  CAMB at CWI Shared Task 2018: Complex Word Identification with Ensemble-Based Voting , 2018, BEA@NAACL-HLT.

[59]  Mironas Bitinis,et al.  CLULEX at SemEval-2021 Task 1: A Simple System Goes a Long Way , 2021, SEMEVAL.

[60]  Ismail Berrada,et al.  CS-UM6P at SemEval-2021 Task 1: A Deep Learning Model-based Pre-trained Transformer Encoder for Lexical Complexity , 2021, SEMEVAL.

[61]  Dirk De Hertog,et al.  Deep Learning Architecture for Complex Word Identification , 2018, BEA@NAACL-HLT.

[62]  Joachim Bingel,et al.  CoastalCPH at SemEval-2016 Task 11: The importance of designing your Neural Networks right , 2016, *SEMEVAL.

[63]  David Kauchak.  Pomona at SemEval-2016 Task 11: Predicting Word Complexity Based on Corpus Frequency , 2016, SemEval@NAACL-HLT.

[64]  Alejandro Mosquera.  Alejandro Mosquera at SemEval-2021 Task 1: Exploring Sentence and Word Features for Lexical Complexity Prediction , 2021, SEMEVAL.

[65]  Michal Konkol,et al.  UWB at SemEval-2016 Task 11: Exploring Features for Complex Word Identification , 2016, *SEMEVAL.

[66]  Pushpak Bhattacharyya,et al.  The Whole is Greater than the Sum of its Parts: Towards the Effectiveness of Voting Ensemble Classifiers for Complex Word Identification , 2018, BEA@NAACL-HLT.

[67]  Giuseppe Vettigli,et al.  CompNA at SemEval-2021 Task 1: Prediction of lexical complexity analyzing heterogeneous features , 2021, SEMEVAL.

[68]  Ricardo Baeza-Yates,et al.  A plug-in to aid online reading in Spanish , 2015, W4A.

[69]  Lucia Specia,et al.  A Report on the Complex Word Identification Shared Task 2018 , 2018, BEA@NAACL-HLT.

[70]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[71]  Timothy Baldwin,et al.  Melbourne at SemEval 2016 Task 11: Classifying Type-level Word Complexity using Random Forests with Corpus and Word List Features , 2016, SemEval@NAACL-HLT.