JUST-BLUE at SemEval-2021 Task 1: Predicting Lexical Complexity using BERT and RoBERTa Pre-trained Language Models

Predicting the complexity level of a word or a phrase is considered a challenging task, and it is a crucial step in numerous NLP applications such as text rearrangement and text simplification. Early research treated the task as binary classification, where systems predicted whether a word is complex (complex versus non-complex). Other studies assessed the level of word complexity using regression models or multi-label classification models. With the rise of transfer learning and pre-trained language models, deep learning models show a significant improvement over classical machine learning models. This paper presents our approach, which ranked first in SemEval-2021 Task 1 (Sub-task 1): predicting the degree of complexity of a word within a text on a scale from 0 to 1. Using the pre-trained language models BERT and RoBERTa, our system achieved a Pearson correlation score of 0.788.
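For illustration, below is a minimal sketch of fine-tuning a pre-trained encoder for lexical complexity regression with the HuggingFace transformers library. The model name, the sentence-word input pairing, the toy data, and all hyperparameters are illustrative assumptions rather than the authors' exact configuration; the full system uses both BERT and RoBERTa, and how their predictions are combined is not detailed in the abstract.

```python
# Sketch: fine-tuning a pre-trained encoder to predict a lexical
# complexity score in [0, 1].  Illustrative assumptions throughout,
# not the authors' exact setup.
import torch
from scipy.stats import pearsonr
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # "roberta-base" is handled analogously

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=1 attaches a single-output regression head; with
# problem_type="regression" the fine-tuning loss is MSE against the gold score.
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=1, problem_type="regression"
)

def encode(sentence, target_word):
    # Pair the sentence with the target word so the encoder sees the
    # token both in and out of context: [CLS] sentence [SEP] word [SEP].
    # (This pairing is an assumption, not the paper's documented input format.)
    return tokenizer(sentence, target_word, truncation=True,
                     padding="max_length", max_length=128,
                     return_tensors="pt")

# One illustrative training step on a toy example.
inputs = encode("The cellular phenotype was unexpected.", "phenotype")
labels = torch.tensor([0.45])  # gold complexity score in [0, 1] (toy value)
loss = model(**inputs, labels=labels).loss
loss.backward()  # optimizer step omitted for brevity

# Evaluation uses Pearson correlation between predicted and gold scores,
# the official metric of the shared task.
gold = [0.45, 0.10, 0.80]  # gold dev-set scores (toy values)
pred = [0.50, 0.15, 0.70]  # model predictions (toy values)
r, _ = pearsonr(pred, gold)
print(f"Pearson r = {r:.3f}")
```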
