UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction

This work presents our contribution in the context of the 6th task of SemEval-2020: Extracting Definitions from Free Text in Textbooks (DeftEval). The competition consists of three subtasks with different levels of granularity: (1) classification of sentences as definitional or non-definitional, (2) labeling of definitional sentences, and (3) relation classification. We use various pretrained language models (i.e., BERT, XLNet, RoBERTa, SciBERT, and ALBERT) to solve each of the three subtasks. Specifically, for each language model variant, we experiment with both freezing its weights and fine-tuning them. We also explore a multi-task architecture trained to jointly predict the outputs for the second and third subtasks. Our best-performing models, evaluated on the DeftEval dataset, placed 32nd on the first subtask and 37th on the second subtask. The code is available for further research at: this https URL.
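The core experimental contrast described above is between keeping a pretrained encoder frozen (training only a task-specific head) and fine-tuning it end-to-end. The snippet below is a minimal sketch of that comparison for the first subtask (definitional vs. non-definitional sentence classification), assuming the Hugging Face Transformers library and a generic checkpoint name; it is not the authors' exact training setup, and the `FREEZE_ENCODER` flag, learning rate, and example sentence are illustrative choices only.

```python
# Minimal sketch (not the authors' exact setup): freezing vs. fine-tuning a
# pretrained language model for binary sentence classification (Subtask 1).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # could be any BERT/RoBERTa/SciBERT/ALBERT/XLNet checkpoint
FREEZE_ENCODER = True             # True: keep pretrained weights fixed; False: fine-tune end-to-end

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2      # definitional vs. non-definitional
)

if FREEZE_ENCODER:
    # Freeze the Transformer encoder; only the classification head is trained.
    for param in model.base_model.parameters():
        param.requires_grad = False

# Only parameters that still require gradients are handed to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)

# One illustrative training step on a toy example.
batch = tokenizer(
    ["A definition is a statement of the meaning of a term."],
    return_tensors="pt", padding=True, truncation=True,
)
labels = torch.tensor([1])

outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```

A multi-task variant for the second and third subtasks, as mentioned in the abstract, would follow the same pattern with a shared encoder feeding separate token-labeling and relation-classification heads whose losses are summed.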
