RGCL at SemEval-2020 Task 6: Neural Approaches to Definition Extraction

This paper presents the RGCL team submission to SemEval-2020 Task 6 (DeftEval), subtasks 1 and 2. The system classifies definitions at both the sentence and token levels. It utilises state-of-the-art neural network architectures with task-specific adaptations, including an automatically extended training set. Overall, the approach achieves acceptable evaluation scores while maintaining flexibility in architecture selection.
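
A minimal sketch of the sentence-level setup (subtask 1), assuming a BERT-style encoder with a binary classification head accessed through the HuggingFace Transformers library; the model name, label mapping, and example sentence are illustrative assumptions, not the exact submission configuration.

```python
# Sentence-level definition classification (subtask 1): a hedged sketch.
# Assumes a BERT-style encoder; the head is randomly initialised until
# fine-tuned on DeftEval training data, so outputs here are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = no definition, 1 = definition
)

sentence = "A neural network is a computational model inspired by the brain."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("contains definition:", bool(logits.argmax(dim=-1).item()))
```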

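For the token-level task (subtask 2), a comparable sketch uses a token classification head over a BIO-style tag set; the labels below are a simplified assumption (the DEFT corpus annotates tags such as Term and Definition, among others), and the untrained head again yields illustrative output only.

```python
# Token-level definition tagging (subtask 2): a hedged sketch with an
# assumed BIO label set; not the authors' exact tag inventory or model.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-Term", "I-Term", "B-Definition", "I-Definition"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels)
)

sentence = "A neural network is a computational model inspired by the brain."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(f"{token}\t{labels[pred]}")
```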