论文信息 - Ablations over transformer models for biomedical relationship extraction

Ablations over transformer models for biomedical relationship extraction

Background: Masked language modelling approaches have enjoyed success in improving benchmark performance across many general and biomedical domain natural language processing tasks, including biomedical relationship extraction (RE). However, the recent surge in both the number of novel architectures and the volume of training data they utilise may lead us to question whether domain specific pretrained models are necessary. Additionally, recent work has proposed novel classification heads for RE tasks, further improving performance. Here, we perform ablations over several pretrained models and classification heads to try to untangle the perceived benefits of each. Methods: We use a range of string preprocessing strategies, combined with Bidirectional Encoder Representations from Transformers (BERT), BioBERT and RoBERTa architectures to perform ablations over three RE datasets pertaining to drug-drug and chemical protein interactions, and general domain relationship extraction. We explore the use of the RBERT classification head, compared to a simple linear classification layer across all architectures and datasets. Results: We observe a moderate performance benefit in using the BioBERT pretrained model over the BERT base cased model, although there appears to be little difference when comparing BioBERT to RoBERTa large. In addition, we observe a substantial benefit of using the RBERT head on the general domain RE dataset, but this is not consistently reflected in the biomedical RE datasets. Finally, we discover that randomising the token order of training data does not result in catastrophic performance degradation in our selected tasks. Conclusions: We find a recent general domain pretrained model performs approximately the same as a biomedical specific one, suggesting that domain specific models may be of limited use given the tendency of recent model pretraining regimes to incorporate ever broader sets of data. In addition, we suggest that care must be taken in RE model training, to prevent fitting to non-syntactic features of datasets.

[1] Yifan He,et al. Enriching Pre-trained Language Model with Entity Information for Relation Classification , 2019, CIKM.

[2] Zhiyong Lu,et al. Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets , 2019, BioNLP@ACL.

[3] Zhiyong Lu,et al. Community challenges in biomedical text mining over 10 years: success, failure and the future , 2016, Briefings Bioinform..

[4] Jaewoo Kang,et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining , 2019, Bioinform..

[5] Hung-Yu Kao,et al. Probing Neural Network Comprehension of Natural Language Arguments , 2019, ACL.

[6] Jaewoo Kang,et al. Chemical–gene relation extraction using recursive neural network , 2018, Database J. Biol. Databases Curation.

[7] Herrero-ZazoMaría,et al. The DDI corpus , 2013 .

[8] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[9] Paloma Martínez,et al. The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions , 2013, J. Biomed. Informatics.

[10] Sameer Singh,et al. Universal Adversarial Triggers for Attacking and Analyzing NLP , 2019, EMNLP.

[11] Dipanjan Das,et al. BERT Rediscovers the Classical NLP Pipeline , 2019, ACL.