Biomedical relation extraction with pre-trained language representations and minimal task-specific architecture

This paper presents our participation in the AGAC Track of the 2019 BioNLP Open Shared Tasks. We provide a solution for Task 3, which aims to extract "gene - function change - disease" triples, where "gene" and "disease" are mentions of particular genes and diseases respectively and "function change" is one of four pre-defined relationship types. Our system extends BERT (Devlin et al., 2019), a state-of-the-art language model that learns contextual language representations from a large unlabelled corpus and whose parameters can be fine-tuned for specific tasks with minimal additional architecture. We encode each pair of mentions and their textual context as two consecutive sequences in BERT, separated by a special symbol, and use a single linear layer to classify their relationship into five classes (the four pre-defined types plus 'no relation'). Despite considerable class imbalance, our system significantly outperforms a random baseline while relying on an extremely simple setup with no specially engineered features.
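
To make the described setup concrete, the sketch below shows one way to wire it up with the Hugging Face transformers library. It illustrates the architecture described in the abstract rather than reproducing the authors' code: the checkpoint name, the label strings, and the formatting of the mention pair are all assumptions made for the example.

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    # Four pre-defined function-change classes plus 'no relation'; these
    # label strings are illustrative placeholders, not official AGAC tags.
    LABELS = ["LOF", "GOF", "REG", "COM", "NO_RELATION"]

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    # BertForSequenceClassification puts a single linear classification
    # layer on top of BERT's pooled [CLS] representation.
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-cased", num_labels=len(LABELS)
    )
    model.eval()

    # Segment A: the gene and disease mentions; segment B: their textual
    # context. The tokenizer joins the two segments with BERT's special
    # [SEP] symbol and assigns them distinct segment embeddings.
    mentions = "BRCA1 ; breast cancer"
    context = "Germline mutations in BRCA1 predispose carriers to breast cancer."

    inputs = tokenizer(mentions, context, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 5): one score per class
    print(LABELS[logits.argmax(dim=-1).item()])

In practice the classification head would first be fine-tuned on the AGAC training triples; as initialised above, the snippet only demonstrates the two-segment input encoding and the five-way output.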

[1] Andrew McCallum et al. Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction, 2018, NAACL.

[2] Guodong Zhou et al. Chemical-induced disease relation extraction with various linguistic features, 2016, Database: The Journal of Biological Databases and Curation.

[3] Yijia Zhang et al. A hybrid model based on neural networks for biomedical relation extraction, 2018, Journal of Biomedical Informatics.

[4] Sung-Pil Choi et al. Extraction of protein–protein interactions (PPIs) from the literature by deep convolutional neural networks with various feature embeddings, 2018, Journal of Information Science.

[5] Jaewoo Kang et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining, 2019, Bioinformatics.

[6] Karin M. Verspoor et al. Convolutional neural networks for chemical-disease relation extraction are improved with character-based word embeddings, 2018, BioNLP.

[7] Fei Li et al. A neural joint model for entity and relation extraction from biomedical text, 2017, BMC Bioinformatics.

[8] Iz Beltagy et al. SciBERT: A Pretrained Language Model for Scientific Text, 2019, EMNLP.

[9] Yifan Peng et al. Improving chemical disease relation extraction with rich features and weakly labeled data, 2016, Journal of Cheminformatics.

[10] Pushpak Bhattacharyya et al. Relation Extraction: A Survey, 2017, ArXiv.

[11] Jeffrey Ling et al. Matching the Blanks: Distributional Similarity for Relation Learning, 2019, ACL.

[12] Yifan Peng et al. Chemical-protein relation extraction with ensembles of SVM, CNN, and RNN models, 2018, ArXiv.

[13] George Kurian et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.

[14] Wilson L. Taylor. “Cloze Procedure”: A New Tool for Measuring Readability, 1953.

[15] Philippe Cudré-Mauroux et al. Relation Extraction Using Distant Supervision, 2018, ACM Computing Surveys.

[16] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[17] Omer Levy et al. What Does BERT Look at? An Analysis of BERT’s Attention, 2019, BlackboxNLP@ACL.

[18] Alec Radford et al. Improving Language Understanding by Generative Pre-Training, 2018.

[19] Lukasz Kaiser et al. Attention Is All You Need, 2017, NIPS.

[20] Anália Lourenço et al. Overview of the BioCreative VI chemical-protein interaction Track, 2017.

[21] Yiming Yang et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.

[22] Leonhard Hennig et al. Improving Relation Extraction by Pre-trained Language Representations, 2019, AKBC.

[23] Zhiyong Lu et al. PubTator: a web-based text mining tool for assisting biocuration, 2013, Nucleic Acids Research.

[24] Zhiyong Lu et al. BioCreative V CDR task corpus: a resource for chemical disease relation extraction, 2016, Database: The Journal of Biological Databases and Curation.

[25] Yoav Goldberg et al. Assessing BERT's Syntactic Abilities, 2019, ArXiv.