What matters in a transferable neural network model for relation classification in the biomedical domain?

A lack of sufficient labeled data often limits the applicability of advanced machine learning algorithms to real-world problems. Transfer learning (TL), when used efficiently, has been shown to be very useful across domains. TL makes use of valuable knowledge learned in one task (the source task), where sufficient data is available, to improve performance on the task of interest (the target task). In the biomedical and clinical domains, a lack of sufficient training data means that machine learning models cannot be fully exploited. In this work, we present two unified recurrent neural models leading to three transfer learning frameworks for relation classification tasks. We systematically investigate how effectively the proposed frameworks transfer knowledge from a source task to a target task as the characteristics of the source data vary, such as the similarity or relatedness between the source and target tasks and the size of the training data for the source task. Our empirical results show that the proposed frameworks, in general, improve model performance. However, these improvements depend on the characteristics of the source and target tasks, and this dependence ultimately determines the choice of a particular TL framework.
