A Multi-View Deep Neural Network Model for Chemical-Disease Relation Extraction From Imbalanced Datasets

Understanding chemical-disease relations (CDR) is a crucial task in various biomedical domains. Manually mining this information from the biomedical literature is costly and time-consuming. To address these issues, various studies have been carried out to design an efficient automatic tool. In this paper, we propose a multi-view deep neural network model for the CDR task. Typically, multiple representations (or views) of the dataset are not available for this task. We therefore train multiple conceptually different deep neural network models on the dataset to generate different abstract features, which are treated as different views. A novel loss function, “Penalized LF”, is defined to address the problem of the imbalanced dataset. The proposed loss function is generic in nature. The model is designed as a combination of a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (Bi-LSTM) network, along with a Multi-Layer Perceptron (MLP). To show the efficacy of the proposed model, we compare it with six baseline models and other state-of-the-art techniques on the “chemicals-and-disease-DFE” dataset, a free-text dataset created by Li et al. from the BioCreative V Chemical Disease Relation dataset. Results show that the proposed model attains the highest F1-score for the individual classes, demonstrating its effectiveness in handling the class imbalance problem in the dataset. To further demonstrate the efficacy of the proposed model, we present results on the BioCreative V dataset and on two Protein-Protein Interaction Identification (PPI) datasets, viz., AiMed and BioInfer. All of these results are also compared with state-of-the-art models.
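The abstract does not define the “Penalized LF”, so the sketch below is not the authors' loss function. It only illustrates one common way to penalize errors on a minority class more heavily: cross-entropy with inverse-frequency class weights. The function names, the weighting scheme, and the toy labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical helper: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean of -weights[y] * log p(y); probs[i] maps class -> predicted probability."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(max(p[y], eps))
    return total / len(labels)


# Toy imbalanced data (assumed): three negative examples, one positive.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)

# A classifier that always favours the majority class.
probs = [{0: 0.9, 1: 0.1} for _ in labels]
loss = weighted_cross_entropy(probs, labels, weights)

# The same classifier scored without class weights, for comparison.
uniform = {0: 1.0, 1: 1.0}
unweighted_loss = weighted_cross_entropy(probs, labels, uniform)
```

With these toy labels the minority class receives the larger weight, so a model that always predicts the majority class is charged more than it would be under plain cross-entropy.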

[1]  Lei Hua,et al.  A Shortest Dependency Path Based Convolutional Neural Network for Protein-Protein Relation Extraction , 2016, BioMed research international.

[2]  Saurav Mallik,et al.  An evaluation of supervised methods for identifying differentially methylated regions in Illumina methylation arrays , 2018, Briefings Bioinform..

[3]  Guodong Zhou,et al.  Chemical-induced disease relation extraction via convolutional neural network , 2017, Database J. Biol. Databases Curation.

[4]  Jun'ichi Tsujii,et al.  Evaluating Impact of Re-training a Lexical Disambiguation Model on Domain Adaptation of an HPSG Parser , 2007, Trends in Parsing Technology.

[5]  Xiaolong Wang,et al.  Chemical-induced disease extraction via convolutional neural networks with attention , 2017, 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[6]  Zhiyong Lu,et al.  Understanding PubMed® user search behavior through log analysis , 2009, Database J. Biol. Databases Curation.

[7]  Rohit J. Kate,et al.  Comparative experiments on learning information extractors for proteins and their interactions , 2005, Artif. Intell. Medicine.

[8]  Guodong Zhou,et al.  Chemical-induced disease relation extraction with various linguistic features , 2016, Database J. Biol. Databases Curation.

[9]  Yifan Peng,et al.  Improving chemical disease relation extraction with rich features and weakly labeled data , 2016, Journal of Cheminformatics.

[10]  Erik M. van Mulligen,et al.  Extraction of chemical-induced diseases using prior knowledge and textual information , 2016, Database J. Biol. Databases Curation.

[11]  Xiao Sun,et al.  Multichannel Convolutional Neural Network for Biological Relation Extraction , 2016, BioMed research international.

[12]  Sriparna Saha,et al.  A Unified Multi-view Clustering Algorithm Using Multi-objective Optimization Coupled with Generative Model , 2020, ACM Trans. Knowl. Discov. Data.

[13]  Yanchun Liang,et al.  Deep Residual Convolutional Neural Network for Protein-Protein Interaction Extraction , 2019, IEEE Access.

[14]  Yifan Peng,et al.  Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task , 2016, Database J. Biol. Databases Curation.

[15]  Pushpak Bhattacharyya,et al.  Feature assisted stacked attentive shortest dependency path based Bi-LSTM model for protein-protein interaction , 2019, Knowl. Based Syst..

[16]  Mikael Bodén,et al.  A guide to recurrent neural networks and backpropagation , 2001 .

[17]  Hayit Greenspan,et al.  A multi-view deep learning architecture for classification of breast microcalcifications , 2016, 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI).

[18]  Jeyakumar Natarajan,et al.  Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature , 2017, PloS one.

[19]  Yaoyun Zhang,et al.  CD-REST: a system for extracting chemical-induced disease relation in literature , 2016, Database J. Biol. Databases Curation.

[20]  Anders Holst.  The DALLAS project. Report from the NUTEK-supported project AIS-8: Application of Data Analysis with Learning Systems, 1999-2001 , 2002 .

[21]  Shixian Ning,et al.  Combining Context and Knowledge Representations for Chemical-Disease Relation Extraction , 2019, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[22]  Sophia Ananiadou,et al.  Text-mining-assisted biocuration workflows in Argo , 2014, Database J. Biol. Databases Curation.

[23]  Daniel M. Lowe,et al.  Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall , 2016, Database J. Biol. Databases Curation.

[24]  Jari Björne,et al.  BioInfer: a corpus for information extraction in the biomedical domain , 2007, BMC Bioinformatics.

[25]  Long Chen,et al.  Exploiting syntactic and semantics information for chemical–disease relation extraction , 2016, Database J. Biol. Databases Curation.

[26]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[27]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[28]  Tapio Salakoski,et al.  Distributional Semantics Resources for Biomedical Text Processing , 2013 .

[29]  Xiaogang Wang,et al.  Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations , 2014, NIPS.

[30]  Haohan Wang,et al.  Deep Learning for Genomics: A Concise Overview , 2018, ArXiv.

[31]  Shixian Ning,et al.  Chemical-induced disease relation extraction with dependency information and prior knowledge , 2018, J. Biomed. Informatics.

[32]  Jun'ichi Tsujii,et al.  Protein-protein interaction extraction by leveraging multiple kernels and parsers , 2009, Int. J. Medical Informatics.

[33]  Thomas C. Wiegers,et al.  A CTD–Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug–disease and drug–phenotype interactions , 2013, Database J. Biol. Databases Curation.

[34]  Hongfei Lin,et al.  A protein-protein interaction extraction approach based on deep neural network , 2016, Int. J. Data Min. Bioinform..

[35]  Karin M. Verspoor,et al.  Convolutional neural networks for chemical-disease relation extraction are improved with character-based word embeddings , 2018, BioNLP.

[36]  Mikael Bodén.  A guide to recurrent neural networks and backpropagation , 2001 .

[37]  Sung-Pil Choi,et al.  Extraction of protein–protein interactions (PPIs) from the literature by deep convolutional neural networks with various feature embeddings , 2018, J. Inf. Sci..

[38]  Zhiyong Lu,et al.  PubMed and beyond: a survey of web tools for searching biomedical literature , 2011, Database J. Biol. Databases Curation.

[39]  Sung-Pil Choi.  Extraction of protein-protein interactions (PPIs) from the literature by deep convolutional neural networks with various feature embeddings , 2018 .

[40]  Nigel Collier,et al.  Improving chemical-induced disease relation extraction with learned features based on convolutional neural network , 2017, 2017 9th International Conference on Knowledge and Systems Engineering (KSE).

[41]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[42]  Huiwei Zhou,et al.  Chemical-disease Relations Extraction Based on The Shortest Dependency Path Tree , 2015 .

[43]  Thomas C. Wiegers,et al.  Comparative Toxicogenomics Database: a knowledgebase and discovery tool for chemical–gene–disease networks , 2008, Nucleic Acids Res..

[44]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[45]  Duc Minh Nguyen,et al.  Multiview Deep Learning for Predicting Twitter Users' Location , 2017, ArXiv.

[46]  Tong Shu Li,et al.  A crowdsourcing workflow for extracting chemical-induced disease relations from free text , 2016, Database J. Biol. Databases Curation.