Drug Target Interaction Prediction with Non-random Missing Labels

Drug-Target Interaction (DTI) prediction plays an important role in drug discovery and drug repurposing. DTI prediction is usually modeled as a binary classification problem. Unlike previous studies, which label unknown DTIs as negative samples, we treat unknown DTIs as labels that are missing not at random. For example, negative DTI labels are more likely to be missing because biomedical researchers prioritize studying DTIs that are more likely to be positive. We introduce a novel probabilistic model, Factorization with Non-random Missing Labels (FNML), for DTI prediction. FNML models the generative process for both the DTI labels (i.e., whether a label is positive or negative) and the responses (i.e., whether a label is observed or missing). In particular, the probability that a label is observed rather than missing depends on the sign of the label. We also conduct comprehensive experiments to validate the robust performance of the proposed model.
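To make the missing-not-at-random assumption concrete, the following toy simulation draws DTI labels from a low-rank logistic factorization and then hides each label with a probability that depends on its sign. It is only an illustrative sketch and not the FNML model itself: the function name `simulate_mnar_labels`, the logistic link, the offset of -2.0, and the observation probabilities `p_obs_pos` and `p_obs_neg` are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_mnar_labels(n_drugs=200, n_targets=100, rank=5,
                         p_obs_pos=0.3, p_obs_neg=0.02, seed=0):
    """Toy generative process with sign-dependent missingness (hypothetical)."""
    rng = np.random.default_rng(seed)

    # Latent factors for drugs and targets (assumed Gaussian here).
    U = rng.normal(size=(n_drugs, rank))
    V = rng.normal(size=(n_targets, rank))

    # Label model: probability that a drug-target pair interacts.
    # The -2.0 offset is an arbitrary choice that makes positives a minority.
    logits = U @ V.T - 2.0
    p_pos = 1.0 / (1.0 + np.exp(-logits))
    y = rng.random(p_pos.shape) < p_pos      # True = positive label

    # Response model: the chance of observing a label depends on its sign,
    # so negative labels are far more likely to be missing.
    p_obs = np.where(y, p_obs_pos, p_obs_neg)
    r = rng.random(y.shape) < p_obs          # True = label is observed

    return y, r

y, r = simulate_mnar_labels()
print("positive rate, all pairs:     ", y.mean())
print("positive rate, observed pairs:", y[r].mean())
print("positive rate, missing pairs: ", y[~r].mean())
```

Under these assumed settings, the observed pairs are dominated by positives, while the missing pairs still contain a non-trivial share of positives. This is why treating every unknown DTI as a negative sample can bias a classifier.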
