CGBVS‐DNN: Prediction of Compound‐protein Interactions Based on Deep Learning

Computational prediction of compound‐protein interactions (CPIs) is of great importance for drug design as the first step in in‐silico screening. We previously proposed chemical genomics‐based virtual screening (CGBVS), which predicts CPIs by using a support vector machine (SVM). However, CGBVS has difficulty training on more than a million CPIs, because the calculation time and computer memory required to train an SVM grow steeply (faster than linearly) with the number of training samples. To solve this problem, we propose CGBVS‐DNN, which uses deep neural networks, a deep learning technique, instead of the SVM. Deep learning does not require learning all input data at once because the network can be trained on small mini‐batches. Experimental results show that CGBVS‐DNN outperformed the original CGBVS when both were trained with a quarter million CPIs. Cross‐validation results show that the accuracy of CGBVS‐DNN reaches up to 98.2 % (σ<0.01) with 4 million CPIs.
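
The mini-batch argument can be illustrated with a minimal sketch. It assumes, hypothetically, that each CPI is encoded as a fixed-length feature vector labelled 1 (interacts) or 0 (does not). The generator `synthetic_cpi_batches`, the `TinyMLP` class, the layer sizes and the learning rate are illustrative assumptions, not the CGBVS‐DNN architecture or its descriptors; labels come from a hidden linear rule so the example runs on its own. The point it shows is that each stochastic gradient step touches only one mini-batch, so memory is bounded by the batch size rather than by the total number of CPIs.

```python
import math
import random
from typing import Iterator, List, Tuple

# Hypothetical CPI record: (feature vector, label), label 1 = "interacts".
Batch = List[Tuple[List[float], int]]


def synthetic_cpi_batches(n_samples: int, dim: int, batch_size: int,
                          seed: int = 0) -> Iterator[Batch]:
    """Lazily yield mini-batches so only one batch is held in memory.

    Stand-in data (assumption): random vectors labelled by a hidden linear rule.
    """
    rng = random.Random(seed)
    rule = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    batch: Batch = []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        y = 1 if sum(r * v for r, v in zip(rule, x)) > 0.0 else 0
        batch.append((x, y))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


class TinyMLP:
    """One ReLU hidden layer + sigmoid output, trained by mini-batch SGD."""

    def __init__(self, dim: int, hidden: int, seed: int = 1):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
        return h, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, batch: Batch, lr: float) -> None:
        """Average binary cross-entropy gradients over one batch, then step."""
        g_w1 = [[0.0] * len(row) for row in self.w1]
        g_b1 = [0.0] * len(self.b1)
        g_w2 = [0.0] * len(self.w2)
        g_b2 = 0.0
        for x, y in batch:
            h, p = self.forward(x)
            dz = p - y  # dLoss/dz for sigmoid output + cross-entropy
            g_b2 += dz
            for j, hj in enumerate(h):
                g_w2[j] += dz * hj
                if hj > 0.0:  # ReLU passes gradient only when active
                    dh = dz * self.w2[j]
                    g_b1[j] += dh
                    for i, xi in enumerate(x):
                        g_w1[j][i] += dh * xi
        n = len(batch)
        for j in range(len(self.w2)):
            self.w2[j] -= lr * g_w2[j] / n
            self.b1[j] -= lr * g_b1[j] / n
            for i in range(len(self.w1[j])):
                self.w1[j][i] -= lr * g_w1[j][i] / n
        self.b2 -= lr * g_b2 / n


def accuracy(model: TinyMLP, batches: Iterator[Batch]) -> float:
    correct = total = 0
    for batch in batches:
        for x, y in batch:
            correct += int((model.forward(x)[1] >= 0.5) == (y == 1))
            total += 1
    return correct / total


if __name__ == "__main__":
    DIM, HIDDEN, BATCH_SIZE = 10, 16, 32
    model = TinyMLP(DIM, HIDDEN)
    for epoch in range(10):
        # Re-stream the same 2,000 training samples each epoch (same seed).
        for batch in synthetic_cpi_batches(2000, DIM, BATCH_SIZE, seed=0):
            model.train_step(batch, lr=0.5)
    held_out = synthetic_cpi_batches(500, DIM, BATCH_SIZE, seed=42)
    print(f"held-out accuracy: {accuracy(model, held_out):.3f}")
```

For contrast, a solver that materialized the full n × n kernel matrix in float64 for n = 10^6 training samples would need 10^12 × 8 bytes = 8 × 10^12 bytes (8 TB), whereas the loop above only ever holds `BATCH_SIZE` samples plus the network weights.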
