Prediction of Drug-Likeness Using Deep Autoencoder Neural Networks

For diverse reasons, most drug candidates never become marketed drugs. Developing reliable computational methods for predicting the drug-likeness of candidate compounds is therefore vital to improving the success rate of drug discovery and development. In this study, we used a fully connected neural network (FNN) to construct drug-likeness classification models, with a deep autoencoder used to initialize the model parameters. We collected datasets of drugs (represented by ZINC World Drug), bioactive molecules (represented by MDDR and WDI), and common molecules (represented by ZINC All Purchasable and ACD). Compounds were encoded with Mold2 two-dimensional structure descriptors. The drug-like/non-drug-like models achieved classification accuracies of 91.04% on the WDI/ACD databases and 91.20% on MDDR/ZINC. Both accuracies exceed those of previously reported models. In addition, we developed a drug/non-drug model (ZINC World Drug vs. ZINC All Purchasable), which distinguishes drugs from common compounds with a classification accuracy of 96.99%. Our work shows that, by using high-dimensional molecular descriptors, deep learning can be applied to establish state-of-the-art drug-likeness prediction models.
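
The idea of initializing a fully connected classifier from an autoencoder can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single hidden layer rather than a deep stack, uses random synthetic vectors in place of Mold2 descriptors, and all names and hyperparameters (`pretrain_autoencoder`, `fine_tune`, the layer size, the learning rate) are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_autoencoder(X, n_hidden, epochs=200, lr=0.5):
    """Unsupervised step: learn to reconstruct X, return encoder weights."""
    n, d = X.shape
    W_enc = rng.normal(0, 0.1, (d, n_hidden)); b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0, 0.1, (n_hidden, d)); b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)        # encode
        err = (H @ W_dec + b_dec) - X         # linear decode, reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2 * err / (n * d)             # gradient of the mean squared error
        g_pre = (g_out @ W_dec.T) * H * (1 - H)
        W_dec -= lr * (H.T @ g_out); b_dec -= lr * g_out.sum(0)
        W_enc -= lr * (X.T @ g_pre); b_enc -= lr * g_pre.sum(0)
    return W_enc, b_enc, losses

def fine_tune(X, y, W1, b1, epochs=200, lr=0.5):
    """Supervised step: start from the pretrained encoder, train a classifier."""
    W1, b1 = W1.copy(), b1.copy()             # hidden layer initialized from the autoencoder
    w2 = rng.normal(0, 0.1, W1.shape[1]); b2 = 0.0
    n = X.shape[0]
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)
        p = sigmoid(H @ w2 + b2)              # predicted probability of the positive class
        losses.append(float(-np.mean(y * np.log(p + 1e-12)
                                     + (1 - y) * np.log(1 - p + 1e-12))))
        g_z = (p - y) / n                     # gradient of the cross-entropy w.r.t. the logit
        g_pre = np.outer(g_z, w2) * H * (1 - H)
        w2 -= lr * (H.T @ g_z); b2 -= lr * g_z.sum()
        W1 -= lr * (X.T @ g_pre); b1 -= lr * g_pre.sum(0)
    return (W1, b1, w2, b2), losses

# Synthetic stand-in data: 200 "compounds" with 20 descriptor values each.
X = rng.uniform(0, 1, (200, 20))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)   # arbitrary toy labels, not real drug-likeness

W_enc, b_enc, ae_losses = pretrain_autoencoder(X, n_hidden=8)
(W1, b1, w2, b2), ft_losses = fine_tune(X, y, W_enc, b_enc)
preds = (sigmoid(sigmoid(X @ W1 + b1) @ w2 + b2) > 0.5).astype(int)
```

The design point the sketch makes is the hand-off: the encoder weights learned without labels become the starting point of the classifier's hidden layer, and supervised training then adjusts all layers together.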
