Generative network complex (GNC) for drug discovery

It remains a challenging task to generate a wide variety of novel compounds with desirable pharmacological properties. In this work, a generative network complex (GNC) is proposed as a new platform for designing novel compounds, predicting their physical and chemical properties, and selecting potential drug candidates that fulfill various druggability criteria, such as binding affinity, solubility, and partition coefficient. The GNC combines four components. A SMILES string generator, consisting of an encoder, a latent space controlled (regulated) by drug properties, and a decoder, generates new compounds. Verification deep neural networks predict their drug properties. A target-specific three-dimensional (3D) pose generator constructs 3D poses associated with the target proteins. Mathematical deep learning networks reevaluate druggability. New compounds are generated in the latent space via randomized, controlled, or optimized output. In our demonstration, 2.08 million and 2.8 million novel compounds are generated for the Cathepsin S and BACE targets, respectively. These new compounds differ substantially from the seed compounds and cover a larger chemical space. For potentially active compounds, 3D poses are generated using a state-of-the-art method. The resulting 3D complexes are further evaluated for druggability by a champion deep learning algorithm based on algebraic topology, differential geometry, and algebraic graph theory. Run on supercomputers, the whole process took less than one week. Therefore, our GNC is an efficient new paradigm for discovering new drug candidates.
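The abstract names three ways of producing new compounds from the property-regulated latent space. The following toy sketch illustrates two of them, randomized and optimized output. It is not the GNC implementation. The 4-dimensional latent space, the linear stand-in predictor `predict_property`, its weights, and the helper names `randomized_output` and `optimized_output` are all hypothetical. The optimized mode is assumed here to be plain gradient descent toward a target property value. Decoding latent vectors back to SMILES strings is omitted.

```python
import random

# Hypothetical toy setup: a 4-dimensional latent space and a linear "property
# predictor". In GNC the latent vectors come from a trained SMILES encoder and
# the properties from trained networks; both are replaced by stand-ins here.
WEIGHTS = [0.8, -0.5, 0.3, 1.2]
BIAS = -1.0


def predict_property(z):
    """Toy predictor f(z) = w . z + b for a latent vector z."""
    return sum(w * x for w, x in zip(WEIGHTS, z)) + BIAS


def randomized_output(seed_z, sigma, n, rng):
    """Sample n latent vectors by adding Gaussian noise (std sigma) to a seed vector."""
    return [[x + rng.gauss(0.0, sigma) for x in seed_z] for _ in range(n)]


def optimized_output(seed_z, target, lr=0.1, steps=200):
    """Gradient descent on the loss (f(z) - target)^2, starting from the seed vector."""
    z = list(seed_z)
    for _ in range(steps):
        err = predict_property(z) - target
        # For a linear f, d/dz_i (f(z) - target)^2 = 2 * err * w_i.
        z = [x - lr * 2.0 * err * w for x, w in zip(z, WEIGHTS)]
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    seed = [0.5, 0.2, -0.1, 0.4]  # hypothetical latent vector of a seed compound
    samples = randomized_output(seed, sigma=0.3, n=5, rng=rng)
    best = max(samples, key=predict_property)
    print("best randomized f(z):", round(predict_property(best), 3))
    z_opt = optimized_output(seed, target=2.0)
    print("optimized f(z):", round(predict_property(z_opt), 3))  # prints 2.0
```

In a full pipeline, each resulting latent vector would be passed to the decoder, and the decoded SMILES strings would then be screened by the verification networks. A closed-form gradient is used here only because the toy predictor is linear; a trained neural predictor would require automatic differentiation.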
