Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges

Advanced mathematics, including multiscale weighted colored subgraphs and element-specific persistent homology, was integrated with machine learning, including deep neural networks, to construct mathematical deep learning models for pose and binding affinity prediction and ranking in the last two D3R Grand Challenges in computer-aided drug design and discovery. D3R Grand Challenge 2 focused on pose prediction, binding affinity ranking, and free energy prediction for Farnesoid X receptor ligands. Our models obtained the top place in absolute free energy prediction for free energy set 1 in stage 2. The latest competition, D3R Grand Challenge 3 (GC3), is considered the most difficult challenge so far. It has five subchallenges involving Cathepsin S and five kinase targets, namely VEGFR2, JAK2, p38-α, TIE2, and ABL1. GC3 has a total of 26 official competitive tasks, and our predictions were ranked 1st in 10 of them.
