DG-GL: Differential geometry-based geometric learning of molecular datasets

MOTIVATION: Despite its great success in physical modeling, differential geometry (DG) has rarely been developed as a versatile tool for analyzing large, diverse, and complex molecular and biomolecular datasets. The reason is a limited understanding of its potential for dimensionality reduction and of its ability to encode essential chemical and biological information in differentiable manifolds.

RESULTS: We put forward a differential geometry-based geometric learning (DG-GL) hypothesis: the intrinsic physics of three-dimensional (3D) molecular structures lies on a family of low-dimensional manifolds embedded in a high-dimensional data space. We encode crucial chemical, physical, and biological information into 2D element interactive manifolds, which are extracted from the high-dimensional structural data space via a multiscale discrete-to-continuum mapping based on differentiable density estimators. For density estimators that are analytically differentiable, differential geometry tools are used to construct element interactive curvatures in analytical form. These low-dimensional differential geometry representations are paired with a robust machine learning algorithm to demonstrate their descriptive and predictive power on large, diverse, and complex molecular and biomolecular datasets. Extensive numerical experiments show that the proposed DG-GL strategy outperforms other advanced methods in predicting drug discovery-related protein-ligand binding affinity, drug toxicity, and molecular solvation free energy.

AVAILABILITY AND IMPLEMENTATION: http://weilab.math.msu.edu/DG-GL/

Contact: wei@math.msu.edu.
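To make the discrete-to-continuum mapping and the analytical curvatures more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel as the differentiable density estimator, a single scale parameter `eta`, and the standard implicit-surface formulas for the Gaussian and mean curvature of a level set; the function names `density_derivatives`, `adjugate`, and `level_set_curvatures` are hypothetical and chosen for illustration only. In an element interactive setting, `atoms` would presumably hold only the coordinates of one selected element type.

```python
import numpy as np


def density_derivatives(point, atoms, eta):
    """Value, gradient and Hessian of rho(r) = sum_j exp(-|r - r_j|^2 / eta^2).

    The Gaussian kernel is an assumption made for this sketch.
    """
    diff = point - atoms                                # (n_atoms, 3)
    w = np.exp(-np.sum(diff ** 2, axis=1) / eta ** 2)   # kernel weight per atom
    rho = w.sum()
    grad = (-2.0 / eta ** 2) * (w[:, None] * diff).sum(axis=0)
    outer = np.einsum("n,ni,nj->ij", w, diff, diff)
    hess = (4.0 / eta ** 4) * outer - (2.0 / eta ** 2) * rho * np.eye(3)
    return rho, grad, hess


def adjugate(h):
    """Adjugate of a 3x3 matrix, built from row cross products (cofactors)."""
    cof = np.array([
        np.cross(h[1], h[2]),
        np.cross(h[2], h[0]),
        np.cross(h[0], h[1]),
    ])
    return cof.T


def level_set_curvatures(grad, hess):
    """Gaussian and mean curvature of the level set through the given point.

    Uses the standard implicit-surface formulas; the sign of the mean
    curvature depends on the chosen normal orientation.
    """
    g2 = grad @ grad
    g = np.sqrt(g2)
    gaussian = (grad @ adjugate(hess) @ grad) / g2 ** 2
    mean = (grad @ hess @ grad - g2 * np.trace(hess)) / (2.0 * g ** 3)
    return gaussian, mean


if __name__ == "__main__":
    # One atom at the origin: every level set is a sphere, so at distance R
    # the Gaussian curvature is 1/R^2 and the mean curvature is 1/R.
    atoms = np.zeros((1, 3))
    point = np.array([1.0, 2.0, 2.0])   # R = 3
    _, grad, hess = density_derivatives(point, atoms, eta=2.0)
    print(level_set_curvatures(grad, hess))
```

For a single atom the level sets are spheres, which gives a simple sanity check of the formulas. With several atoms, the same two functions evaluate curvatures at any point where the gradient of the density does not vanish.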
