Machine Learning Estimates of Natural Product Conformational Energies

Machine learning has been used to estimate potential energy surfaces and thereby speed up molecular dynamics simulations of small systems. We demonstrate that this approach is feasible for significantly larger, structurally complex molecules, taking as an example the natural product Archazolid A, a potent inhibitor of vacuolar-type ATPase from the myxobacterium Archangium gephyra. Our model estimates the energies of new conformations by exploiting information from previous calculations via Gaussian process regression. The predictive variance is used to assess whether a conformation lies in the interpolation region, allowing a controlled trade-off between prediction accuracy and computational speed-up. For energies of relaxed conformations at the density functional level of theory (implicit solvent, DFT/BLYP-disp3/def2-TZVP), mean absolute errors of less than 1 kcal/mol were achieved. The study demonstrates that predictive machine learning models can be developed for structurally complex, pharmaceutically relevant compounds, potentially enabling considerable speed-ups in simulations of larger molecular structures.
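
The variance-gated use of Gaussian process regression described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the squared-exponential kernel, the hyperparameter values, the variance threshold, and the names GPEnergyModel, estimate_energy, and reference_energy are all assumptions made for illustration, and a conformation is represented here by a generic descriptor vector because the abstract does not specify the representation.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and B (assumed kernel)."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


class GPEnergyModel:
    """Hypothetical GP regressor mapping conformation descriptors to energies."""

    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=1e-6):
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.noise_var = noise_var

    def _k(self, A, B):
        return sq_exp_kernel(A, B, self.length_scale, self.signal_var)

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.y_mean = y.mean()  # constant prior mean taken from the training energies
        K = self._k(self.X, self.X) + self.noise_var * np.eye(len(self.X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} (y - mean), obtained from two triangular systems
        self.alpha = np.linalg.solve(
            self.L.T, np.linalg.solve(self.L, y - self.y_mean))
        return self

    def predict(self, Xs):
        Xs = np.asarray(Xs, dtype=float)
        Ks = self._k(Xs, self.X)
        mean = Ks @ self.alpha + self.y_mean
        v = np.linalg.solve(self.L, Ks.T)
        # Latent predictive variance: prior variance minus what the data explain
        var = self.signal_var - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)


def estimate_energy(model, x, reference_energy, var_threshold):
    """Use the GP prediction when its variance is small, else fall back.

    reference_energy is a placeholder for the expensive calculation
    (for example a DFT run); var_threshold is an assumed tuning knob.
    """
    mean, var = model.predict(np.asarray(x, dtype=float)[None, :])
    if var[0] <= var_threshold:
        return float(mean[0]), "predicted"
    return float(reference_energy(x)), "computed"


if __name__ == "__main__":
    # Toy 1-D stand-in for a conformational energy surface (assumption).
    X_train = np.linspace(0.0, 3.0, 15)[:, None]
    y_train = np.sin(X_train[:, 0])
    model = GPEnergyModel().fit(X_train, y_train)
    for x in (np.array([1.55]), np.array([10.0])):
        print(x, estimate_energy(model, x, lambda z: np.sin(z[0]), 1e-2))
```

Raising var_threshold accepts more predictions and so saves more reference calculations at the cost of accuracy; lowering it does the opposite. This is the trade-off the abstract refers to.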
