Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond

The development of reliable and extensible molecular mechanics (MM) force fields -- fast, empirical models characterizing the potential energy surface of molecular systems -- is indispensable for biomolecular simulation and computer-aided drug design. Here, we introduce a generalized and extensible machine-learned MM force field, \texttt{espaloma-0.3}, and an end-to-end differentiable framework using graph neural networks to overcome the limitations of traditional rule-based methods. Trained in a single GPU-day to fit a large and diverse quantum chemical dataset of over 1.1M energy and force calculations, \texttt{espaloma-0.3} reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids. Moreover, this force field maintains the quantum chemical energy-minimized geometries of small molecules and preserves the condensed phase properties of peptides, self-consistently parametrizing proteins and ligands to produce stable simulations leading to highly accurate predictions of binding free energies. This methodology demonstrates significant promise as a path forward for systematically building more accurate force fields that are easily extensible to new chemical domains of interest.
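
For orientation, the kind of "fast, empirical model" referred to above is conventionally written as a sum of bonded and nonbonded terms. The expression below is a generic textbook (Class I) form, given here as an illustrative assumption; the abstract does not state the exact functional form or the symbols used by \texttt{espaloma-0.3}.

\begin{equation}
U(\mathbf{x}) =
\sum_{\text{bonds}} \frac{k_r}{2}\,(r - r_0)^2
+ \sum_{\text{angles}} \frac{k_\theta}{2}\,(\theta - \theta_0)^2
+ \sum_{\text{torsions}} \sum_{n} k_n \left[ 1 + \cos(n\phi - \gamma_n) \right]
+ \sum_{i<j} \left\{ 4\epsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right] + \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}} \right\}
\end{equation}

Here $r$, $\theta$, and $\phi$ are bond lengths, angles, and torsion angles computed from the coordinates $\mathbf{x}$, and $r_{ij}$ is the distance between atoms $i$ and $j$. In a machine-learned approach of the type described, parameters such as $k_r$, $r_0$, $k_\theta$, $\theta_0$, and $k_n$ would be predicted from the molecular graph rather than looked up from atom-type rules. Because the forces $\mathbf{F} = -\nabla_{\mathbf{x}} U$ are differentiable in those parameters, both energies and forces can enter the training objective.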
