Development and Benchmarking of Open Force Field 2.0.0: The Sage Small Molecule Force Field

We introduce the Open Force Field (OpenFF) 2.0.0 small molecule force field for drug-like molecules, code-named Sage, which builds upon our previous iteration, Parsley. OpenFF force fields are based on direct chemical perception, which assigns parameters through substructure queries and therefore generalizes readily to highly diverse chemistries. Like previous OpenFF iterations, the Sage generation of OpenFF force fields was validated in protein-ligand simulations to be compatible with AMBER biopolymer force fields. In this work, we detail the methodology used to develop this force field, as well as the innovations and improvements introduced since the release of Parsley 1.0.0. One particularly significant feature of Sage is a set of improved Lennard-Jones (LJ) parameters retrained against condensed-phase mixture data, the first refit of LJ parameters in the OpenFF small molecule force field line. Sage also includes valence parameters refit to a larger database of quantum chemical calculations than previous versions, as well as improvements in how this fitting is performed. Force field benchmarks against quantum chemistry reference data show improvements in general performance metrics: root-mean-square deviations (RMSD) of optimized conformer geometries, torsion fingerprint deviations (TFD), and relative conformer energetics (ΔΔE). We present a variety of benchmarks for these metrics against our previous force fields and, in some cases, against other small molecule force fields. Sage also demonstrates improved performance in estimating physical properties, as shown by comparison against experimental data from various thermodynamic databases for small molecule properties such as enthalpies of mixing (ΔHmix), mixture densities (ρ(x)), solvation free energies (ΔGsolv), and transfer free energies (ΔGtrans). Additionally, we benchmarked against protein-ligand binding free energies (ΔGbind), where Sage yields results statistically similar to those of previous force fields.
All data are publicly available, along with complete details on how to reproduce the training results, at https://github.com/openforcefield/openff-sage.
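
As a rough illustration of two of the benchmark metrics named above, the following minimal sketch computes an RMSD between two conformers and a ΔΔE for a set of conformers. It rests on assumptions that are not stated in the abstract: the conformers are taken to be already aligned with matching atom order, ΔΔE is taken as the force-field relative energy minus the quantum-chemistry relative energy, and the reference conformer is taken to be the quantum-chemistry minimum. The function names are hypothetical and are not part of any OpenFF package.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two conformers given as lists of (x, y, z) tuples.

    Assumes both conformers are already aligned and share atom ordering.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must have the same number of atoms")
    squared_sum = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


def ddE(qm_energies, mm_energies):
    """Per-conformer difference of relative energies (force field minus QM).

    Assumption: energies are relative to the conformer with the lowest
    QM energy, and both lists use the same conformer ordering and units.
    """
    if len(qm_energies) != len(mm_energies):
        raise ValueError("energy lists must have the same length")
    ref = min(range(len(qm_energies)), key=qm_energies.__getitem__)
    qm_rel = [e - qm_energies[ref] for e in qm_energies]
    mm_rel = [e - mm_energies[ref] for e in mm_energies]
    return [m - q for m, q in zip(mm_rel, qm_rel)]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    qm_geometry = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    mm_geometry = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(qm_geometry, mm_geometry))
    print(ddE([1.0, 0.0, 2.5], [0.5, 0.0, 2.0]))
```

Under these assumptions, a ΔΔE near zero means the force field reproduces the quantum-chemistry energy ordering for that conformer, and a small RMSD means the force-field-optimized geometry stays close to the reference geometry.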
