Open Force Field Evaluator: An Automated, Efficient, and Scalable Framework for the Estimation of Physical Properties from Molecular Simulation.

Developing accurate classical force field representations of molecules is key to realizing the full potential of molecular simulations, both as a powerful route to gaining fundamental insights into a broad spectrum of chemical and biological phenomena and for predicting physicochemical and mechanical properties of substances. The Open Force Field Consortium is an industry-funded open science effort to this end, developing open-source tools for rapidly generating new high-quality small-molecule force fields. An integral aspect of this is the parameterization and assessment of force fields against high-quality, condensed-phase physical property data, curated from open data sources such as the NIST ThermoML Archive, alongside quantum chemical data. Curating the quantity of experimental data available in open archives, and estimating the corresponding properties manually, would demand an onerous amount of human and computational resources, especially when estimates must be obtained for numerous sets of force field parameters. Here, we present an entirely automated, highly scalable framework for evaluating physical properties and their gradients with respect to force field parameters. It is written as a modular and extensible Python framework that employs a multiscale estimation approach, allowing properties to be estimated automatically both from new simulations and from cached simulation data, and that provides a pluggable API for estimating new properties. In this study, we demonstrate the utility of the framework by benchmarking the OpenFF 1.0.0 small-molecule force field and the GAFF 1.8 and GAFF 2.1 force fields against a test set of binary density and enthalpy of mixing measurements curated using the framework utilities. Further, we demonstrate the framework's utility as part of force field optimization by using it alongside ForceBalance, a framework for systematic force field optimization, to retrain a set of nonbonded van der Waals parameters against a training set of density and enthalpy of vaporization measurements.
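
The multiscale estimation approach described above can be pictured as an ordered list of estimation layers, tried from cheapest to most expensive. The sketch below is only an illustration of that idea under stated assumptions: the names Estimate, estimate_with_fallback, cached_layer, and simulation_layer are hypothetical and are not the framework's API, and the numerical values are placeholders rather than results.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Estimate:
    value: float
    uncertainty: float
    source: str  # name of the layer that produced the estimate


# A layer returns an Estimate, or None if it cannot handle the request.
Layer = Callable[[str], Optional[Estimate]]


def estimate_with_fallback(
    prop: str, layers: Sequence[Layer], tolerance: float
) -> Estimate:
    """Try each layer in order (cheapest first) and return the first
    estimate whose uncertainty is within `tolerance`."""
    for layer in layers:
        result = layer(prop)
        if result is not None and result.uncertainty <= tolerance:
            return result
    raise RuntimeError(f"no layer could estimate {prop!r} within tolerance")


def cached_layer(prop: str) -> Optional[Estimate]:
    # Hypothetical: reuse cached simulation data. Here the cache holds
    # only a fairly uncertain density estimate (placeholder numbers).
    cache = {"density": Estimate(0.995, 0.02, "cached")}
    return cache.get(prop)


def simulation_layer(prop: str) -> Optional[Estimate]:
    # Hypothetical stand-in for running a new simulation; the returned
    # numbers are placeholders, not real results.
    return Estimate(1.0, 0.001, "simulation")
```

With a loose tolerance the cached estimate is accepted and no new simulation is needed; tightening the tolerance, or requesting a property that is absent from the cache, falls through to the simulation layer.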
