Escaping Atom Types in Force Fields Using Direct Chemical Perception.

Traditional approaches to specifying a molecular mechanics force field encode all the information needed to assign force field parameters to a given molecule into a discrete set of atom types. This is equivalent to representing the molecule as a graph whose vertices are atoms labeled by atom type and whose unlabeled edges are chemical bonds. Valence terms (bond stretch, angle bend, and dihedral parameters) are then assigned by looking up bonded pairs, triplets, and quartets of atom types in parameter tables, while nonbonded parameters are assigned from the atom types themselves. This approach, which we call indirect chemical perception because it operates on the intermediate graph of atom-typed nodes, creates a number of technical problems. For example, atom types must be sufficiently complex to encode all necessary information about the molecular environment, making it difficult to extend force fields encoded this way. Atom typing also results in a proliferation of redundant parameters applied to chemically equivalent classes of valence terms, needlessly increasing force field complexity. Here, we describe a new approach to assigning force field parameters via direct chemical perception. Rather than working through the intermediary of the atom-typed graph, direct chemical perception operates directly on the unmodified chemical graph of the molecule to assign parameters. In particular, parameters are assigned to each type of force field term (e.g., bond stretch, angle bend, torsion, and Lennard-Jones) based on standard chemical substructure queries implemented via the industry-standard SMARTS chemical perception language, using SMIRKS extensions that permit labeling of specific atoms within a chemical pattern. We use this to implement a new force field format, called the SMIRKS Native Open Force Field (SMIRNOFF) format.
We demonstrate the power and generality of this approach using examples of specific molecules that pose problems for indirect chemical perception, and we construct and validate a minimalist yet very general force field, SMIRNOFF99Frosst. We find that a parameter definition file only ∼300 lines long provides coverage of all but a small fraction (<0.02%) of a 5 million molecule drug-like test set. Despite its simplicity, the accuracy of SMIRNOFF99Frosst for small molecule hydration free energies and selected properties of pure organic liquids is similar to that of the General Amber Force Field, whose specification requires thousands of parameters. This force field provides a starting point for further optimization and refitting.
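
The idea of assigning parameters directly from the chemical graph can be illustrated with a small sketch. The code below is not the SMIRNOFF implementation: it uses only the Python standard library, the SMIRKS strings are carried along as labels, and the actual matching is done by hand-written predicates that stand in for a real substructure search. The parameter IDs, the helper names (`bond_rule`, `assign_bond_parameters`), and the rule that a later match overrides an earlier one are all assumptions made for this illustration.

```python
# Toy molecule (acetaldehyde, CH3-CHO): atom index -> atomic number,
# and bonds as (atom_i, atom_j, bond_order). No atom types are defined.
ATOMS = {0: 6, 1: 6, 2: 8, 3: 1, 4: 1, 5: 1, 6: 1}
BONDS = [(0, 1, 1), (1, 2, 2), (0, 3, 1), (0, 4, 1), (0, 5, 1), (1, 6, 1)]


def bond_rule(z1, z2, order):
    """Return a predicate standing in for a two-atom SMIRKS bond query.

    None acts as a wildcard for an atomic number or for the bond order.
    This is a hypothetical helper, not a real SMIRKS matcher.
    """
    def matches(za, zb, bond_order):
        if order is not None and bond_order != order:
            return False

        def fits(x, y):
            return (z1 is None or x == z1) and (z2 is None or y == z2)

        # A bond has no intrinsic direction, so try both orientations.
        return fits(za, zb) or fits(zb, za)

    return matches


# Ordered rules: (hypothetical parameter ID, SMIRKS shown as a label, predicate).
# General patterns come first, more specific ones later.
BOND_RULES = [
    ("b_generic", "[*:1]~[*:2]", bond_rule(None, None, None)),
    ("b_C-C", "[#6:1]-[#6:2]", bond_rule(6, 6, 1)),
    ("b_C-H", "[#6:1]-[#1:2]", bond_rule(6, 1, 1)),
    ("b_C=O", "[#6:1]=[#8:2]", bond_rule(6, 8, 2)),
]


def assign_bond_parameters(atoms, bonds, rules):
    """Assign one parameter ID per bond by querying the chemical graph directly.

    Assumption of this sketch: when several rules match, the last one wins.
    """
    assigned = {}
    for i, j, order in bonds:
        for param_id, _smirks, matches in rules:
            if matches(atoms[i], atoms[j], order):
                assigned[(i, j)] = param_id
    return assigned


assignments = assign_bond_parameters(ATOMS, BONDS, BOND_RULES)
for bond, param_id in sorted(assignments.items()):
    print(bond, param_id)
```

In this sketch the C=O and C-C bonds receive different parameters because the patterns inspect bond order on the graph itself, without first assigning atom types to the carbons. A bond matched by no specific pattern would keep the generic fallback.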
