Blind prediction of cyclohexane–water distribution coefficients from the SAMPL5 challenge

In the recent SAMPL5 challenge, participants submitted predictions of cyclohexane/water distribution coefficients for a set of 53 small molecules. Distribution coefficients (log D) replace the hydration free energies that were a central part of the past five SAMPL challenges. A wide variety of computational methods were represented by the 76 submissions from 18 participating groups. Here, we analyze the submissions using a variety of error metrics and provide details for a number of reference calculations we performed. As in the SAMPL4 challenge, we assessed the ability of participants to evaluate not just their statistical uncertainty but also their model uncertainty, that is, how well they can predict the magnitude of their model or force field error for specific predictions. Unfortunately, this remains an area where prediction and analysis need improvement. In SAMPL4, the top-performing submissions achieved a root-mean-squared error (RMSE) of around 1.5 kcal/mol. If we anticipate accuracy in log D predictions to be similar to that of the hydration free energy predictions in SAMPL4, the expected error here would be around 1.54 log units. Only a few submissions had an RMSE below 2.5 log units in their predicted log D values. However, distribution coefficients introduced complexities not present in past SAMPL challenges, including tautomer enumeration, that are likely to be important in predicting biomolecular properties of interest to drug discovery; some decrease in accuracy would therefore be expected. Overall, the SAMPL5 distribution coefficient challenge provided considerable insight into the importance of modeling a variety of physical effects. We believe these types of measurements will be a promising source of data for future blind challenges, especially in view of the relatively straightforward nature of the experiments and the level of insight provided.
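The comparison between a free energy error in kcal/mol and an error in log units can be illustrated with a minimal sketch. It assumes a single neutral species (so log D reduces to a partition coefficient, ignoring ionization and tautomers), a temperature of 298.15 K, and independent errors of 1.5 kcal/mol in each of the two solvation free energies. These assumptions and all function names are illustrative; the abstract does not state how the 1.54 figure was obtained. Under these assumptions the propagated error comes out at roughly 1.55 log units, close to the quoted value.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kcal_to_log_units(dg_kcal, temperature=298.15):
    """Convert a free energy (kcal/mol) to log10 units by dividing by RT ln 10."""
    return dg_kcal / (R_KCAL * temperature * math.log(10))


def log_p_from_solvation(dg_water, dg_cyclohexane, temperature=298.15):
    """Cyclohexane/water log P of a neutral solute from two solvation free energies.

    A solute that is more favorably solvated in cyclohexane (more negative
    dg_cyclohexane) gets a positive log P.
    """
    return kcal_to_log_units(dg_water - dg_cyclohexane, temperature)


def propagated_log_error(err_water, err_cyclohexane, temperature=298.15):
    """Combine two independent free energy errors in quadrature, in log units."""
    combined = math.hypot(err_water, err_cyclohexane)
    return kcal_to_log_units(combined, temperature)


# Assumed inputs: 1.5 kcal/mol error in each solvent (hypothetical illustration).
expected = propagated_log_error(1.5, 1.5)
print(f"expected log D error: {expected:.2f} log units")
```

The quadrature sum is only one plausible reading; correlated errors between the two solvents, or a different temperature, would shift the estimate.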
