Bayesian optimization for conformer generation

Generating low-energy molecular conformers is a key task in many areas of computational chemistry, molecular modeling, and cheminformatics. Most current conformer generation methods focus primarily on generating geometrically diverse conformers rather than on finding the most probable or lowest-energy minima. Here, we present a new stochastic search method, the Bayesian optimization algorithm (BOA), for finding the lowest-energy conformation of a given molecule. We compare BOA with uniform random search and with systematic search as implemented in Confab to determine which method finds the lowest energy. Energy difference, root-mean-square deviation (RMSD), and torsion fingerprint deviation (TFD) are used to quantify the performance of the conformer search algorithms. In general, we find that BOA requires far fewer energy evaluations than systematic or uniform random search to find low-energy minima. For molecules with four or more rotatable bonds, Confab typically evaluates $$10^{4}$$ conformers (median) in its search, while BOA requires only $$10^{2}$$ energy evaluations to find top candidates. Despite evaluating fewer conformers, BOA finds lower-energy conformations than a systematic Confab search 20–40% of the time for molecules with four or more rotatable bonds.
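The abstract describes BOA only at a high level. The sketch below is a minimal illustration of the general idea, not the authors' implementation. It treats each rotatable bond's torsion angle as one coordinate in [-π, π]. It fits a Gaussian-process surrogate to the energies evaluated so far and picks the next conformer by maximizing expected improvement over random candidate points. Bayesian optimization is commonly built on a GP surrogate, but whether BOA uses exactly this setup is not stated here. Several pieces are hypothetical and chosen only for illustration: `toy_energy`, which stands in for a real force-field energy, the periodic-aware kernel, its length scale, the candidate count, and the evaluation budget. A uniform random search with the same budget is included as the baseline the abstract compares against.

```python
import math
import random


def toy_energy(torsions):
    # HYPOTHETICAL stand-in for a force-field energy: a threefold torsional
    # term plus a onefold term favouring anti (pi). Global minimum 0 at all pi.
    return sum(1.0 + math.cos(3.0 * t) + 0.5 * (1.0 + math.cos(t)) for t in torsions)


def kernel(a, b, length=1.0):
    # Squared-exponential kernel on the (cos, sin) embedding of each torsion,
    # so angles near -pi and +pi are treated as close (assumed choice).
    d = sum(1.0 - math.cos(x - y) for x, y in zip(a, b))
    return math.exp(-d / length ** 2)


def cholesky(m):
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    # Forward substitution: solve low @ x = b.
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper(low, b):
    # Back substitution: solve low.T @ x = b.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def expected_improvement(mu, sigma, best, xi=0.01):
    # EI for minimisation under a Gaussian predictive distribution N(mu, sigma^2).
    imp = best - mu - xi
    z = imp / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf


def bayes_opt(energy, n_torsions, n_init=5, n_iter=20, n_candidates=200, seed=0):
    rng = random.Random(seed)
    def sample():
        return [rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)]
    xs = [sample() for _ in range(n_init)]
    ys = [energy(x) for x in xs]
    for _ in range(n_iter):
        # Fit a GP to standardised energies (unit signal variance, small jitter).
        mean = sum(ys) / len(ys)
        std = (sum((v - mean) ** 2 for v in ys) / len(ys)) ** 0.5 or 1.0
        yn = [(v - mean) / std for v in ys]
        gram = [[kernel(a, b) + (1e-4 if i == j else 0.0) for j, b in enumerate(xs)]
                for i, a in enumerate(xs)]
        low = cholesky(gram)
        alpha = solve_upper(low, solve_lower(low, yn))
        best = min(yn)
        def acquisition(x):
            ks = [kernel(x, xj) for xj in xs]
            mu = sum(k * a for k, a in zip(ks, alpha))
            v = solve_lower(low, ks)
            sigma = math.sqrt(max(1.0 - sum(t * t for t in v), 1e-12))
            return expected_improvement(mu, sigma, best)
        # Maximise EI over random candidate torsion vectors, then evaluate it.
        x_next = max((sample() for _ in range(n_candidates)), key=acquisition)
        xs.append(x_next)
        ys.append(energy(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], len(ys)


def random_search(energy, n_torsions, n_evals, seed=0):
    # Uniform random search baseline with the same evaluation budget.
    rng = random.Random(seed)
    candidates = ([rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)]
                  for _ in range(n_evals))
    return min(((x, energy(x)) for x in candidates), key=lambda p: p[1])


if __name__ == "__main__":
    x_bo, e_bo, n_evals = bayes_opt(toy_energy, n_torsions=4)
    _, e_rand = random_search(toy_energy, n_torsions=4, n_evals=n_evals)
    print(f"BO: {e_bo:.3f}, random search: {e_rand:.3f} ({n_evals} evaluations each)")
```

Maximizing the acquisition over random candidates keeps the sketch dependency-free. A practical implementation would more likely use a dedicated acquisition optimizer and compute energies with a real force field through a cheminformatics toolkit. With only a few torsions, the outcome of a single run also depends on the seed, so any comparison with random search should be averaged over many runs.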
