Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies.

The accurate and reliable prediction of molecular properties typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques trained on ab initio reference calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structures throughout chemical compound space (Rupp et al., Phys. Rev. Lett. 2012, 108, 058301). In this paper we outline a number of established machine learning techniques and investigate the influence of the molecular representation on their performance. The best methods achieve prediction errors of approximately 3 kcal/mol for the atomization energies of a wide variety of molecules. We give rationales for this performance improvement, together with pitfalls and challenges that arise when machine learning approaches are applied to the prediction of quantum-mechanical observables.
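The workflow the abstract describes, encoding each molecule in a fixed-length representation and then learning a map from that representation to the atomization energy, can be sketched with kernel ridge regression on a Coulomb-matrix descriptor (the representation introduced by Rupp et al.). The toy data, function names, and hyperparameter values below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def coulomb_matrix(Z, R, size):
    """Coulomb-matrix descriptor: Z is an array of nuclear charges,
    R an (n, 3) array of coordinates; zero-padded to `size` atoms."""
    n = len(Z)
    M = np.zeros((size, size))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4                    # self-interaction term
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M[np.triu_indices(size)]                            # flatten upper triangle

def krr_fit(X, y, sigma=10.0, lam=1e-6):
    """Kernel ridge regression with a Gaussian kernel; returns dual coefficients."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = np.exp(-d ** 2 / (2 * sigma ** 2))
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(X_train, alpha, X_test, sigma=10.0):
    """Predict via the kernel expansion over the training set."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
    K = np.exp(-d ** 2 / (2 * sigma ** 2))
    return K @ alpha
```

The kernel width sigma and the regularizer lam are the hyperparameters whose selection (by cross-validation) the paper's pitfalls discussion concerns; the descriptor choice in `coulomb_matrix` is where the "influence of the molecular representation" enters.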

[1] J. Mercer. Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations, 1909.

[2] David G. Stork, et al. Pattern Classification, 1973.

[3] David Weininger, et al. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules, 1988, J. Chem. Inf. Comput. Sci.

[4] L. Bottou. Stochastic Gradient Learning in Neural Networks, 1991.

[5] D. W. Noid, et al. Potential energy surfaces for macromolecules. A neural network technique, 1992.

[6] W. Goddard, et al. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations, 1992.

[7] L. Breiman, et al. Submodel selection and evaluation in regression. The X-random case, 1992.

[8] Vladimir Vapnik, et al. The Nature of Statistical Learning Theory, 1995.

[9] Steven D. Brown, et al. Neural network models of potential energy surfaces, 1995.

[10] M. Wolff, et al. Burger's Medicinal Chemistry and Drug Discovery, 1996.

[11] Burke, et al. Generalized Gradient Approximation Made Simple, 1996, Physical Review Letters.

[12] Klaus Schulten, et al. A Numerical Study on Learning Curves in Stochastic Multilayer Feedforward Networks, 1996, Neural Computation.

[13] Klaus-Robert Müller, et al. Asymptotic statistical theory of overtraining and cross-validation, 1997, IEEE Trans. Neural Networks.

[14] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[15] Vladimir Vapnik, et al. Statistical Learning Theory, 1998.

[16] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods, 1999.

[17] G. Scuseria, et al. Assessment of the Perdew–Burke–Ernzerhof exchange-correlation functional, 1999.

[18] Nello Cristianini, et al. An Introduction to Support Vector Machines, 2000.

[19] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[20] Gunnar Rätsch, et al. An introduction to kernel-based learning algorithms, 2001, IEEE Trans. Neural Networks.

[21] D. Bates, et al. Mixed-Effects Models in S and S-PLUS, 2001.

[22] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.

[23] Shigeo Abe. Pattern Classification, 2001, Springer London.

[24] V. Carey, et al. Mixed-Effects Models in S and S-PLUS, 2001.

[25] Bernhard Schölkopf, et al. Learning with Kernels, 2001.

[26] Leo Breiman, et al. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author), 2001.

[27] I. Jolliffe. Principal Component Analysis, 2002.

[28] Lihong Hu, et al. Combined first-principles calculation and neural-network correction approach for heat of formation, 2003.

[29] Douglas M. Hawkins, et al. The Problem of Overfitting, 2004, J. Chem. Inf. Model.

[30] Bernhard Schölkopf, et al. Training Invariant Support Vector Machines, 2002, Machine Learning.

[31] Lihong Hu, et al. A generalized exchange-correlation functional: the neural-networks approach, 2003, physics/0311024.

[32] A. Gross, et al. Representing high-dimensional potential-energy surfaces for reactions at surfaces by neural networks, 2004.

[33] Gunnar Rätsch, et al. Classifying 'Drug-likeness' with Kernel-Based Learning Methods, 2005, J. Chem. Inf. Model.

[34] A. Gross, et al. Descriptions of surface chemical reactions using a neural network representation of the potential-energy surface, 2006.

[35] Sergei Manzhos, et al. A random-sampling high dimensional model representation neural network for building potential energy surfaces, 2006, The Journal of Chemical Physics.

[36] Egon L. Willighagen, et al. The Blue Obelisk—Interoperability in Chemical Informatics, 2006, J. Chem. Inf. Model.

[37] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.

[38] Michele Parrinello, et al. Generalized neural-network representation of high-dimensional potential-energy surfaces, 2007, Physical Review Letters.

[39] Bernadette Govaerts, et al. A review of quantitative structure-activity relationship (QSAR) models, 2007.

[40] Nasser M. Nasrabadi, et al. Pattern Recognition and Machine Learning, 2006, Technometrics.

[41] J. Stewart. Optimization of parameters for semiempirical methods V: Modification of NDDO approximations and application to 70 elements, 2007, Journal of Molecular Modeling.

[42] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.

[43] Jason Weston, et al. Large-Scale Kernel Machines (Neural Information Processing), 2007.

[44] Wolfram Burgard, et al. Most likely heteroscedastic Gaussian process regression, 2007, ICML '07.

[45] Joachim M. Buhmann, et al. On Relevant Dimensions in Kernel Feature Spaces, 2008, J. Mach. Learn. Res.

[46] Gordana Ivosev, et al. Dimensionality reduction and visualization in principal component analysis, 2008, Analytical Chemistry.

[47] J. Behler, et al. Metadynamics simulations of the high-pressure phases of silicon employing a high-dimensional neural network potential, 2008, Physical Review Letters.

[48] Roman M. Balabin, et al. Neural network approach to quantum-chemistry data: accurate prediction of density functional theory energies, 2009, The Journal of Chemical Physics.

[49] P. Popelier, et al. Dynamically Polarizable Water Potential Based on Multipole Moments Trained by Machine Learning, 2009, Journal of Chemical Theory and Computation.

[50] Lorenz C. Blum, et al. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13, 2009, Journal of the American Chemical Society.

[51] Carl E. Rasmussen, et al. Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.

[52] Sebastian Mika, et al. Bias-Correction of Regression Models: A Case Study on hERG Inhibition, 2009, J. Chem. Inf. Model.

[53] Matthias Scheffler, et al. Ab initio molecular simulations with numeric atom-centered orbitals, 2009, Comput. Phys. Commun.

[54] Anubhav Jain, et al. Finding Nature's Missing Ternary Oxide Compounds Using Machine Learning and Density Functional Theory, 2010.

[55] C. Selassie, et al. History of Quantitative Structure–Activity Relationships, 2010.

[56] R. Kondor, et al. Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons, 2009, Physical Review Letters.

[57] Luca Maria Gambardella, et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition, 2010, Neural Computation.

[58] P. Popelier, et al. Potential energy surfaces fitted by artificial neural networks, 2010, The Journal of Physical Chemistry A.

[59] Gavin C. Cawley, et al. On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation, 2010, J. Mach. Learn. Res.

[60] A. Sayed. Adaptation, Learning, and Optimization over Networks, Foundations and Trends in Machine Learning, 2011.

[61] P. Bühlmann, et al. Estimation for High-Dimensional Linear Mixed-Effects Models Using ℓ1-Penalization, 2010, arXiv:1002.3784.

[62] Klaus-Robert Müller, et al. Kernel Analysis of Deep Networks, 2011, J. Mach. Learn. Res.

[63] P. Popelier, et al. Intramolecular polarisable multipolar electrostatics from the machine learning method Kriging, 2011.

[64] Roman M. Balabin, et al. Support vector machine regression (LS-SVM): an alternative to artificial neural networks (ANNs) for the analysis of quantum chemistry data?, 2011, Physical Chemistry Chemical Physics.

[65] J. Behler. Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations, 2011, Physical Chemistry Chemical Physics.

[66] Klaus-Robert Müller, et al. ℓ1-penalized linear mixed-effects models for high dimensional data with application to BCI, 2011, NeuroImage.

[67] Klaus-Robert Müller, et al. Introduction to machine learning for brain imaging, 2011, NeuroImage.

[68] Klaus-Robert Müller, et al. Optimizing transition states via kernel-based machine learning, 2012, The Journal of Chemical Physics.

[69] J. Behler, et al. Construction of high-dimensional neural network potentials using environment-dependent atom pairs, 2012, The Journal of Chemical Physics.

[70] Andreas Ziehe, et al. Learning Invariant Representations of Molecules for Atomization Energy Prediction, 2012, NIPS.

[71] J. Moussa. Comment on "Fast and accurate modeling of molecular atomization energies with machine learning", 2012, Physical Review Letters.

[72] K. Müller, et al. Rupp et al. Reply, 2012.

[73] Klaus-Robert Müller, et al. Finding Density Functionals with Machine Learning, 2011, Physical Review Letters.

[74] Takafumi Kanamori, et al. Density Ratio Estimation in Machine Learning, 2012.

[75] Grégoire Montavon, et al. Neural Networks: Tricks of the Trade, 2012, Lecture Notes in Computer Science.

[76] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.

[77] K. Müller, et al. Fast and accurate modeling of molecular atomization energies with machine learning, 2012, Physical Review Letters.

[78] Klaus-Robert Müller, et al. Analyzing Local Structure in Kernel-Based Learning: Explanation, Complexity, and Reliability Assessment, 2013, IEEE Signal Processing Magazine.