ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules

One of the grand challenges in modern theoretical chemistry is designing and implementing approximations that expedite ab initio methods without loss of accuracy. Machine learning (ML) methods are emerging as a powerful approach to constructing various forms of transferable atomistic potentials. They have been applied successfully in chemistry, biology, catalysis, and solid-state physics. However, these models depend heavily on the quality and quantity of the data used to fit them. Fitting highly flexible ML potentials, such as neural networks, comes at a cost: a vast amount of reference data is required to train these models properly. We address this need by providing access to a large database of DFT calculations, which consists of more than 20 million off-equilibrium conformations for 57,462 small organic molecules. We believe it will become a new standard benchmark for comparing current and future methods in the ML potential community.
