Energy refinement and analysis of structures in the QM9 database via a highly accurate quantum chemical method

A wide variety of data-driven approaches have been introduced in the field of quantum chemistry. To extend the applicable range and improve the predictive power of those approaches, highly accurate quantum chemical benchmarks that cover extremely large chemical spaces are required. Here, we report ~134k quantum chemical calculations performed with G4MP2, the variant of the fourth-generation Gaussian-n (Gn) composite method that uses reduced-order (second-order Møller–Plesset) perturbation theory. A single composite-method calculation combines several lower-level calculations to approximate the result of a high-level ab initio calculation at a fraction of the computational cost. Our database therefore also reports the results of the individual component methods (e.g., density functional theory, Hartree-Fock, Møller–Plesset perturbation theory, and coupled-cluster theory). Additionally, we examined the structural information in both the original QM9 database and the revised database via chemical graph analysis. Our database can be applied to refine and improve the quality of data-driven quantum chemical predictions. Finally, we report the raw outputs of all calculations performed in this work for other potential applications.

Design type(s): chemical structure classification objective • chemical reaction data analysis objective • modeling and simulation objective
Measurement type(s): chemical structure analysis
Technology type(s): ab initio quantum chemistry computational method
Factor type(s): atom
Machine-accessible metadata file describing the reported data (ISA-Tab format)
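The composite idea can be illustrated with a short sketch. Assuming the published G4(MP2) additive scheme, the final energy is a CCSD(T)/6-31G(d) energy corrected by an MP2 basis-set extension (MP2/G3MP2LargeXP minus MP2/6-31G(d)), a Hartree-Fock basis-set-limit correction, a spin-orbit term, an empirical higher-level correction (HLC), and the scaled zero-point energy (ZPE). The class and field names below are hypothetical, the HLC and ZPE enter as precomputed inputs rather than being derived, and the numbers in the usage example are made-up placeholders, not values from the database.

```python
from dataclasses import dataclass


@dataclass
class ComponentEnergies:
    """Component energies in hartree for one molecule.

    Field names are hypothetical labels for the terms of an assumed
    G4(MP2)-style additive scheme, not keys from any released file.
    """
    ccsd_t_small: float  # E[CCSD(T)/6-31G(d)]
    mp2_small: float     # E[MP2/6-31G(d)]
    mp2_large: float     # E[MP2/G3MP2LargeXP]
    hf_large: float      # E[HF/G3MP2LargeXP]
    hf_limit: float      # extrapolated E[HF/limit]
    spin_orbit: float    # spin-orbit correction (atoms only; zero for molecules)
    hlc: float           # empirical higher-level correction, precomputed
    zpe_scaled: float    # scaled zero-point energy, precomputed


def composite_energy(c: ComponentEnergies) -> float:
    """Sum the high-level base energy and the cheaper additive corrections."""
    delta_mp2 = c.mp2_large - c.mp2_small  # basis-set extension estimated at MP2
    delta_hf = c.hf_limit - c.hf_large     # remaining HF basis-set-limit correction
    return (c.ccsd_t_small + delta_mp2 + delta_hf
            + c.spin_orbit + c.hlc + c.zpe_scaled)


# Placeholder numbers chosen only to make the arithmetic easy to follow.
example = ComponentEnergies(
    ccsd_t_small=-1.0, mp2_small=-0.9, mp2_large=-1.0,
    hf_large=-0.50, hf_limit=-0.51,
    spin_orbit=0.0, hlc=-0.05, zpe_scaled=0.02,
)
print(f"{composite_energy(example):.2f}")  # prints -1.14
```

Only the CCSD(T) term uses the expensive correlated method, and only in the small basis; the large-basis effect is borrowed from MP2 and HF, which is where the cost saving of a composite method comes from.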
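The chemical graph analysis of the QM9 structures can be approximated by comparing bond connectivity. The sketch below shows one simple way to flag a molecule whose optimized geometry no longer matches a reference bonding pattern; it is not necessarily the procedure used in this work. It assumes atoms are listed in the same order in both structures, infers bonds with a distance cutoff of 1.2 times the sum of single-bond covalent radii (a hypothetical threshold), and ignores bond orders. The function names are hypothetical.

```python
import math
from itertools import combinations

# Single-bond covalent radii in angstrom (Cordero et al., 2008) for QM9 elements.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "F": 0.57}


def bond_graph(symbols, coords, tolerance=1.2):
    """Return the set of bonded atom-index pairs inferred from a geometry.

    Two atoms count as bonded when their distance is below `tolerance`
    times the sum of their covalent radii (assumed, hypothetical cutoff).
    """
    edges = set()
    for i, j in combinations(range(len(symbols)), 2):
        cutoff = tolerance * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
        if math.dist(coords[i], coords[j]) < cutoff:
            edges.add(frozenset((i, j)))
    return edges


def connectivity_changed(symbols, coords, reference_edges, tolerance=1.2):
    """True if the geometry-derived graph differs from the reference bond list."""
    reference = {frozenset(edge) for edge in reference_edges}
    return bond_graph(symbols, coords, tolerance) != reference
```

For example, for a water geometry `["O", "H", "H"]` with coordinates `[(0, 0, 0), (0.96, 0, 0), (-0.24, 0.93, 0)]`, `bond_graph` returns the two O-H pairs and `connectivity_changed` returns `False` against the reference bonds `[(0, 1), (0, 2)]`. Comparing unordered index pairs keeps the check independent of how bonds are listed; a fuller analysis would also compare bond orders or canonical SMILES.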
