Chemical diversity in molecular orbital energy predictions with kernel ridge regression

Instant machine learning predictions of molecular properties are desirable for materials design, but the predictive power of the methodology has mainly been tested on well-known benchmark datasets. Here, we investigate the performance of machine learning with kernel ridge regression (KRR) for the prediction of molecular orbital energies on three large datasets: the standard QM9 set of small organic molecules, a set of amino acid and dipeptide conformers, and a set of organic crystal-forming molecules extracted from the Cambridge Structural Database. We focus on the prediction of highest occupied molecular orbital (HOMO) energies computed at the density-functional level of theory. We compare two representations that encode the molecular structure: the Coulomb matrix (CM) and the many-body tensor representation (MBTR). We find that KRR performance depends significantly on the chemistry of the underlying dataset and that the MBTR outperforms the CM, predicting HOMO energies with a mean absolute error as low as 0.09 eV. To demonstrate the power of our machine learning method, we apply our model to the structures of 10,000 previously unseen molecules. The resulting instant energy predictions allow us to identify promising molecules for future applications.
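The KRR workflow summarized above (structure, then representation, then kernel regression, then mean absolute error) can be illustrated with a minimal sketch. It assumes NumPy, uses a row-norm-sorted Coulomb matrix as the representation (the MBTR is considerably more involved and is not reproduced here), a Laplacian kernel, and a closed-form ridge solve. The helper names, the kernel choice, the hyperparameters `sigma` and `lam`, and the synthetic demo data are illustrative assumptions, not the settings used in this study.

```python
import numpy as np


def coulomb_matrix(Z, R, size):
    """Row-norm-sorted Coulomb matrix, zero-padded to size x size (illustrative helper).

    Returns the upper triangle as a flat feature vector. Z are nuclear charges,
    R are Cartesian coordinates (atomic units are the usual convention).
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)               # avoid division by zero; diagonal overwritten below
    block = np.outer(Z, Z) / dist             # off-diagonal: Z_i Z_j / |R_i - R_j|
    np.fill_diagonal(block, 0.5 * Z ** 2.4)   # diagonal: 0.5 Z_i^2.4
    # Sort atoms by descending row norm so the features do not depend on atom ordering
    order = np.argsort(-np.linalg.norm(block, axis=1), kind="stable")
    block = block[order][:, order]
    M = np.zeros((size, size))
    M[:n, :n] = block                         # zero-pad so all molecules share one vector length
    return M[np.triu_indices(size)]


def laplacian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||_1 / sigma); the kernel choice is an assumption."""
    d = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-d / sigma)


def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = laplacian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Prediction f(x) = sum_i alpha_i k(x, x_i) over the training set."""
    return laplacian_kernel(X_new, X_train, sigma) @ alpha


def mae(pred, true):
    """Mean absolute error, e.g. in eV when the targets are HOMO energies."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))


if __name__ == "__main__":
    # Synthetic stand-in features and targets, NOT molecular HOMO energies
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    y = np.sin(X).sum(axis=1)
    alpha = krr_fit(X[:15], y[:15], sigma=1.0, lam=1e-8)
    print("test MAE:", mae(krr_predict(X[:15], alpha, X[15:], 1.0), y[15:]))
```

In an actual application, each row of `X` would be a representation vector such as the output of `coulomb_matrix`, `y` would hold the DFT HOMO energies, and `sigma` and `lam` would typically be tuned by cross-validation rather than fixed by hand.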
