Molecular Design Using Signal Processing and Machine Learning: Time-Frequency-like Representation and Forward Design

The accumulation of molecular data obtained from quantum mechanics (QM) theories such as density functional theory (DFT-QM) makes it possible for machine learning (ML) to accelerate the discovery of new molecules, drugs, and materials. Models that combine QM with ML (QM-ML) have been very effective in delivering the precision of QM at the speed of ML. In this study, we show that integrating well-known signal processing (SP) techniques (i.e., the short-time Fourier transform, continuous wavelet analysis, and the Wigner-Ville distribution) into the QM-ML pipeline yields a powerful machinery (QM-SP-ML) for the representation, visualization, and forward design of molecules. More precisely, we show that the time-frequency-like representation of molecules encodes their structural, geometric, energetic, electronic, and thermodynamic properties. We demonstrate this by using the new representation in the forward design loop as input to a deep convolutional neural network trained on DFT-QM calculations, which outputs the properties of the molecules. Tested on the QM9 dataset (composed of 133,855 molecules and 19 properties), the new QM-SP-ML model predicts the properties of molecules with a mean absolute error (MAE) below the accepted chemical accuracy (i.e., MAE < 1 kcal/mol for total energies and MAE < 0.1 eV for orbital energies). Furthermore, the new approach performs similarly to or better than other state-of-the-art ML techniques described in the literature. Overall, the new QM-SP-ML model represents a powerful technique for molecular forward design. All the code and data generated and used in this study are available as supporting materials at this https URL.
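To make the idea of a time-frequency-like representation concrete, the following is a minimal sketch of one of the three named transforms, the short-time Fourier transform, together with the MAE metric used for evaluation. It is not the authors' implementation. The abstract does not say how a molecule is converted into a one-dimensional signal, so `toy_signal` is a purely hypothetical stand-in, and the function names, the Hann window, `window_len`, and `hop` are illustrative assumptions.

```python
import numpy as np


def stft_magnitude(signal, window_len=32, hop=8):
    """Magnitude of a short-time Fourier transform, shape (freq_bins, frames).

    Assumption: a Hann window and these default sizes are illustrative
    choices only; they are not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < window_len:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, window_len - signal.size))
    window = np.hanning(window_len)
    n_frames = 1 + (signal.size - window_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + window_len] * window for i in range(n_frames)]
    )
    # One real FFT per frame; transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def mean_absolute_error(y_true, y_pred):
    """MAE between reference (e.g. DFT) values and model predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical 1-D signal standing in for a molecular descriptor.
n = np.arange(128)
toy_signal = np.sin(2 * np.pi * 4 * n / 32)

image = stft_magnitude(toy_signal)  # 2-D array with shape (17, 13)
```

The resulting two-dimensional array can be treated like a single-channel image, which is the kind of input a convolutional neural network consumes. The continuous wavelet transform and the Wigner-Ville distribution would produce analogous two-dimensional maps from the same signal.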
