Learning to Make Chemical Predictions: The Interplay of Feature Representation, Data, and Machine Learning Algorithms

Supervised machine learning has recently gained prominence as a source of new predictive approaches for applications in the chemical, biological, and materials sciences. In this Perspective we focus on the interplay between the machine learning algorithm, chemically motivated descriptors, and the size and type of data sets needed for molecular property prediction. Using Nuclear Magnetic Resonance chemical shift prediction as an example, we demonstrate that success depends on the choice between feature-extracted and real-space representations of chemical structures, on whether the molecular property data are abundant and whether they are experimentally or computationally derived, and on how these factors together determine the appropriate choice among popular machine learning algorithms drawn from deep learning, random forests, or kernel methods.

[1]  Markus Meuwly,et al.  Reactive molecular dynamics for the [Cl–CH3–Br]− reaction in the gas phase and in solution: a comparative study using empirical and neural network force fields , 2019, Electronic Structure.

[2]  Regina Barzilay,et al.  Analyzing Learned Molecular Representations for Property Prediction , 2019, J. Chem. Inf. Model..

[3]  R. Kondor,et al.  On representing chemical environments , 2012, 1209.3140.

[4]  Christoph Ortner,et al.  Incompleteness of Atomic Structure Representations. , 2020, Physical review letters.

[5]  Alexander Binder,et al.  On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation , 2015, PloS one.

[6]  X. Y. Zhang,et al.  Application of support vector machine (SVM) for prediction toxic activity of different data sets. , 2006, Toxicology.

[7]  Mark Asta,et al.  NMR Crystallography: Evaluation of Hydrogen Positions in Hydromagnesite by 13 C{1 H} REDOR Solid-State NMR and Density Functional Theory Calculation of Chemical Shielding Tensors. , 2019, Angewandte Chemie.

[8]  Jie Li,et al.  Accurate Prediction of Chemical Shifts for Aqueous Protein Structure for "Real World" Cases using Machine Learning , 2019, 1912.02735.

[9]  Jouko Yliruusi,et al.  Prediction of physicochemical properties based on neural network modelling. , 2003, Advanced drug delivery reviews.

[10]  Alán Aspuru-Guzik,et al.  Inverse molecular design using machine learning: Generative models for matter engineering , 2018, Science.

[11]  Teresa Head-Gordon,et al.  A monte carlo method for generating side chain structural ensembles. , 2015, Structure.

[12]  Francesco Mauri,et al.  All-electron magnetic response with pseudopotentials: NMR chemical shifts , 2001 .

[13]  Kilian Q. Weinberger,et al.  Convolutional Networks with Dense Connectivity , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[15]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[16]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[17]  Michele Ceriotti,et al.  Chemical shifts in molecular solids by machine learning , 2018, Nature Communications.

[18]  Alexander Hexemer,et al.  A Multi-Resolution 3D-DenseNet for Chemical Shift Prediction in NMR Crystallography. , 2019, The journal of physical chemistry letters.

[19]  M. Rupp,et al.  Machine Learning for Quantum Mechanical Properties of Atoms in Molecules , 2015, 1505.00350.

[20]  K-R Müller,et al.  SchNet - A deep learning architecture for molecules and materials. , 2017, The Journal of chemical physics.

[21]  Michele Parrinello,et al.  Generalized neural-network representation of high-dimensional potential-energy surfaces. , 2007, Physical review letters.

[22]  George E. Dahl,et al.  Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error. , 2017, Journal of chemical theory and computation.

[23]  Nikos Paragios,et al.  EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation , 2017, PeerJ.

[24]  Wei-keng Liao,et al.  Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning , 2019, Nature Communications.

[25]  Mohammad Atif Faiz Afzal,et al.  Building and deploying a cyberinfrastructure for the data-driven design of chemical systems and the exploration of chemical space , 2018 .

[26]  Mojtaba Haghighatlari,et al.  Advances of machine learning in molecular modeling and simulation , 2019, Current Opinion in Chemical Engineering.

[27]  Chris J Pickard,et al.  Ab Initio Quality NMR Parameters in Solid-State Materials Using a High-Dimensional Neural-Network Representation. , 2016, Journal of chemical theory and computation.

[28]  Anders S Christensen,et al.  FCHL revisited: Faster and more accurate quantum machine learning. , 2020, The Journal of chemical physics.

[29]  David R. Glowacki,et al.  Training neural nets to learn reactive potential energy surfaces using interactive quantum chemistry in virtual reality , 2019, The journal of physical chemistry. A.

[30]  Jean Ponce,et al.  Computer Vision: A Modern Approach , 2002 .

[31]  Daniel W. Davies,et al.  Machine learning for molecular and materials science , 2018, Nature.

[32]  Krishna Rajan,et al.  Informatics for Materials Science and Engineering: Data-Driven Discovery for Accelerated Experimentation and Application , 2013 .

[33]  Robert P. Sheridan,et al.  Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling , 2003, J. Chem. Inf. Comput. Sci..

[34]  Vijay S. Pande,et al.  Molecular graph convolutions: moving beyond fingerprints , 2016, Journal of Computer-Aided Molecular Design.

[35]  Russ B. Altman,et al.  3D deep convolutional neural networks for amino acid environment similarity analysis , 2017, BMC Bioinformatics.

[36]  A. Bax,et al.  SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network , 2010, Journal of biomolecular NMR.

[37]  Thomas F. Miller,et al.  Transferability in Machine Learning for Electronic Structure via the Molecular Orbital Basis. , 2018, Journal of chemical theory and computation.

[38]  Demis Hassabis,et al.  Improved protein structure prediction using potentials from deep learning , 2020, Nature.

[39]  Yihang Wang,et al.  Machine learning approaches for analyzing and enhancing molecular dynamics simulations. , 2019, Current opinion in structural biology.

[40]  Pavlo O. Dral,et al.  Quantum chemistry structures and properties of 134 kilo molecules , 2014, Scientific Data.

[41]  David R. Glowacki,et al.  IMPRESSION – prediction of NMR parameters for 3-dimensional chemical structures using machine learning with near quantum chemical accuracy† †Electronic supplementary information (ESI) available. See DOI: 10.1039/c9sc03854j , 2019, Chemical science.

[42]  Simon W. Ginzinger,et al.  SHIFTX2: significantly improved protein chemical shift prediction , 2011, Journal of biomolecular NMR.

[43]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[44]  Christopher A. Hunter,et al.  Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction , 2018, ACS central science.

[45]  Mohammed AlQuraishi,et al.  End-to-end differentiable learning of protein structure , 2018, bioRxiv.

[46]  Alán Aspuru-Guzik,et al.  Convolutional Networks on Graphs for Learning Molecular Fingerprints , 2015, NIPS.

[47]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[48]  Kipton Barros,et al.  Discovering a Transferable Charge Assignment Model Using Machine Learning. , 2018, The journal of physical chemistry letters.

[49]  Jörg Behler,et al.  Accurate Neural Network Description of Surface Phonons in Reactive Gas–Surface Dynamics: N2 + Ru(0001) , 2017, The journal of physical chemistry letters.

[50]  Klaus-Robert Müller,et al.  sGDML: Constructing accurate and data efficient molecular force fields using machine learning , 2018, Comput. Phys. Commun..

[51]  Ribana Roscher,et al.  Explainable Machine Learning for Scientific Insights and Discoveries , 2019, IEEE Access.

[52]  Y. Koyama,et al.  Predicting Materials Properties with Little Data Using Shotgun Transfer Learning , 2019, ACS central science.

[53]  Srirangaraj Setlur,et al.  ChemML: A machine learning and informatics program package for the analysis, mining, and modeling of chemical and materials data , 2019, WIREs Computational Molecular Science.

[54]  Tanja Kortemme,et al.  Designing ensembles in conformational and sequence space to characterize and engineer proteins. , 2010, Current opinion in structural biology.

[55]  Chi Chen,et al.  Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals , 2018, Chemistry of Materials.

[56]  Kipton Barros,et al.  Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning , 2019, Nature Communications.