Machine learning for molecular and materials science

Here we summarize recent progress in machine learning for the chemical sciences. We outline machine-learning techniques that are suitable for addressing research questions in this domain, as well as future directions for the field. We envisage a future in which the design, synthesis, characterization and application of molecules and materials are accelerated by artificial intelligence.
