Chemist versus Machine: Traditional Knowledge versus Machine Learning Techniques

Chemical heuristics have been fundamental to the advancement of chemistry and materials science. These heuristics are typically established by scientists who use knowledge and creativity to extract patterns from limited datasets. Machine learning offers opportunities to perfect this approach using computers and larger datasets. Here, we discuss the relationships between traditional heuristics and machine learning approaches. We show how traditional rules can be challenged by large-scale statistical assessment and how traditional concepts are commonly used as features that feed machine learning techniques. We stress the waste involved in relearning chemical rules and the challenges that data size requirements pose for purely data-driven approaches. Our view is that heuristic and machine learning approaches are at their best when they work together.
