Learning atoms for materials discovery

Significance Motivated by the recent achievements of artificial intelligence (AI) in linguistics, we design AI to learn properties of atoms from materials data on its own. Our work realizes knowledge representation of atoms via computers and could serve as a foundational step toward materials discovery and design fully based on machine learning. Exciting advances have been made in artificial intelligence (AI) during recent decades. Among them, applications of machine learning (ML) and deep learning techniques brought human-competitive performances in various tasks of fields, including image recognition, speech recognition, and natural language understanding. Even in Go, the ancient game of profound complexity, the AI player has already beat human world champions convincingly with and without learning from the human. In this work, we show that our unsupervised machines (Atom2Vec) can learn the basic properties of atoms by themselves from the extensive database of known compounds and materials. These learned properties are represented in terms of high-dimensional vectors, and clustering of atoms in vector space classifies them into meaningful groups consistent with human knowledge. We use the atom vectors as basic input units for neural networks and other ML models designed and trained to predict materials properties, which demonstrate significant accuracy.

[1]  Felix A Faber,et al.  Crystal structure representations for machine learning models of formation energies , 2015, 1503.07406.

[2]  N. N. Greenwood,et al.  Chemistry of the elements , 1984 .

[3]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[4]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[5]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[6]  Byoungwoo Kang,et al.  Battery materials for ultrafast charging and discharging , 2009, Nature.

[7]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[8]  Tara N. Sainath,et al.  FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .

[9]  Kristin A. Persson,et al.  Commentary: The Materials Project: A materials genome approach to accelerating materials innovation , 2013 .

[10]  Claudia Felser,et al.  Tunable multifunctional topological insulators in ternary Heusler compounds. , 2010, Nature materials.

[11]  Marco Buongiorno Nardelli,et al.  The high-throughput highway to computational materials design. , 2013, Nature materials.

[12]  A. Choudhary,et al.  Perspective: Materials informatics and big data: Realization of the “fourth paradigm” of science in materials science , 2016 .

[13]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[14]  Samuel S. Schoenholz,et al.  Fast machine learning models of electronic and energetic properties consistently reach approximation errors better than DFT accuracy , 2017 .

[15]  Alexandre Tkatchenko,et al.  Quantum-chemical insights from deep tensor neural networks , 2016, Nature Communications.

[16]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[17]  J. Nørskov,et al.  Towards the computational design of solid catalysts. , 2009, Nature chemistry.

[18]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[19]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[20]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[21]  Vijay S. Pande,et al.  Molecular graph convolutions: moving beyond fingerprints , 2016, Journal of Computer-Aided Molecular Design.

[22]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[23]  Shay B. Cohen,et al.  Advances in Neural Information Processing Systems 25 , 2012, NIPS 2012.

[24]  J. Glodo,et al.  Tl2LiYCl6:Ce: A New Elpasolite Scintillator , 2016, IEEE Transactions on Nuclear Science.

[25]  Felix A Faber,et al.  Machine Learning Energies of 2 Million Elpasolite (ABC_{2}D_{6}) Crystals. , 2015, Physical review letters.

[26]  Alok Choudhary,et al.  Combinatorial screening for new materials in unconstrained composition space with machine learning , 2014 .

[27]  Paul Raccuglia,et al.  Machine-learning-assisted materials discovery using failed experiments , 2016, Nature.

[28]  K. Müller,et al.  Fast and accurate modeling of molecular atomization energies with machine learning. , 2011, Physical review letters.

[29]  J. Vybíral,et al.  Big data of materials science: critical role of the descriptor. , 2014, Physical review letters.

[30]  Ryan P. Adams,et al.  Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. , 2016, Nature materials.

[31]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[32]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[33]  James Theiler,et al.  Accelerated search for materials with targeted properties by adaptive design , 2016, Nature Communications.

[34]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[35]  T. Landauer,et al.  A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .