Automating crystal-structure phase mapping by combining deep learning with constraint reasoning

1 Department of Computer Science, Cornell University, Ithaca, NY, USA.
2 Division of Engineering and Applied Science and Liquid Sunlight Alliance, California Institute of Technology, Pasadena, CA, USA.
3 Department of Materials Science and Engineering, Cornell University, Ithaca, NY, USA.
e-mail: gregoire@caltech.edu; gomes@cs.cornell.edu
Artificial intelligence (AI)1 aims to develop intelligent systems, inspired in part by human intelligence. AI systems now perform at human and even superhuman levels on a range of tasks, such as image identification2, face recognition3 and speech recognition4. AI also has the potential to dramatically accelerate scientific discovery5–10. Recent AI achievements have been driven mainly by advances in supervised deep learning11, which requires large labelled datasets to supervise model training. In scientific discovery, however, large amounts of labelled data are generally unavailable. Scientists instead often solve complex tasks from only a few data samples, amplifying intuitive pattern recognition with detailed reasoning about prior knowledge to make sense of the data. Such a hybrid strategy has been difficult to automate.
In this Article we consider crystal-structure phase mapping, a long-standing challenge in materials science. It is emblematic of a class of scientific problems whose automation would mark substantial progress toward the grand challenge of high-throughput, unsupervised interpretation of scientific data. Crystal-structure phase mapping involves separating noisy mixtures of X-ray diffraction (XRD) patterns into the source XRD signals of the corresponding crystal structures, a task for which labelled training data are typically not available. Furthermore, a valid phase diagram of the crystal structures in a given chemical system must satisfy thermodynamic rules (Fig. 1a–f).
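To make the demixing formulation concrete, here is a minimal Python sketch under strong simplifying assumptions. It treats an observed pattern as a nonnegative combination of candidate source patterns drawn from a known library. It enforces one thermodynamic rule, a cap in the spirit of the Gibbs phase rule: at fixed temperature and pressure, at most three phases coexist in equilibrium in a three-component system.

The helper names (`gaussian_peak`, `fit_nonnegative`, `demix_with_phase_rule`) and the single-peak toy patterns are hypothetical. The method is a generic constrained least-squares search, not the approach developed in this Article. In the real problem the source patterns are themselves unknown and can shift with composition; this sketch ignores both.

```python
import itertools
import math


def gaussian_peak(n, center, width, height=1.0):
    """Toy 1-D stand-in for a diffraction pattern: one Gaussian peak on n grid points."""
    return [height * math.exp(-((i - center) ** 2) / (2 * width**2)) for i in range(n)]


def fit_nonnegative(x, bases, iters=500):
    """Projected gradient descent for min ||x - sum_k w_k * b_k||^2 subject to w_k >= 0."""
    # sum_k ||b_k||^2 bounds the largest eigenvalue of B^T B, so this step size is safe.
    step = 1.0 / (2.0 * sum(v * v for b in bases for v in b))
    w = [0.0] * len(bases)

    def residual():
        return [xi - sum(wk * b[i] for wk, b in zip(w, bases)) for i, xi in enumerate(x)]

    for _ in range(iters):
        r = residual()
        grads = [-2.0 * sum(bi * ri for bi, ri in zip(b, r)) for b in bases]
        # Gradient step, then projection onto the nonnegative orthant.
        w = [max(0.0, wk - step * gk) for wk, gk in zip(w, grads)]
    err = sum(ri * ri for ri in residual())
    return w, err


def demix_with_phase_rule(x, library, max_phases=3, tol=1e-9):
    """Exhaustive search over phase subsets allowed by a cap on coexisting phases.

    Returns (support, weights, error). Subsets are visited from small to large and a
    candidate replaces the current best only if it lowers the error by more than tol,
    so near-ties favour fewer phases.
    """
    best = None
    for size in range(1, max_phases + 1):
        for support in itertools.combinations(range(len(library)), size):
            w, err = fit_nonnegative(x, [library[k] for k in support])
            if best is None or err < best[2] - tol:
                best = (support, w, err)
    return best
```

For example, build four toy peaks centred at 10, 22, 34 and 46 on a 60-point grid. A pattern formed as 0.7 × peak 0 + 0.3 × peak 2 should then come back with support (0, 2) and weights close to 0.7 and 0.3. Exhaustive search over subsets is only feasible for small libraries; it is used here solely to show where the thermodynamic constraint enters.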
Here we provide a detailed description of how to formulate phase mapping as an unsupervised pattern-demixing problem and how to solve it using deep reasoning networks (DRNets)12. DRNets are a general framework that combines deep learning with constraint reasoning to incorporate scientific prior knowledge. They are designed with an interpretable latent space that encodes domain constraints derived from prior knowledge, enabling seamless integration of constraint reasoning into neural network optimization. Constraint reasoning is a type of AI reasoning in which axioms and rules are expressed as constraints and inference is carried out by search. The axioms and rules pertaining to a given task constitute the prior knowledge needed to identify valid solutions. In this Article, we show that DRNets require only a modest amount of (unlabelled) data and compensate for the limited data by exploiting and magnifying rich scientific prior knowledge about the thermodynamic rules that govern mixtures of crystals. Through a series of ablation studies, we further provide insights into the interpretability and scalability of DRNets, as well as the role of data and of the different DRNet modules. DRNets achieve this advance in crystal-structure phase mapping by combining learning with constraint reasoning, emulating the analysis of expert scientists and enabling the interpretation of complex systems in high-dimensional composition spaces.
Because crystal-structure phase mapping is scientifically complex, we first give an intuitive explanation of the DRNets framework using Multi-MNIST-Sudoku12, a variant of Sudoku that involves demixing two overlapping, completed, handwritten Sudokus (Fig. 1g–i).
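One common way to bring constraints into neural-network optimization is to relax them into differentiable penalties on an interpretable latent representation. The minimal Python sketch below assumes that such a latent holds, for each Sudoku cell, a probability distribution over symbols. It turns the all-different rule (each row, column and box contains every symbol exactly once) into a squared-error penalty on expected symbol counts.

The function names are hypothetical, and this particular relaxation is illustrative. It is not necessarily the one used in DRNets.

```python
import math


def sudoku_groups(n):
    """Cell indices (row-major) of every row, column and box of an n x n Sudoku."""
    b = math.isqrt(n)  # box side length, e.g. 2 for 4 x 4 and 3 for 9 x 9
    rows = [[r * n + c for c in range(n)] for r in range(n)]
    cols = [[r * n + c for r in range(n)] for c in range(n)]
    boxes = [
        [(br + i) * n + (bc + j) for i in range(b) for j in range(b)]
        for br in range(0, n, b)
        for bc in range(0, n, b)
    ]
    return rows + cols + boxes


def alldiff_penalty(probs, groups):
    """Squared-error relaxation of the all-different constraint.

    probs[c][d] is a latent probability that cell c holds symbol d. In a valid Sudoku
    each group holds every symbol exactly once, so each expected count should be 1.
    The penalty is a polynomial in probs, hence differentiable and usable as a loss term.
    """
    n_symbols = len(probs[0])
    return sum(
        (sum(probs[c][d] for c in group) - 1.0) ** 2
        for group in groups
        for d in range(n_symbols)
    )


def one_hot(grid, n):
    """Encode a grid of symbols 1..n as one-hot probability rows."""
    return [[1.0 if v == d + 1 else 0.0 for d in range(n)] for v in grid]
```

A one-hot encoding of the valid 4 × 4 grid 1 2 3 4 / 3 4 1 2 / 2 1 4 3 / 4 3 2 1 scores 0. Changing its first row to 1 1 3 4 scores 6: one unit for the surplus 1 and one for the missing 2, in each of the affected row, column and box. A uniform distribution also scores 0, so a penalty like this is necessary but not sufficient. In practice it would be paired with a term that pushes each cell toward a confident prediction.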
To demonstrate the scalability of DRNets, we also consider 9 × 9 Sudoku instances combining both digits and letters, going beyond the 4 × 4 Multi-MNIST-Sudoku instances.
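The combinatorial side of the Sudoku demixing task can be illustrated on its own. The sketch below assumes the perceptual step is already solved: for every cell it is given the two overlaid symbols as an unordered pair. It then uses plain backtracking search to decide which symbol belongs to which Sudoku, so that both grids satisfy the row, column and box constraints.

Function names are hypothetical. Recognizing the overlaid handwritten symbols, which this sketch takes as given, is the part the learning component of DRNets is meant to handle.

```python
import math


def _fits(grid, cell, symbol, n, b):
    """True if symbol clashes with no earlier cell (row-major) in the same row, column or box."""
    r, c = divmod(cell, n)
    for k in range(cell):
        if grid[k] == symbol:
            rk, ck = divmod(k, n)
            if rk == r or ck == c or (rk // b == r // b and ck // b == c // b):
                return False
    return True


def is_valid_sudoku(grid, n):
    """Check that grid (row-major, length n*n) is a completed n x n Sudoku."""
    b = math.isqrt(n)
    return len(set(grid)) == n and all(
        _fits(grid, cell, grid[cell], n, b) for cell in range(n * n)
    )


def demix_sudokus(pairs, n):
    """Split per-cell overlaid symbol pairs into two valid Sudokus by backtracking.

    pairs[c] holds the two (unordered) symbols seen in cell c, assumed to be already
    recognized. Returns (grid_a, grid_b), or None if no valid split exists.
    """
    b = math.isqrt(n)
    grid_a, grid_b = [None] * (n * n), [None] * (n * n)

    def solve(cell):
        if cell == n * n:
            return True
        x, y = pairs[cell]
        # Each cell offers at most two ways to assign its pair to the two grids.
        options = [(x, y)] if x == y else [(x, y), (y, x)]
        for sa, sb in options:
            if _fits(grid_a, cell, sa, n, b) and _fits(grid_b, cell, sb, n, b):
                grid_a[cell], grid_b[cell] = sa, sb
                if solve(cell + 1):
                    return True
        return False

    return (grid_a, grid_b) if solve(0) else None
```

The same function accepts 9 × 9 grids and any hashable symbols, such as letters, though plain backtracking offers no speed guarantee at that size. Swapping the two returned grids always yields another valid answer. A check should therefore compare unordered pairs rather than expect one particular assignment.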

[1] Di Chen et al. End-to-End Learning for the Deep Multivariate Probit Model, 2018, ICML.

[2] T. Yato et al. Complexity and Completeness of Finding Another Solution and Its Application to Puzzles, 2003, IEICE Trans. Fundam. Electron. Commun. Comput. Sci.

[3] Learning from the big picture, 2018, Nature Materials.

[4] Toby Walsh et al. Handbook of Constraint Programming, 2006.

[5] Ning Zhang et al. Weakly Supervised Audio Source Separation via Spectrum Energy Preserved Wasserstein Learning, 2018, IJCAI.

[6] Paul A. Midgley et al. Multicomponent Signal Unmixing from Nanoheterostructures: Overcoming the Traditional Challenges of Nanoscale X-ray Analysis via Machine Learning, 2015, Nano Letters.

[7] Alfred Ludwig et al. Discovery of new materials using combinatorial synthesis and high-throughput characterization of thin-film materials libraries combined with computational methods, 2019, npj Computational Materials.

[8] Jure Leskovec et al. GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, 2018, ICML.

[9] Thomas G. Dietterich et al. The eBird enterprise: An integrated approach to development and application of citizen science, 2014.

[10] Savitha Ramasamy et al. Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks, 2018, npj Computational Materials.

[11] Gregory Cohen et al. EMNIST: Extending MNIST to handwritten letters, 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[12] Gregory Cohen et al. EMNIST: an extension of MNIST to handwritten letters, 2017, CVPR 2017.

[13] Geoffrey E. Hinton et al. Deep Learning, 2015, Nature.

[14] John M. Gregoire et al. Materials Representation and Transfer Learning for Multi-Property Prediction, 2021, Applied Physics Reviews.

[15] Ronan Le Bras et al. Constraint Reasoning and Kernel Clustering for Pattern Decomposition with Scaling, 2011, CP.

[16] Yexiang Xue et al. Deep Multi-species Embedding, 2016, IJCAI.

[17] Phase Diagram Determination of Ceramic Systems, 2007.

[18] I. Takeuchi et al. High-throughput determination of structural phase diagram and constituent phases using GRENDEL, 2015, Nanotechnology.

[19] Luis C. Lamb et al. Neurosymbolic AI: the 3rd wave, 2020, Artificial Intelligence Review.

[20] Eric P. Xing et al. Harnessing Deep Neural Networks with Logic Rules, 2016, ACL.

[21] John M. Gregoire et al. Deep Reasoning Networks for Unsupervised Pattern De-mixing with Constraint Reasoning, 2020, ICML.

[22] Alán Aspuru-Guzik et al. Inverse molecular design using machine learning: Generative models for matter engineering, 2018, Science.

[23] V. Elser et al. Searching with iterated maps, 2007, Proceedings of the National Academy of Sciences.

[24] Ronan Le Bras et al. Challenges in Materials Discovery - Synthetic Generator and Real Datasets, 2014, AAAI.

[25] Geoffrey E. Hinton et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.

[26] Veit Elser et al. Divide and concur: a general approach to constraint satisfaction, 2007, Physical Review E, Statistical, Nonlinear, and Soft Matter Physics.

[27] Ichiro Takeuchi et al. Fulfilling the promise of the materials genome initiative with high-throughput experimental methodologies, 2017.

[28] Leonid Kruglyak et al. Rise of the Machines, 2008, PLoS Genetics.

[29] I. Takeuchi et al. Rapid identification of structural phases in combinatorial thin-film libraries using x-ray diffraction and non-negative matrix factorization, 2009, The Review of Scientific Instruments.

[30] Dumitru Erhan et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Ichiro Takeuchi et al. Unsupervised phase mapping of X-ray diffraction data by nonnegative matrix factorization integrated with custom clustering, 2018, npj Computational Materials.

[32] Alán Aspuru-Guzik et al. Accelerating the discovery of materials for clean energy in the era of smart automation, 2018, Nature Reviews Materials.

[33] Yingli Tian et al. Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Geoffrey E. Hinton et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[35] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Ming Yang et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37] R. V. Dover et al. CRYSTAL: a multi-agent AI system for automated mapping of materials’ crystal structures, 2019, MRS Communications.

[38] Brian L. DeCost et al. Accelerated Development of Perovskite-Inspired Materials via High-Throughput Synthesis and Machine-Learning Diagnosis, 2018, Joule.

[39] S. Suram et al. High-throughput synchrotron X-ray diffraction for combinatorial phase mapping, 2014, Journal of Synchrotron Radiation.

[40] Bart Selman et al. Boosting Combinatorial Search Through Randomization, 1998, AAAI/IAAI.

[41] W. Park et al. A deep-learning technique for phase identification in multiphase inorganic compounds using synthetic XRD powder patterns, 2020, Nature Communications.

[42] W. Park et al. Classification of crystal structure using a convolutional neural network, 2017, IUCrJ.

[43] B. Lindsay. Mixture Models: Theory, Geometry, and Applications, 1995.

[44] Yan Zhang et al. Generalized machine learning technique for automatic phase attribution in time variant high-throughput experimental studies, 2015.

[45] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[46] Jelena Stajic et al. Artificial intelligence. Rise of the Machines, 2015, Science.

[47] Brian L. DeCost et al. On-the-fly closed-loop materials discovery via Bayesian active learning, 2020, Nature Communications.

[48] Geoffrey E. Hinton et al. Dynamic Routing Between Capsules, 2017, NIPS.

[49] Materials representation and transfer learning for multi-property prediction, 2021, Applied Physics Reviews.

[50] N. Pettorelli et al. A Horizon Scan of Emerging Issues for Global Conservation in 2019, 2019, Trends in Ecology & Evolution.

[52] Yoshua Bengio et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[53] Guy Van den Broeck et al. A Semantic Loss Function for Deep Learning with Symbolic Knowledge, 2017, ICML.

[54] Simon Osindero et al. Conditional Generative Adversarial Nets, 2014, ArXiv.