Fast classification of small X-ray diffraction datasets using data augmentation and deep neural networks

X-ray diffraction (XRD) for crystal structure characterization is among the most time-consuming and complex steps in the development cycle of novel materials. We propose a machine-learning-enabled approach to predict crystallographic dimensionality and space group from a limited number of experimental thin-film XRD patterns. We overcome the sparse-data problem intrinsic to novel materials development by coupling a supervised machine-learning approach with a physics-based data augmentation strategy. Using this approach, XRD spectrum acquisition and analysis take less than 5.5 minutes, with accuracy comparable to human expert labeling. We simulate experimental powder diffraction patterns from crystallographic information contained in the Inorganic Crystal Structure Database (ICSD). We train a classification algorithm on a combination of labeled simulated and experimental datasets, both augmented to account for thin-film characteristics and measurement noise. As a test case, 88 metal-halide thin films spanning 3 dimensionalities and 7 space groups are synthesized and classified. The accuracies and throughputs of multiple machine-learning techniques are evaluated, along with the effect of augmented dataset size. The most accurate classification algorithm is found to be a feed-forward deep neural network. The calculated accuracies for dimensionality and space-group classification, approximately 90\% and 85\%, respectively, are comparable to ground-truth labeling by a human expert. Additionally, we systematically evaluate the maximum XRD spectrum step size (data acquisition rate) that can be used before predictive accuracy is lost, and determine it to be \ang{0.16} in $2\theta$, which enables an XRD spectrum to be obtained and analyzed in 5 minutes or less.
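The abstract does not list the individual augmentation steps, so the following is only a minimal sketch of what a physics-based augmentation of a one-dimensional XRD pattern might look like. It assumes three transformations that are plausible for thin films: a small shift along $2\theta$ (for example from strain), random intensity rescaling in angular windows (for example from preferred orientation), and additive Gaussian noise. The function name `augment_pattern` and all parameter names and default values are hypothetical and are not taken from the paper.

```python
import numpy as np

def augment_pattern(intensity, rng, max_shift=5, scale_range=(0.5, 1.5),
                    n_windows=4, noise_std=0.01):
    """Return one augmented copy of a 1D XRD pattern (hypothetical helper).

    All parameters are illustrative assumptions, not values from the paper.
    """
    y = np.asarray(intensity, dtype=float)
    n = y.size

    # 1) Shift the whole pattern by a few steps along 2-theta,
    #    padding the vacated end with zeros.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(y)
    if shift > 0:
        shifted[shift:] = y[:-shift]
    elif shift < 0:
        shifted[:shift] = y[-shift:]
    else:
        shifted = y.copy()

    # 2) Rescale intensities independently in each angular window,
    #    a crude stand-in for texture / preferred orientation.
    edges = np.linspace(0, n, n_windows + 1).astype(int)
    for lo, hi in zip(edges[:-1], edges[1:]):
        shifted[lo:hi] *= rng.uniform(*scale_range)

    # 3) Add Gaussian measurement noise, clip negatives, and
    #    renormalize so the strongest peak has intensity 1.
    noisy = shifted + rng.normal(0.0, noise_std, size=n)
    noisy = np.clip(noisy, 0.0, None)
    peak = noisy.max()
    return noisy / peak if peak > 0 else noisy

# Usage: expand one synthetic single-peak pattern into 10 augmented copies.
rng = np.random.default_rng(0)
two_theta = np.linspace(10.0, 60.0, 500)
pattern = np.exp(-((two_theta - 30.0) / 0.2) ** 2)
augmented = np.stack([augment_pattern(pattern, rng) for _ in range(10)])
```

Each augmented copy keeps the label of the pattern it was derived from, which is how a small set of measured or simulated patterns could be expanded into a training set large enough for a neural network classifier.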
