Warwick Electron Microscopy Datasets

Large, carefully partitioned datasets are essential to train neural networks and standardize performance benchmarks. We have therefore set up new repositories to make our electron microscopy datasets available to the wider community. There are three main datasets, containing 19769 scanning transmission electron micrographs, 17266 transmission electron micrographs, and 98340 simulated exit wavefunctions, with multiple variants of each dataset for different applications. To visualize image datasets, we trained variational autoencoders to encode data as 64-dimensional multivariate normal distributions, which we cluster in two dimensions by t-distributed stochastic neighbor embedding. In addition, we have improved dataset visualization with variational autoencoders by introducing encoding normalization and regularization, adding an image gradient loss, and extending t-distributed stochastic neighbor embedding to account for encoded standard deviations. Our datasets, source code, pretrained models, and interactive visualizations are openly available at this https URL.
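As a rough illustration of what encoding an image as a 64-dimensional normal distribution involves, the sketch below draws a latent sample by reparameterization and computes the Kullback-Leibler divergence to a standard normal prior. It assumes a diagonal covariance, which the abstract does not state explicitly. The means and standard deviations are random stand-ins for an encoder output, and the function names are hypothetical rather than taken from the released source code.

```python
import math
import random

LATENT_DIM = 64  # latent size stated in the abstract


def sample_latent(mean, std, rng):
    """Draw z = mean + std * eps with eps ~ N(0, I) (reparameterization)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def kl_to_standard_normal(mean, std):
    """KL(N(mean, diag(std^2)) || N(0, I)), summed over latent dimensions."""
    return sum(0.5 * (m * m + s * s - 1.0) - math.log(s)
               for m, s in zip(mean, std))


rng = random.Random(0)

# Stand-in encoder output for one image (assumption: not real model output).
mean = [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
std = [rng.uniform(0.5, 1.5) for _ in range(LATENT_DIM)]

z = sample_latent(mean, std, rng)       # one latent sample, length 64
kl = kl_to_standard_normal(mean, std)   # non-negative regularization term
print(len(z), kl >= 0.0)
```

In a setup like this, each image is summarized by 64 means and 64 standard deviations. A two-dimensional embedding such as t-distributed stochastic neighbor embedding would then be computed from those encodings rather than from raw pixels.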
