Two-stage dimension reduction for noisy high-dimensional images and application to Cryogenic Electron Microscopy

Principal component analysis (PCA) is arguably the most widely used dimension-reduction method for vector-type data. When applied to a sample of images, PCA requires vectorization of the image data, which in turn entails solving an eigenvalue problem for the sample covariance matrix. We propose herein a two-stage dimension reduction (2SDR) method for image reconstruction from high-dimensional noisy image data. The first stage treats the image as a matrix, which is a tensor of order 2, and uses multilinear principal component analysis (MPCA) for matrix rank reduction and image denoising. The second stage vectorizes the reduced-rank matrix and achieves further dimension and noise reduction. Simulation studies demonstrate excellent performance of 2SDR, for which we also develop an asymptotic theory that establishes consistency of its rank selection. Applications to cryogenic electron microscopy (cryo-EM), which has revolutionized structural biology, organic and medical chemistry, and cellular and molecular physiology in the past decade, are also provided and illustrated with benchmark cryo-EM datasets. Connections to other contemporaneous developments in image reconstruction and high-dimensional statistical inference are also discussed.
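The two stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`top_eigvecs`, `mpca`, `two_stage_reduce`), the alternating eigen-decomposition used to fit MPCA, the use of ordinary PCA in the second stage, and the hand-picked ranks `p0`, `q0`, and `r` are all assumptions made for illustration. The rank selection procedure mentioned in the abstract is not reproduced here.

```python
import numpy as np


def top_eigvecs(S, k):
    """Return the k leading eigenvectors of a symmetric matrix S as columns."""
    _, vecs = np.linalg.eigh(S)          # eigenvalues come back in ascending order
    return vecs[:, ::-1][:, :k]


def mpca(X, p0, q0, n_iter=10):
    """Stage 1 (assumed form): MPCA for a stack of images X with shape (n, p, q).

    Returns the mean image and orthonormal bases U (p, p0) and V (q, q0),
    fitted by alternating eigen-decompositions.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Initialise V from the column-mode scatter matrix sum_i Xc_i^T Xc_i.
    V = top_eigvecs(np.einsum('nji,njk->ik', Xc, Xc), q0)
    for _ in range(n_iter):
        XV = Xc @ V                                       # (n, p, q0)
        U = top_eigvecs(np.einsum('nij,nkj->ik', XV, XV), p0)
        XU = np.swapaxes(Xc, 1, 2) @ U                    # (n, q, p0)
        V = top_eigvecs(np.einsum('nij,nkj->ik', XU, XU), q0)
    return mean, U, V


def two_stage_reduce(X, p0, q0, r, n_iter=10):
    """Hypothetical 2SDR-style pipeline: MPCA, then PCA on the vectorized cores."""
    n = X.shape[0]
    mean, U, V = mpca(X, p0, q0, n_iter)
    cores = U.T @ (X - mean) @ V                          # (n, p0, q0)
    Z = cores.reshape(n, -1)                              # vectorize each core
    # Stage 2 (assumed form): PCA of the already-centred core vectors via SVD.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:r].T                                          # (p0*q0, r)
    scores = Z @ W                                        # r-dimensional representation
    cores_hat = (scores @ W.T).reshape(n, p0, q0)
    X_hat = mean + U @ cores_hat @ V.T                    # reconstructed images
    return scores, X_hat
```

Calling `two_stage_reduce(X, p0, q0, r)` on an array of shape `(n, p, q)` returns an `(n, r)` score matrix and a reconstructed image stack of the same shape as `X`. In this sketch the second-stage eigenproblem has size `p0*q0` rather than `p*q`, which is the computational motivation for reducing the matrix rank before vectorizing.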
