2SDR: Two Stage Dimension Reduction to Denoise Cryo-EM Images

Principal component analysis (PCA) is arguably the most widely used dimension reduction method for vector-type data. When applied to a set of images, PCA requires the images to be vectorized. This requirement introduces a weakness: heavy computation, because PCA must solve the eigenvalue problem of a huge covariance matrix. In this paper, we propose a two-stage dimension reduction (2SDR) method for images based on a statistical model with two layers of noise structures. 2SDR first applies multilinear PCA (MPCA) to extract core scores from the images and to screen out the first layer of noise, and then applies PCA to these scores to further reduce the second layer of noise. MPCA has a computational advantage: it avoids image vectorization and models the image bases as Kronecker products of column and row eigenvectors. PCA, in contrast, diagonalizes the covariance matrix, so its projected scores are guaranteed to be uncorrelated. By combining MPCA and PCA, 2SDR gains two benefits: it inherits the computational advantage of MPCA, and its projection scores are uncorrelated, like those of PCA. Tests on two benchmark experimental cryo-electron microscopy (cryo-EM) datasets show that 2SDR outperforms MPCA and PCA alone in both computational efficiency and denoising performance. We further propose a rank selection method for 2SDR and prove that it is consistent under some regularity conditions.
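The two stages described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `top_eigvecs` and `two_stage_dr` and the rank parameters `p0`, `q0`, `k` are hypothetical. The ranks are supplied by hand rather than chosen by the proposed rank selection method. Stage 1 uses a single non-iterative MPCA pass (leading eigenvectors of the column and row covariances) instead of the alternating iterations of full MPCA. Stage 2 runs ordinary PCA on the vectorized core scores, so the returned scores are uncorrelated.

```python
import numpy as np


def top_eigvecs(cov, r):
    """Return the r leading eigenvectors (as columns) of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:r]    # indices of the r largest eigenvalues
    return vecs[:, order]


def two_stage_dr(images, p0, q0, k):
    """Hypothetical 2SDR sketch: one-pass MPCA (stage 1), then PCA (stage 2).

    images: array of shape (n, p, q); p0, q0: MPCA ranks; k: PCA rank.
    Returns (denoised images, stage-2 scores of shape (n, k), A, B).
    """
    n, p, q = images.shape
    mean = images.mean(axis=0)
    centered = images - mean                                      # (n, p, q)

    # Stage 1: MPCA without vectorizing the images (single pass, no iterations).
    col_cov = np.einsum("nij,nkj->ik", centered, centered) / n    # (1/n) sum X_i X_i^T, (p, p)
    row_cov = np.einsum("nji,njk->ik", centered, centered) / n    # (1/n) sum X_i^T X_i, (q, q)
    A = top_eigvecs(col_cov, p0)                                  # column eigenvectors, (p, p0)
    B = top_eigvecs(row_cov, q0)                                  # row eigenvectors, (q, q0)
    cores = np.einsum("ip,nij,jq->npq", A, centered, B)           # A^T X_i B, (n, p0, q0)

    # Stage 2: PCA on the vectorized core scores (a small (p0*q0) x (p0*q0) problem).
    Z = cores.reshape(n, p0 * q0)                                 # row-major vec of each core
    V = top_eigvecs(Z.T @ Z / n, k)                               # (p0*q0, k)
    scores = Z @ V                                                # uncorrelated PCA scores, (n, k)

    # Reconstruct: map scores back to cores, then cores back to image space.
    cores_hat = (scores @ V.T).reshape(n, p0, q0)
    denoised = mean + np.einsum("ip,npq,jq->nij", A, cores_hat, B)  # A Z_i B^T + mean
    return denoised, scores, A, B
```

The point of the design is the size of the eigenproblems: stage 1 solves p×p and q×q problems and stage 2 a (p0·q0)×(p0·q0) problem, whereas PCA on vectorized images solves a single (pq)×(pq) problem, which for 128×128 images is already 16384×16384. With NumPy's row-major vectorization, the MPCA image basis is `np.kron(A, B)`, because `np.kron(A, B) @ Z.ravel()` equals `(A @ Z @ B.T).ravel()` for any p0×q0 core `Z`.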
