Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem

In cryo-electron microscopy (cryo-EM), a microscope generates a top view of a sample of randomly oriented copies of a molecule. The problem of single particle reconstruction (SPR) from cryo-EM is to use the resulting set of noisy two-dimensional projection images taken at unknown directions to reconstruct the three-dimensional (3D) structure of the molecule. In some situations, the molecule under examination exhibits structural variability, which poses a fundamental challenge in SPR. The heterogeneity problem is the task of mapping the space of conformational states of a molecule. It has been previously suggested that the leading eigenvectors of the covariance matrix of the 3D molecules can be used to solve the heterogeneity problem. Estimating the covariance matrix is challenging, since only projections of the molecules are observed, but not the molecules themselves. In this paper, we formulate a general problem of covariance estimation from noisy projections of samples. This problem has intimate connections with matrix completion problems and high-dimensional principal component analysis. We propose an estimator and prove its consistency. When there are finitely many heterogeneity classes, the spectrum of the estimated covariance matrix reveals the number of classes. The estimator can be found as the solution to a certain linear system. In the cryo-EM case, the linear operator to be inverted, which we term the projection covariance transform, is an important object in covariance estimation for tomographic problems involving structural variation. Inverting it involves applying a filter akin to the ramp filter in tomography. We design a basis in which this linear operator is sparse and thus can be tractably inverted despite its large size. We demonstrate via numerical experiments on synthetic datasets the robustness of our algorithm to high levels of noise.
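The idea of recovering a covariance matrix from noisy projections by solving a linear system can be illustrated with a toy sketch. This is not the cryo-EM algorithm itself. It assumes that each "projection" simply keeps a random subset of coordinates (the missing-data setting that links the problem to matrix completion), that the noise is white with known variance `sigma2`, and that there are two equally likely classes. The names `simulate` and `estimate_covariance`, and all parameter values, are hypothetical. Under these assumptions the linear operator acting on the covariance is diagonal, so inverting it reduces to an entrywise division by the number of times each pair of coordinates was observed together.

```python
import numpy as np


def simulate(n=20000, p=20, keep_prob=0.5, sigma=0.5, seed=0):
    """Toy data: two classes, coordinate-subsampling 'projections', white noise."""
    rng = np.random.default_rng(seed)
    v = np.linspace(-1.0, 1.0, p)
    classes = np.stack([v, -v])                  # two hypothetical "conformations"
    labels = rng.integers(0, 2, size=n)          # equal class probabilities
    x = classes[labels]                          # unobserved full samples, shape (n, p)
    masks = rng.random((n, p)) < keep_prob       # which coordinates each sample reveals
    ys = masks * (x + sigma * rng.standard_normal((n, p)))  # zero where unobserved
    true_cov = np.outer(v, v)                    # population covariance, rank 1
    return masks, ys, true_cov


def estimate_covariance(masks, ys, sigma2):
    """Least-squares mean and covariance from masked, noisy observations.

    Assumes every pair of coordinates is observed together at least once.
    """
    m = masks.astype(float)
    counts = m.sum(axis=0)                       # times each coordinate was observed
    mu = (m * ys).sum(axis=0) / counts           # solves sum_s P_s^T P_s mu = sum_s P_s^T y_s
    resid = m * (ys - mu)                        # back-projected residuals P_s^T (y_s - P_s mu)
    pair_counts = m.T @ m                        # times each pair (i, j) was observed together
    rhs = resid.T @ resid - sigma2 * np.diag(counts)  # subtract the noise contribution
    sigma_hat = rhs / pair_counts                # invert the (diagonal) linear operator
    return mu, sigma_hat


if __name__ == "__main__":
    masks, ys, true_cov = simulate()
    mu, sigma_hat = estimate_covariance(masks, ys, sigma2=0.25)
    eigs = np.sort(np.linalg.eigvalsh(sigma_hat))[::-1]
    print("leading eigenvalues:", eigs[:4])
```

With two classes the population covariance has rank one, so the estimated spectrum should show a single eigenvalue well separated from the rest. In the cryo-EM setting the operator is no longer diagonal, which is why a basis in which it is sparse matters.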
