Spectral Learning on Matrices and Tensors

Spectral methods have long been a mainstay of domains such as machine learning and scientific computing. They compute a spectral decomposition to obtain basis functions that capture important structure in the problem at hand. The most common spectral method is principal component analysis (PCA), which uses the top eigenvectors of the data covariance matrix, e.g., to carry out dimensionality reduction. This data pre-processing step is often effective in separating signal from noise. PCA and other spectral techniques applied to matrices have several limitations, however. By restricting attention to pairwise moments, they effectively make a Gaussian approximation of the underlying data, and they fail on data with hidden variables, which induce non-Gaussianity. Yet most data sets contain latent effects that cannot be directly observed, e.g., topics in a document corpus, or the underlying causes of a disease. By extending spectral decomposition methods to higher-order moments, we demonstrate the ability to learn a wide range of latent variable models efficiently. Higher-order moments can be represented by tensors, and intuitively they encode more information than pairwise moment matrices. More crucially, tensor decomposition can pick up latent effects that matrix methods miss; for example, it can uniquely identify non-orthogonal components. Exploiting these aspects turns out to be fruitful for provable unsupervised learning of a wide range of latent variable models. We also outline the computational techniques needed to design efficient tensor decomposition methods. We introduce TensorLy, which has a simple Python interface for expressing tensor operations. It has a flexible backend system supporting NumPy, PyTorch, TensorFlow, and MXNet, among others, allowing operations on multiple CPUs and GPUs and seamless integration with deep-learning functionality.
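
To make the contrast concrete, here is a minimal sketch (not the paper's reference implementation) of both regimes: PCA via the top eigenvectors of an empirical covariance matrix in NumPy, followed by a CP decomposition of a synthetic rank-3 tensor with TensorLy. It assumes a recent TensorLy release in which `parafac` returns the weights together with the factor matrices; the dimensions, rank, and data are illustrative.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)

# --- Matrix case: PCA from pairwise moments ---------------------------------
X = rng.standard_normal((500, 20))       # 500 samples, 20 features
X -= X.mean(axis=0)                      # center the data
cov = X.T @ X / X.shape[0]               # empirical covariance (second moment)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
top5 = eigvecs[:, -5:]                   # top-5 principal directions
X_reduced = X @ top5                     # project for dimensionality reduction

# --- Tensor case: CP decomposition with TensorLy ----------------------------
tl.set_backend('numpy')                  # 'pytorch', 'tensorflow', ... also work
A, B, C = (rng.standard_normal((10, 3)) for _ in range(3))
T = tl.tensor(np.einsum('ir,jr,kr->ijk', A, B, C))   # exact rank-3 tensor
weights, factors = parafac(T, rank=3)    # fit a rank-3 CP model
T_hat = tl.cp_to_tensor((weights, factors))          # rebuild from the factors
print(float(tl.norm(T - T_hat) / tl.norm(T)))        # relative error; small for
                                                     # an exact low-rank tensor
```

Switching `tl.set_backend('numpy')` to `'pytorch'` runs the same decomposition on PyTorch tensors, which is how the backend system enables GPU execution without changing the user-facing code.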
