Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis (PCA). The method consists in randomly “puncturing” both the data matrix X ∈ ℂ^{p×n} (or ℝ^{p×n}) and its corresponding kernel (Gram) matrix K through Bernoulli masks: S ∈ {0, 1}^{p×n} for X and B ∈ {0, 1}^{n×n} for K. The resulting “two-way punctured” kernel is thus given by K = (1/p) [(X ⊙ S)* (X ⊙ S)] ⊙ B, where ⊙ denotes the entrywise (Hadamard) product and (·)* the conjugate transpose. We demonstrate that, for X composed of independent columns drawn from a Gaussian mixture model, as n, p → ∞ with p/n → c₀ ∈ (0, ∞), the spectral behavior of K (its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors) is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on various real image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering or PCA) performance. This preliminary study thus opens the path towards rethinking, from a large-dimensional standpoint, computational and storage costs in elementary machine learning models.
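To make the puncturing mechanism concrete, the following is a minimal NumPy sketch on a toy two-class Gaussian mixture with real data (so a plain transpose stands in for the conjugate transpose). The keep-probabilities eps_S and eps_B, the dimensions, and the mean vector are illustrative assumptions, not values from the article.

```python
# Minimal sketch of two-way kernel matrix puncturing (toy two-class Gaussian
# mixture; eps_S / eps_B, the dimensions and the mean vector are illustrative
# choices, not values taken from the article).
import numpy as np

rng = np.random.default_rng(0)

p, n = 512, 1024            # data dimension and number of samples
eps_S, eps_B = 0.5, 0.5     # Bernoulli keep-probabilities for the masks S and B

# Two-class Gaussian mixture with opposite means +/- mu.
mu = np.zeros(p)
mu[0] = 3.0
labels = rng.integers(0, 2, size=n)
X = rng.standard_normal((p, n)) + np.outer(mu, 2 * labels - 1)

# Bernoulli masks: S punctures entries of X, B punctures entries of K.
S = rng.random((p, n)) < eps_S
B_upper = np.triu(rng.random((n, n)) < eps_B, k=1)
B = B_upper | B_upper.T | np.eye(n, dtype=bool)   # symmetric mask with unit diagonal

# Two-way punctured kernel: K = (1/p) [(X .* S)^T (X .* S)] .* B
# (X is real here, so the transpose plays the role of the conjugate transpose).
XS = X * S
K = (XS.T @ XS) * B / p

# Spectral clustering step: classify by the sign of the dominant eigenvector of K.
eigvals, eigvecs = np.linalg.eigh(K)
pred = (eigvecs[:, -1] > 0).astype(int)
accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"clustering accuracy with eps_S={eps_S}, eps_B={eps_B}: {accuracy:.2f}")
```

Varying eps_S and eps_B in this sketch gives a quick empirical feel for the trade-off the article studies: the masks reduce the number of data and kernel entries that must be computed and stored, while the dominant eigenvector of the punctured kernel can remain informative for clustering.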
