Methods of random matrices for large dimensional statistical learning

The Big Data challenge calls for machine learning algorithms to evolve into more efficient learning engines for large dimensional data. Recently, a new line of research has emerged that analyzes learning methods in the modern regime where the number n of data samples and their dimension p are commensurately large. Compared to the conventional regime where n >> p, the regime of large and comparable n, p is particularly interesting, as the learning performance there remains sensitive to the tuning of hyperparameters, thus opening a path towards the understanding and improvement of learning techniques for large dimensional datasets. The technical approach employed in this thesis draws on several advanced tools of high dimensional statistics, allowing us to conduct analyses beyond the state of the art. The first part of this dissertation is devoted to semi-supervised learning on high dimensional data; motivated by our theoretical findings, we propose a superior alternative to the standard semi-supervised method of Laplacian regularization. Methods involving implicit optimization, such as SVMs and logistic regression, are then investigated under realistic mixture models, yielding a detailed account of their learning mechanisms. Several important consequences are thus revealed, some of which contradict common belief.
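
To fix ideas, the sketch below (Python/NumPy) illustrates the kind of graph-based Laplacian-regularization baseline studied in the first part, written in the closed form of the classical local-and-global-consistency formulation. It is a minimal illustrative sketch, not the thesis's implementation: the function name, the Gaussian-kernel bandwidth heuristic, and the value of the trade-off parameter alpha are assumptions made here for readability.

```python
# Minimal sketch of graph-based semi-supervised classification by
# Laplacian regularization (local-and-global-consistency form).
# Illustrative only; names, bandwidth heuristic and alpha are assumptions.
import numpy as np

def laplacian_regularization(X, y_labeled, labeled_idx, alpha=0.99):
    """Propagate a few known labels to all n samples of X.

    X           : (n, p) data matrix
    y_labeled   : labels in {-1, +1} for the labeled subset
    labeled_idx : indices of the labeled samples in X
    alpha       : smoothness vs. label-fit trade-off hyperparameter
    """
    n = X.shape[0]

    # Pairwise squared distances and a Gaussian-kernel affinity matrix
    # (bandwidth set heuristically to the median squared distance).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sigma2 = np.median(sq_dists)
    W = np.exp(-sq_dists / sigma2)
    np.fill_diagonal(W, 0.0)

    # Symmetrically normalized adjacency  S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))

    # Initial scores: known labels on labeled points, 0 elsewhere.
    f0 = np.zeros(n)
    f0[labeled_idx] = y_labeled

    # Closed-form solution of the Laplacian-regularized problem:
    #   f = (I - alpha * S)^{-1} f0
    f = np.linalg.solve(np.eye(n) - alpha * S, f0)
    return np.sign(f)  # predicted class for every sample
```

In the large and comparable n, p regime discussed above, the behavior of such a scheme depends delicately on choices like the kernel bandwidth and alpha, which is precisely the sensitivity that motivates the random-matrix analysis and the improved alternative proposed in the thesis.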
