On the Sample Complexity of Subspace Learning

A large number of algorithms in machine learning, from principal component analysis (PCA) and its non-linear (kernel) extensions to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples. In this paper we introduce a general formulation of this problem and derive novel learning error estimates. Our results rely on natural assumptions on the spectral properties of the covariance operator associated with the data distribution, and hold for a wide class of metrics between subspaces. As special cases, we discuss sharp error estimates for the reconstruction properties of PCA and for spectral support estimation. Key to our analysis is an operator-theoretic approach that has broad applicability to spectral learning methods.
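To make the setting concrete, the sketch below is a minimal illustration (not the paper's algorithm or its bounds): it estimates a k-dimensional subspace from samples via the top eigenvectors of the empirical covariance and compares the resulting projection with the population one under one possible metric between subspaces, the operator norm of the projection difference. The dimension, sample size, Gaussian model, and spectral decay used here are illustrative assumptions only.

```python
# Minimal sketch: empirical subspace estimation via PCA and a subspace metric.
# All model choices (d, n, k, Gaussian data, geometric spectrum) are assumptions
# made for illustration, not quantities taken from the paper.
import numpy as np

def top_k_projection(cov, k):
    """Orthogonal projection onto the span of the top-k eigenvectors of cov."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    U = eigvecs[:, -k:]                     # top-k eigenvectors
    return U @ U.T

rng = np.random.default_rng(0)
d, n, k = 20, 500, 3

# Population covariance with a fast-decaying spectrum (a typical assumption in
# this literature); its top-k eigenspace plays the role of the "true" subspace.
spectrum = np.array([2.0 ** (-i) for i in range(d)])
true_cov = np.diag(spectrum)
P_true = top_k_projection(true_cov, k)

# Draw samples, form the empirical covariance, and estimate the subspace.
X = rng.multivariate_normal(np.zeros(d), true_cov, size=n)
P_hat = top_k_projection(X.T @ X / n, k)

# One metric between subspaces: operator norm of the projection difference.
err = np.linalg.norm(P_hat - P_true, ord=2)
print(f"subspace estimation error (operator norm): {err:.4f}")
```

Running the sketch with growing n shows the projection error shrinking, which is the qualitative behavior that learning error estimates of this kind quantify.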
