The Curse of Dimensionality for Local Kernel Machines

We present a series of theoretical arguments supporting the claim that a large class of modern learning algorithms based on local kernels is sensitive to the curse of dimensionality. This class includes local manifold learning algorithms such as Isomap and LLE, support vector classifiers with Gaussian or other local kernels, and graph-based semi-supervised learning algorithms that use a local similarity function. These algorithms are shown to be local in the sense that crucial properties of the learned function at a point x depend mostly on the neighbors of x in the training set. This locality makes them sensitive to the curse of dimensionality, a phenomenon well studied in classical non-parametric statistics. There is a large class of data distributions for which non-local solutions could be expressed compactly and potentially learned from few examples, but which require a large number of local bases, and therefore a large number of training examples, when a local learning algorithm is used.
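To make the locality argument concrete, here is a minimal numerical sketch (not taken from the paper; the kernel ridge regression setup, the synthetic sine target, and the bandwidth and regularization values are illustrative assumptions). With a Gaussian kernel, the learned function f(x) = sum_i alpha_i K(x, x_i) is shaped almost entirely by training points near x, so a target that changes sign in many distinct regions needs roughly one example, and one local basis function, per region. A fixed training set of 50 points fits sin(pi * k * x) well for small k but fails once the number of variations k outgrows what the local bumps can cover.

```python
# Illustrative sketch only: kernel ridge regression with a Gaussian kernel
# on a 1-D target sin(pi * k * x).  With a fixed number of training points
# and a fixed bandwidth, test error degrades sharply once the target has
# more sign changes than the local "bumps" can cover.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_gram(a, b, sigma):
    """Gram matrix K[i, j] = exp(-(a_i - b_j)^2 / (2 sigma^2))."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))

n_train, sigma, ridge = 50, 0.05, 1e-3      # illustrative values
x_train = np.sort(rng.uniform(0.0, 1.0, n_train))
x_test = np.linspace(0.0, 1.0, 1000)
K = gaussian_gram(x_train, x_train, sigma)

for k in (5, 10, 20, 50, 100):              # number of half-period variations
    y_train = np.sin(np.pi * k * x_train)
    y_test = np.sin(np.pi * k * x_test)
    # f(x) = sum_i alpha_i K(x, x_i): a sum of local Gaussian bumps.
    alpha = np.linalg.solve(K + ridge * np.eye(n_train), y_train)
    pred = gaussian_gram(x_test, x_train, sigma) @ alpha
    print(f"k={k:4d} variations, {n_train} examples -> "
          f"test MSE {np.mean((pred - y_test) ** 2):.3f}")
```

Shrinking the bandwidth to match the oscillation scale only helps once the training set also contains a few examples per half-period, which is the point of the argument: the number of examples a local learner needs grows with the number of variations of the target function, regardless of how compactly a non-local representation could describe it.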
