Classification with Gaussians and Convex Loss

This paper considers binary classification algorithms generated from Tikhonov regularization schemes associated with general convex loss functions and varying Gaussian kernels. Our main goal is to provide fast convergence rates for the excess misclassification error. Allowing the variance of the Gaussian kernel to vary in the algorithms improves the learning rates, as measured by the regularization error and the sample error. The special structure of Gaussian kernels enables us to construct, by an approximation scheme based on Fourier analysis, uniformly bounded regularizing functions that achieve polynomial decay of the regularization error under a Sobolev smoothness condition. The sample error is estimated by means of a projection operator and a tight bound for the covering numbers of the reproducing kernel Hilbert spaces generated by Gaussian kernels. The convexity of the general loss function plays a central role in our analysis.

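To make the setting concrete, the following is a minimal sketch of the regularization scheme described above, written in standard notation that is assumed here rather than taken verbatim from the paper. Given a sample z = {(x_i, y_i)}_{i=1}^m \subset X \times \{-1, 1\}, a convex loss \phi, and the Gaussian kernel with variance parameter \sigma > 0,

\[
K_\sigma(x, u) = \exp\!\left(-\frac{|x - u|^2}{\sigma^2}\right),
\]

with reproducing kernel Hilbert space \mathcal{H}_\sigma, the Tikhonov-regularized classifier is \operatorname{sgn}(f_z), where

\[
f_z = \arg\min_{f \in \mathcal{H}_\sigma} \left\{ \frac{1}{m} \sum_{i=1}^m \phi\bigl(y_i f(x_i)\bigr) + \lambda \|f\|_{\mathcal{H}_\sigma}^2 \right\}.
\]

The excess misclassification error to be bounded is \mathcal{R}(\operatorname{sgn}(f_z)) - \mathcal{R}(f_c), where \mathcal{R}(\mathcal{C}) = \operatorname{Prob}\{\mathcal{C}(x) \neq y\} for a classifier \mathcal{C}: X \to \{-1, 1\} and f_c is the Bayes rule. In this sketch both the regularization parameter \lambda and the kernel variance \sigma are free parameters that may be chosen to depend on the sample size.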