Least Square Regression with ℓp-Coefficient Regularization

The choice of penalty functional is critical to the performance of a regularized learning algorithm and therefore deserves special attention. In this article, we present a least squares regression algorithm based on ℓp-coefficient regularization. Compared with classical regularized least squares regression, the new algorithm differs in its regularization term: the penalty is imposed directly on the coefficient vector of the kernel expansion rather than on the RKHS norm of the function. Our primary focus is the error analysis of the algorithm, and we derive an explicit learning rate under standard assumptions.
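For concreteness, a coefficient-based ℓp-regularization scheme of this kind is commonly written as the following optimization over sample-dependent kernel expansions. This display is a standard sketch of such a scheme, not necessarily the paper's exact notation; the kernel $K$, the regularization parameter $\lambda > 0$, the exponent $p$, and the sample $z = \{(x_i, y_i)\}_{i=1}^{m}$ are assumed here for illustration:

$$
f_{z,\lambda} = \sum_{j=1}^{m} \alpha^{z}_{j}\, K(\cdot, x_j),
\qquad
\alpha^{z} = \arg\min_{\alpha \in \mathbb{R}^{m}}
\frac{1}{m}\sum_{i=1}^{m}\Big(y_i - \sum_{j=1}^{m} \alpha_j K(x_i, x_j)\Big)^{2}
+ \lambda \sum_{j=1}^{m} |\alpha_j|^{p}.
$$

By contrast, classical regularized least squares regression (kernel ridge regression) penalizes $\|f\|_{K}^{2} = \alpha^{\top} \mathbf{K} \alpha$, where $\mathbf{K} = (K(x_i, x_j))_{i,j}$ is the kernel matrix; replacing that RKHS-norm penalty with a direct ℓp penalty on the coefficients is what the sketch above illustrates.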
