Regularization Techniques and Suboptimal Solutions to Optimization Problems in Learning from Data

Various regularization techniques for supervised learning from data are investigated. Theoretical properties of the associated optimization problems are studied, and sparse suboptimal solutions are sought. Rates of approximate optimization are estimated for sequences of suboptimal solutions formed by linear combinations of n-tuples of computational units, and statistical learning bounds are derived. The hypothesis sets considered are reproducing kernel Hilbert spaces and their subsets.
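As a rough illustration of the setting, the following sketch compares a Tikhonov-regularized fit that uses all kernel units with sparse suboptimal fits built from n units. It is a minimal example under stated assumptions, not the method analyzed in the paper: the Gaussian kernel, its width, the toy data, the regularization parameter `gamma`, and the greedy selection rule (borrowed from matching pursuit) are all choices made here for illustration, and every function name is hypothetical.

```python
import math

def gaussian_kernel(x, y, width=0.15):
    """Gaussian kernel K(x, y) = exp(-(x - y)^2 / (2 width^2)); the width is an assumption."""
    return math.exp(-((x - y) ** 2) / (2.0 * width ** 2))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_on_centers(xs, ys, centers, gamma):
    """Minimize (1/m) sum_i (f(x_i) - y_i)^2 + gamma ||f||_K^2 over the span of K(., c), c in centers."""
    m, n = len(xs), len(centers)
    G = [[gaussian_kernel(x, c) for c in centers] for x in xs]          # m x n design matrix
    Kcc = [[gaussian_kernel(a, b) for b in centers] for a in centers]   # n x n Gram matrix
    # Normal equations of the regularized functional: (G^T G + gamma m Kcc) c = G^T y
    A = [[sum(G[i][p] * G[i][q] for i in range(m)) + gamma * m * Kcc[p][q]
          for q in range(n)] for p in range(n)]
    rhs = [sum(G[i][p] * ys[i] for i in range(m)) for p in range(n)]
    return solve(A, rhs)

def predict(x, centers, coeffs):
    """Evaluate the linear combination of kernel units at x."""
    return sum(c * gaussian_kernel(x, z) for c, z in zip(coeffs, centers))

def objective(xs, ys, centers, coeffs, gamma):
    """Regularized functional: empirical squared error plus gamma times the squared RKHS norm."""
    m = len(xs)
    err = sum((predict(x, centers, coeffs) - y) ** 2 for x, y in zip(xs, ys)) / m
    norm_sq = sum(ci * cj * gaussian_kernel(zi, zj)
                  for ci, zi in zip(coeffs, centers)
                  for cj, zj in zip(coeffs, centers))
    return err + gamma * norm_sq

def greedy_sparse_fit(xs, ys, n, gamma):
    """Pick n centers greedily (largest residual correlation), refitting after each pick."""
    centers, coeffs = [], []
    for _ in range(n):
        residual = [y - predict(x, centers, coeffs) for x, y in zip(xs, ys)]
        candidates = [x for x in xs if x not in centers]
        best = max(candidates,
                   key=lambda z: abs(sum(r * gaussian_kernel(x, z)
                                         for r, x in zip(residual, xs))))
        centers.append(best)
        coeffs = fit_on_centers(xs, ys, centers, gamma)
    return centers, coeffs

# Toy data (an assumption): 10 samples of a sine wave on [0, 1].
xs = [i / 9 for i in range(10)]
ys = [math.sin(2 * math.pi * x) for x in xs]
gamma = 0.01

# Fit using all 10 kernel units, then sparse fits with n = 1, ..., 4 units.
full_coeffs = fit_on_centers(xs, ys, xs, gamma)
full_value = objective(xs, ys, xs, full_coeffs, gamma)
sparse_values = []
for n in range(1, 5):
    centers, coeffs = greedy_sparse_fit(xs, ys, n, gamma)
    sparse_values.append(objective(xs, ys, centers, coeffs, gamma))

print("all units:", full_value)
for n, value in enumerate(sparse_values, start=1):
    print("n =", n, "units:", value)
```

Because the selected centers are nested as n grows, the regularized functional evaluated at the sparse fits cannot increase with n, and it stays above the value reached with all units. The gap between the two is the kind of quantity that rates of approximate optimization describe.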
