Consistent Learning by Composite Proximal Thresholding

We investigate the modeling and the numerical solution of machine learning problems in which the prediction functions are linear combinations of elements of a possibly infinite-dimensional dictionary. We propose a novel, flexible composite regularization model that makes it possible to incorporate various priors on the coefficients of the prediction function, including sparsity and hard constraints. We show that the estimators obtained by minimizing the regularized empirical risk are consistent in a statistical sense, and we design an error-tolerant composite proximal thresholding algorithm for computing such estimators. New results on the asymptotic behavior of the proximal forward-backward splitting method are derived and exploited to establish the convergence properties of the proposed algorithm. In particular, our method features an o(1/m) convergence rate in objective values.
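To make the idea of proximal thresholding inside a forward-backward iteration concrete, here is a minimal sketch rather than the algorithm analyzed in the paper. It assumes a finite dictionary stored as a matrix `A`, a least-squares empirical risk, an l1 penalty as the sparsity prior, and a box `[lo, hi]` containing zero as the hard constraint; all function and parameter names are hypothetical. It relies on the fact that, in one dimension, the proximity operator of a convex penalty plus an interval constraint is the projection onto the interval of the unconstrained proximity operator. The sketch is exact (no error tolerance) and assumes `step` lies below 2/L, where L is the Lipschitz constant of the gradient of the smooth term.

```python
def soft_threshold(x, tau):
    """Proximity operator of tau*|.|: shrink x toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def composite_prox(x, tau, lo, hi):
    """Soft-threshold, then clip to [lo, hi] (assumes lo <= 0 <= hi)."""
    return min(max(soft_threshold(x, tau), lo), hi)


def forward_backward(A, y, lam, lo, hi, step, n_iter):
    """Sketch: minimize (0.5/m)*||A u - y||^2 + lam*||u||_1 with lo <= u_k <= hi.

    A is a list of m rows (one per sample), each of length d (one per
    dictionary element); here m is the number of samples.
    """
    m, d = len(A), len(A[0])
    u = [0.0] * d
    for _ in range(n_iter):
        # Forward (gradient) step on the smooth empirical risk.
        residual = [sum(A[i][k] * u[k] for k in range(d)) - y[i] for i in range(m)]
        grad = [sum(A[i][k] * residual[i] for i in range(m)) / m for k in range(d)]
        # Backward (proximal) step, applied coordinate by coordinate.
        u = [composite_prox(u[k] - step * grad[k], step * lam, lo, hi)
             for k in range(d)]
    return u


# Toy usage: identity dictionary, two samples.
coeffs = forward_backward([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.1],
                          lam=0.5, lo=-1.0, hi=1.0, step=1.0, n_iter=50)
```

In the toy call, the first coefficient is pushed against the upper bound of the box while the second, whose target is small, is thresholded to zero, which illustrates how the sparsity prior and the hard constraint act together in a single proximal step.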
