Performance limits of single-agent and multi-agent sub-gradient stochastic learning

This work examines the performance of stochastic sub-gradient learning strategies, for both stand-alone and networked agents, under weaker conditions than are usually considered in the literature. It is shown that these conditions are automatically satisfied by several important cases of interest, including support-vector machines and sparsity-inducing learning solutions. The analysis establishes that sub-gradient strategies can attain exponential convergence rates, as opposed to sub-linear rates, and that they can approach the optimal solution within O(μ) for sufficiently small step-sizes μ. A realizable exponential-weighting procedure is proposed to smooth the intermediate iterates and to guarantee these desirable performance properties.
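The stochastic sub-gradient recursion and the exponential smoothing of its iterates can be illustrated with a minimal Python sketch. It assumes a regularized hinge-loss (SVM) risk with regularization parameter `rho`, a constant step-size `mu`, and, for the networked case, an adapt-then-combine diffusion rule with a left-stochastic combination matrix `A`; the abstract does not specify these choices. The smoothing weights iterate j at time i by `kappa**(i-j)` for a hypothetical forgetting factor `kappa`. This is one way to realize exponential weighting and is not necessarily the exact weighting used in this work. All function and class names are illustrative.

```python
import random

# Illustrative sketch only: the SVM hinge loss, constant step-size mu, weights
# kappa**(i-j), and ATC diffusion are assumptions, not taken from the paper.


def hinge_subgradient(w, x, y, rho):
    """One sub-gradient of rho/2*||w||^2 + max(0, 1 - y*x.w) at w.

    At the kink (margin == 1) the hinge term contributes 0, which is a
    valid element of the sub-differential.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    g = [rho * wi for wi in w]
    if margin < 1.0:  # hinge term is active: add -y*x
        g = [gi - y * xi for gi, xi in zip(g, x)]
    return g


class ExpWeightedAverage:
    """Running (1/S_i) * sum_j kappa**(i-j) * w_j, with S_i = sum_j kappa**(i-j)."""

    def __init__(self, dim, kappa):
        self.kappa, self.S, self.avg = kappa, 0.0, [0.0] * dim

    def update(self, w):
        self.S = self.kappa * self.S + 1.0  # S_i = kappa*S_{i-1} + 1
        c = 1.0 / self.S
        self.avg = [(1.0 - c) * a + c * wi for a, wi in zip(self.avg, w)]
        return self.avg


def single_agent(data, mu, rho, kappa, num_iters, seed=0):
    """Stand-alone stochastic sub-gradient learner with smoothed output."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    smoother = ExpWeightedAverage(len(w), kappa)
    for _ in range(num_iters):
        x, y = rng.choice(data)  # one randomly drawn sample per iteration
        g = hinge_subgradient(w, x, y, rho)
        w = [wi - mu * gi for wi, gi in zip(w, g)]
        smoother.update(w)
    return w, smoother.avg


def diffusion_atc(local_data, A, mu, rho, kappa, num_iters, seed=0):
    """Adapt-then-combine diffusion; A[l][k] is the weight agent k gives neighbor l."""
    rng = random.Random(seed)
    N, dim = len(local_data), len(local_data[0][0][0])
    W = [[0.0] * dim for _ in range(N)]
    smoothers = [ExpWeightedAverage(dim, kappa) for _ in range(N)]
    for _ in range(num_iters):
        psi = []
        for k in range(N):  # adapt: local stochastic sub-gradient step
            x, y = rng.choice(local_data[k])
            g = hinge_subgradient(W[k], x, y, rho)
            psi.append([wi - mu * gi for wi, gi in zip(W[k], g)])
        # combine: w_k = sum_l A[l][k] * psi_l
        W = [[sum(A[l][k] * psi[l][d] for l in range(N)) for d in range(dim)]
             for k in range(N)]
        for k in range(N):
            smoothers[k].update(W[k])
    return W, [s.avg for s in smoothers]


def classify(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Toy separable data; the trailing 1.0 in each feature vector acts as a bias term.
data = [([1.0, 2.0, 1.0], 1), ([2.0, 1.5, 1.0], 1),
        ([-1.0, -2.0, 1.0], -1), ([-2.0, -1.5, 1.0], -1)]
w_last, w_smooth = single_agent(data, mu=0.05, rho=0.01, kappa=0.99, num_iters=2000)
print([classify(w_smooth, x) for x, _ in data])  # → [1, 1, -1, -1]
```

The recursion for `S` keeps the exponential weights normalized without storing past iterates, so the smoothed estimate costs O(1) extra memory per agent. Because it is a running convex combination of recent iterates, it damps the oscillation that a constant step-size sub-gradient recursion exhibits around the minimizer.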
