Performance limits of single-agent and multi-agent sub-gradient stochastic learning

This work examines the performance of stochastic sub-gradient learning strategies, for both stand-alone and networked agents, under weaker conditions than are usually considered in the literature. It is shown that these conditions are automatically satisfied by several important cases of interest, including support-vector machines and sparsity-inducing learning solutions. The analysis establishes that sub-gradient strategies can attain exponential convergence rates, as opposed to sub-linear rates, and that they can approach the optimal solution within O(μ) for sufficiently small step-sizes μ. A realizable exponential-weighting procedure is proposed to smooth the intermediate iterates and to guarantee these desirable performance properties.
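
The recipe summarized above (a constant step-size stochastic sub-gradient step, optionally followed by a combination step across neighboring agents, plus exponential weighting of the iterates) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The synthetic data stream, the l2-regularized hinge (SVM-type) loss with weight RHO, the values of MU and KAPPA, the ring topology, and the adapt-then-combine diffusion form of the networked case are all assumptions. The names sample, subgradient, ExpSmoother, single_agent, diffusion and accuracy are hypothetical. The smoothed output is assumed to be w̄_i = Σ_j κ^(i−j) w_j / Σ_j κ^(i−j), computed recursively. The exact weights and the conditions on κ used in the paper may differ.

```python
import random

# ASSUMED setup for illustration only (not taken from the paper):
#   streaming features h ~ N(0, I) in 2-D, labels gamma = sign(h[0] + h[1]),
#   risk J(w) = (RHO/2)*||w||^2 + E[max(0, 1 - gamma * h^T w)]  (SVM-type).
RHO = 0.01     # regularization weight (assumed value)
MU = 0.01      # constant step-size mu (assumed value)
KAPPA = 0.999  # forgetting factor of the exponential weighting (assumed value)
DIM = 2


def sample(rng):
    """Draw one streaming sample (h, gamma) from the hypothetical source."""
    h = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    gamma = 1.0 if h[0] + h[1] > 0 else -1.0
    return h, gamma


def subgradient(w, h, gamma):
    """One valid sub-gradient of the instantaneous loss at w."""
    margin = gamma * sum(hj * wj for hj, wj in zip(h, w))
    # At the kink (margin == 1) the zero element of the hinge sub-differential is used.
    active = 1.0 if margin < 1.0 else 0.0
    return [RHO * wj - active * gamma * hj for wj, hj in zip(w, h)]


class ExpSmoother:
    """Tracks sum_j kappa^(i-j) w_j / sum_j kappa^(i-j) without storing history."""

    def __init__(self, kappa, dim):
        self.kappa = kappa
        self.s = 0.0            # S_i = kappa * S_{i-1} + 1, with S_{-1} = 0
        self.avg = [0.0] * dim

    def update(self, w):
        self.s = self.kappa * self.s + 1.0
        a = 1.0 / self.s        # equals 1 on the first call, so avg starts at w_0
        self.avg = [(1.0 - a) * x + a * y for x, y in zip(self.avg, w)]
        return self.avg


def single_agent(iters, rng):
    """Stand-alone stochastic sub-gradient learner; returns (last, smoothed)."""
    w, sm = [0.0] * DIM, ExpSmoother(KAPPA, DIM)
    for _ in range(iters):
        h, gamma = sample(rng)
        g = subgradient(w, h, gamma)
        w = [wj - MU * gj for wj, gj in zip(w, g)]
        sm.update(w)
    return w, sm.avg


def diffusion(iters, rng, n_agents=4):
    """Adapt-then-combine diffusion on a ring (n_agents >= 3 assumed)."""
    # Each agent averages itself and its two ring neighbors with weight 1/3.
    nbrs = [[(k - 1) % n_agents, k, (k + 1) % n_agents] for k in range(n_agents)]
    w = [[0.0] * DIM for _ in range(n_agents)]
    sms = [ExpSmoother(KAPPA, DIM) for _ in range(n_agents)]
    for _ in range(iters):
        # Adapt: every agent takes a sub-gradient step on its own fresh sample.
        psi = []
        for k in range(n_agents):
            h, gamma = sample(rng)
            g = subgradient(w[k], h, gamma)
            psi.append([wj - MU * gj for wj, gj in zip(w[k], g)])
        # Combine: average the intermediate iterates over each neighborhood.
        w = [[sum(psi[l][j] for l in nbrs[k]) / len(nbrs[k]) for j in range(DIM)]
             for k in range(n_agents)]
        for k in range(n_agents):
            sms[k].update(w[k])
    return [sm.avg for sm in sms]


def accuracy(w, n, rng):
    """Fraction of n fresh samples whose label matches sign(h^T w)."""
    hits = 0
    for _ in range(n):
        h, gamma = sample(rng)
        if gamma * sum(hj * wj for hj, wj in zip(h, w)) > 0:
            hits += 1
    return hits / n


if __name__ == "__main__":
    rng = random.Random(0)
    _, w_bar = single_agent(20000, rng)
    print("single agent, smoothed accuracy:", accuracy(w_bar, 5000, rng))
    for k, wb in enumerate(diffusion(10000, rng)):
        print(f"agent {k} (diffusion), smoothed accuracy:", accuracy(wb, 5000, rng))
```

In this sketch the exponential weighting costs one scalar recursion and one vector blend per iteration, so it runs online next to the learner without storing past iterates. Moving to a sparsity-inducing (l1-regularized) loss would, here, only change the subgradient function.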

Bicheng Ying | Ali H. Sayed