Paper Information - Projected Stochastic Primal-Dual Method for Constrained Online Learning With Kernels

Projected Stochastic Primal-Dual Method for Constrained Online Learning With Kernels

We consider the problem of stochastic optimization with nonlinear constraints, where the decision variable is not vector-valued but instead a function belonging to a reproducing kernel Hilbert space (RKHS). Currently, there exist solutions to only special cases of this problem. To solve this constrained problem with kernels, we first generalize the Representer Theorem to a class of saddle-point problems defined over RKHS. Then, we develop a primal-dual method that executes alternating projected primal/dual stochastic gradient descent/ascent on the dual-augmented Lagrangian of the problem. The primal projection sets are low-dimensional subspaces of the ambient function space, which are greedily constructed using matching pursuit. By tuning the projection-induced error to the algorithm step-size, we are able to establish mean convergence in both primal objective sub-optimality and constraint violation, to respective $\mathcal{O}(\sqrt{T})$ and $\mathcal{O}(T^{3/4})$ neighborhoods. Here, $T$ is the final iteration index and the constant step-size is chosen as $1/\sqrt{T}$ with a $1/T$ approximation budget. Finally, we demonstrate experimentally the effectiveness of the proposed method for risk-aware supervised learning.
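The alternating update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a squared loss, a single constraint that caps the average prediction at a hypothetical level `cap`, a Gaussian kernel, and a dual regularization term of the form $-(\delta\eta/2)\mu^2$ in the augmented Lagrangian. The matching-pursuit projection is replaced by a simple greedy pruning routine that drops dictionary points while the accumulated RKHS-norm error stays within the budget. All names (`gauss_kernel`, `compress`, `primal_dual_kernel_sgd`, `predict`, `lam`, `delta`, `jitter`) are illustrative choices, not taken from the paper.

```python
import numpy as np

def gauss_kernel(A, B, bandwidth=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def compress(D, w, eps, jitter=1e-6):
    """Greedily drop dictionary points while the total RKHS-norm error <= eps.

    Dropping point j and re-fitting the remaining weights by projection costs
    a squared error of w_j^2 / (K^{-1})_jj. The jitter term is an assumption
    added for numerical stability, so the projection is only approximate.
    """
    budget = eps ** 2  # squared error still allowed
    while len(w) > 1:
        K_inv = np.linalg.inv(gauss_kernel(D, D) + jitter * np.eye(len(w)))
        diag = np.diag(K_inv)
        err_sq = w ** 2 / diag
        j = int(np.argmin(err_sq))
        if err_sq[j] > budget:
            break
        budget -= err_sq[j]
        keep = np.arange(len(w)) != j
        # Re-projected weights of the surviving points.
        w = w[keep] - w[j] * K_inv[keep, j] / diag[j]
        D = D[keep]
    return D, w

def primal_dual_kernel_sgd(X, y, cap, lam=1e-3, delta=1e-3):
    """Alternating primal descent / dual ascent for f(x) = sum_i w_i k(d_i, x)."""
    T, dim = X.shape
    eta, eps = 1.0 / np.sqrt(T), 1.0 / T  # step-size and budget from the abstract
    D, w, mu = np.empty((0, dim)), np.empty(0), 0.0
    for x_t, y_t in zip(X, y):
        x_t = x_t[None, :]
        f_x = float((gauss_kernel(x_t, D) @ w)[0])  # 0.0 while D is empty
        # Primal step: loss' = f(x) - y, constraint' = 1 for g = f(x) - cap.
        grad_coeff = (f_x - y_t) + mu * 1.0
        w = np.append((1.0 - eta * lam) * w, -eta * grad_coeff)
        D = np.vstack([D, x_t])
        # Dual step: projected ascent on the regularized Lagrangian.
        mu = max(0.0, mu + eta * ((f_x - cap) - delta * eta * mu))
        # Stand-in for the matching-pursuit projection.
        D, w = compress(D, w, eps)
    return D, w, mu

def predict(D, w, X_new):
    """Evaluate the learned function at the rows of X_new."""
    return gauss_kernel(X_new, D) @ w

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0])
D, w, mu = primal_dual_kernel_sgd(X, y, cap=0.2)
print(len(D), mu, predict(D, w, X).mean())
```

The pruning step recomputes a kernel-matrix inverse for every removal attempt, which keeps the sketch short but is far from efficient. A practical implementation would update the inverse incrementally.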

Tamer Başar | Alec Koppel | Hao Zhu | Kaiqing Zhang
