Adaptive Kernel Learning in Heterogeneous Networks

We consider learning in decentralized heterogeneous networks: agents seek to minimize a convex functional that aggregates data across the network, while having access only to their local data streams. We focus on the case where agents seek to estimate a regression \emph{function} that belongs to a reproducing kernel Hilbert space (RKHS). To incentivize coordination while respecting network heterogeneity, we impose nonlinear proximity constraints. To solve the constrained stochastic program, we apply a functional variant of the stochastic primal-dual (Arrow-Hurwicz) method, which yields a decentralized algorithm. Because the complexity of the agents' functions grows proportionally with time (owing to the RKHS parameterization), we project the primal iterates onto subspaces greedily constructed from kernel evaluations of the agents' local observations. The resulting scheme, dubbed Heterogeneous Adaptive Learning with Kernels (HALK), when used with constant step sizes, yields $\mathcal{O}(\sqrt{T})$ attenuation in sub-optimality and exactly satisfies the constraints in the long run, which improves upon the state-of-the-art rates for vector-valued problems.
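The abstract describes the method only at a high level. The Python/NumPy sketch below illustrates one plausible instance of the three ingredients it names: functional stochastic primal descent, dual ascent on the proximity constraints, and greedy projection of each agent's kernel expansion. It rests on assumptions that are not taken from the source. It uses a Gaussian kernel and squared loss. Each directed edge (i, j) carries the proximity constraint 0.5 (f_i(x_i) - f_j(x_i))^2 <= gamma. The dual update is plain projected ascent. The projection is a KOMP-style greedy backward elimination with error budget `eps`. All names (`KernelFunction`, `primal_dual_step`, `gamma`, `eps`, `bw`) are hypothetical. This is not the authors' HALK implementation, and the paper's exact constraint functions, step sizes, and update order may differ.

```python
import numpy as np


def gaussian_kernel(A, B, bw=0.5):
    """Gram matrix [kappa(a, b)], kappa(a, b) = exp(-||a - b||^2 / (2 bw^2)) (assumed kernel)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bw**2))


class KernelFunction:
    """f(x) = sum_k w[k] * kappa(D[k], x), stored as a dictionary D and weights w."""

    def __init__(self, dim):
        self.D = np.empty((0, dim))
        self.w = np.empty(0)

    def __call__(self, x):
        if self.w.size == 0:
            return 0.0
        return float(gaussian_kernel(x[None, :], self.D)[0] @ self.w)

    def compress(self, eps, jitter=1e-8):
        """Greedy backward elimination (KOMP-style, an assumed stand-in for the paper's
        projection): drop the atom whose removal, after refitting the remaining weights,
        costs least; stop once the RKHS distance to the uncompressed f would exceed eps."""
        K = gaussian_kernel(self.D, self.D)
        w0 = self.w
        norm_sq = w0 @ K @ w0
        keep, w_keep = list(range(w0.size)), w0
        while len(keep) > 1:
            best = None
            for k in keep:
                S = [s for s in keep if s != k]
                b = K[S] @ w0  # inner products <kappa(D[s], .), f> for s in S
                wS = np.linalg.solve(K[np.ix_(S, S)] + jitter * np.eye(len(S)), b)
                err = np.sqrt(max(norm_sq - b @ wS, 0.0))  # ~ ||f - P_S f||_H
                if best is None or err < best[0]:
                    best = (err, S, wS)
            if best[0] > eps:
                break
            _, keep, w_keep = best
        self.D, self.w = self.D[keep], w_keep


def primal_dual_step(fs, mu, edges, samples, eta=0.1, lam=1e-3, gamma=0.05, eps=1e-2):
    """One synchronous stochastic primal-dual step on the (assumed) Lagrangian
        sum_i [0.5 (f_i(x_i) - y_i)^2 + lam/2 ||f_i||^2]
        + sum_{(i,j)} mu_ij [0.5 (f_i(x_i) - f_j(x_i))^2 - gamma].
    A directed edge (i, j) checks proximity on agent i's sample x_i."""
    # Each agent collects (point, coefficient) pairs of its stochastic functional gradient.
    atoms = [[(x, fs[i](x) - y)] for i, (x, y) in enumerate(samples)]  # loss terms
    slack = {}
    for (i, j) in edges:
        x = samples[i][0]
        diff = fs[i](x) - fs[j](x)
        slack[(i, j)] = 0.5 * diff**2 - gamma  # stochastic constraint value
        atoms[i].append((x, mu[(i, j)] * diff))  # gradient w.r.t. f_i
        atoms[j].append((x, -mu[(i, j)] * diff))  # gradient w.r.t. f_j (needs x_i)
    for i, f in enumerate(fs):  # functional gradient descent, then compression
        f.D = np.vstack([f.D, np.array([p for p, _ in atoms[i]])])
        f.w = np.concatenate([(1 - eta * lam) * f.w, -eta * np.array([c for _, c in atoms[i]])])
        f.compress(eps)
    for e in edges:  # projected dual ascent, using pre-update constraint values
        mu[e] = max(0.0, mu[e] + eta * slack[e])


# Demo: three agents on a path graph whose targets differ by an agent-specific offset.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
fs = [KernelFunction(dim=1) for _ in range(3)]
mu = {e: 0.0 for e in edges}
for _ in range(200):
    samples = []
    for i in range(3):
        x = rng.uniform(-1.0, 1.0, size=1)
        samples.append((x, np.sin(3 * x[0]) + 0.2 * i + 0.05 * rng.standard_normal()))
    primal_dual_step(fs, mu, edges, samples)
print("dictionary sizes:", [f.w.size for f in fs])
print("multipliers:", {e: round(m, 3) for e, m in mu.items()})
```

At every step each agent's expansion gains one atom for its loss gradient and one per incident constraint. Without the `compress` call the model size would therefore grow linearly in time, which is the growth the abstract attributes to the RKHS parameterization. The `eps` budget trades approximation error for memory.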
