Optimally Compressed Nonparametric Online Learning

Batch training of machine learning models based on neural networks is now well established, whereas to date streaming methods are largely based on linear models. To go beyond linear models in the online setting, nonparametric methods are of interest due to their universality and their ability to stably incorporate new information via convexity or Bayes' rule. Unfortunately, when used online, nonparametric methods suffer a "curse of dimensionality" that precludes their use: their complexity grows at least linearly with the time index. We survey online compression tools that bring this memory growth under control while attaining approximate convergence; the resulting asymptotic bias depends on a compression parameter that trades off memory and accuracy. Applications to robotics, communications, economics, and power systems are discussed, as well as extensions to multi-agent settings.
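To make the memory/accuracy trade-off concrete, below is a minimal sketch (not any specific algorithm from the survey) of online kernel regression with a compression step. Every name here is illustrative; the coherence test used for pruning is a simplified stand-in for schemes such as kernel orthogonal matching pursuit, and `eps` plays the role of the compression parameter that trades memory against asymptotic accuracy.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

class CompressedOnlineKernelRegressor:
    """f(x) = sum_j w_j k(x_j, x), with the dictionary {x_j} kept small."""

    def __init__(self, eta=0.5, eps=0.05, bandwidth=1.0):
        self.eta = eta              # stochastic gradient step size
        self.eps = eps              # compression budget (memory vs. bias)
        self.bandwidth = bandwidth
        self.dictionary = None      # retained samples, shape (m, d)
        self.weights = np.empty(0)  # kernel expansion weights, shape (m,)

    def predict(self, x):
        if self.dictionary is None:
            return 0.0
        k = gaussian_kernel(x[None, :], self.dictionary, self.bandwidth)
        return float(k.ravel() @ self.weights)

    def step(self, x, y):
        # Functional stochastic gradient step for the squared loss:
        # f <- f - eta * (f(x) - y) * k(x, .). Uncompressed, this appends
        # one dictionary atom per sample, so complexity grows with time.
        new_w = -self.eta * (self.predict(x) - y)
        if self.dictionary is None:
            self.dictionary = x[None, :]
            self.weights = np.array([new_w])
            return
        k = gaussian_kernel(x[None, :], self.dictionary, self.bandwidth).ravel()
        j = int(np.argmax(k))
        if k[j] >= 1.0 - self.eps:
            # Compression: x is already nearly represented by atom j, so
            # fold the update into w_j instead of growing the dictionary.
            self.weights[j] += new_w
        else:
            self.dictionary = np.vstack([self.dictionary, x])
            self.weights = np.append(self.weights, new_w)
```

On a compact domain, the coherence test bounds the dictionary size independently of the stream length, at the cost of a bias controlled by `eps`:

```python
rng = np.random.default_rng(0)
model = CompressedOnlineKernelRegressor(eta=0.5, eps=0.05, bandwidth=0.5)
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0, size=1)
    model.step(x, np.sin(3.0 * x[0]) + 0.1 * rng.normal())
print(model.weights.size)  # a handful of atoms, not 2000
```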
