Trading Dynamic Regret for Model Complexity in Nonstationary Nonparametric Optimization

Online convex optimization against dynamic comparators has, in the existing literature, been limited to linear models. In this work, we relax this restriction and propose a memory-efficient online universal function approximator based on compressed kernel methods. Our approach hinges on viewing non-stationary learning as online convex optimization with dynamic comparators, for which performance is quantified by dynamic regret. Whereas prior works control dynamic regret growth only for linear models, we hypothesize that actions belong to a reproducing kernel Hilbert space (RKHS). We propose a functional variant of online gradient descent (OGD) operating in tandem with greedy subspace projections; these projections are necessary because, by the representer theorem, the complexity of an RKHS function grows in proportion to the number of observations. For this scheme, we establish that dynamic regret grows sublinearly in the functional path length of the comparator sequence, and that the memory (model order) of the learned function sequence remains moderate. Experiments demonstrate the usefulness of the proposed technique for online nonlinear regression and classification problems with non-stationary data.
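
To make the procedure concrete, the following is a minimal sketch (not the authors' released implementation) of functional OGD with a greedy destructive subspace projection. It assumes a Gaussian kernel, regularized squared loss, and a matching-pursuit-style pruning rule governed by a compression budget `eps`; the class name `DynamicPOLK` and all parameter values are illustrative.

```python
# Minimal sketch of functional online gradient descent in an RKHS with
# greedy subspace projection. Assumptions (not from the paper's code):
# Gaussian kernel, squared loss, destructive pruning with budget `eps`.
import numpy as np

def gaussian_kernel(X, Y, bw=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

class DynamicPOLK:
    """f_t(x) = sum_i w_i k(d_i, x), updated by a functional OGD step
    followed by projection onto a sparser kernel dictionary."""

    def __init__(self, step=0.5, lam=1e-3, eps=1e-2, bw=1.0):
        self.step, self.lam, self.eps, self.bw = step, lam, eps, bw
        self.D = np.empty((0, 0))   # kernel dictionary (model points)
        self.w = np.empty(0)        # kernel expansion weights

    def predict(self, x):
        if self.w.size == 0:
            return 0.0
        return float(gaussian_kernel(x[None, :], self.D, self.bw) @ self.w)

    def update(self, x, y):
        # Functional stochastic gradient step for regularized squared loss:
        # f <- (1 - step*lam) * f - step * (f(x) - y) * k(x, .)
        err = self.predict(x) - y
        if self.w.size == 0:
            self.D = x[None, :]
            self.w = np.array([-self.step * err])
        else:
            self.w = (1.0 - self.step * self.lam) * self.w
            self.D = np.vstack([self.D, x])
            self.w = np.append(self.w, -self.step * err)
        self._prune()

    def _hilbert_err2(self, keep):
        """Squared RKHS distance from f to its projection onto span(keep)."""
        K = gaussian_kernel(self.D, self.D, self.bw)
        Kss = K[np.ix_(keep, keep)]
        Ksd = K[keep, :]
        v = np.linalg.solve(Kss + 1e-10 * np.eye(len(keep)), Ksd @ self.w)
        return float(self.w @ K @ self.w - (Ksd @ self.w) @ v), v

    def _prune(self):
        # Greedily drop the dictionary element whose removal best preserves
        # f in Hilbert norm, while the induced error stays within eps.
        while len(self.w) > 1:
            keeps = [[i for i in range(len(self.w)) if i != j]
                     for j in range(len(self.w))]
            scored = [(self._hilbert_err2(keep), keep) for keep in keeps]
            (best_e2, best_v), best_keep = min(scored, key=lambda t: t[0][0])
            if np.sqrt(max(best_e2, 0.0)) > self.eps:
                break
            self.D, self.w = self.D[best_keep], best_v

# Toy non-stationary stream: a sine wave whose phase drifts over time.
rng = np.random.default_rng(0)
model = DynamicPOLK(step=0.5, lam=1e-3, eps=5e-2, bw=0.5)
for t in range(500):
    x = rng.uniform(-1, 1, size=1)
    y = np.sin(3 * x[0] + 0.01 * t) + 0.1 * rng.standard_normal()
    model.update(x, y)
print(len(model.w))  # model order stays moderate despite 500 samples
```

The pairwise re-fit inside `_prune` is the naive variant, quartic in the dictionary size per update; the point of the sketch is only that the dictionary, and hence the memory of the function sequence, is kept at a size dictated by the compression budget rather than by the time horizon.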
