Streaming kernel regression with provably adaptive mean, variance, and regularization

We consider the problem of streaming kernel regression, when the observations arrive sequentially and the goal is to recover the underlying mean function, assumed to belong to an RKHS. The variance of the noise is not assumed to be known. In this context, we tackle the problem of tuning the regularization parameter adaptively at each time step, while maintaining tight confidence bounds estimates on the value of the mean function at each point. To this end, we first generalize existing results for finite-dimensional linear regression with fixed regularization and known variance to the kernel setup with a regularization parameter allowed to be a measurable function of past observations. Then, using appropriate self-normalized inequalities we build upper and lower bound estimates for the variance, leading to Bersntein-like concentration bounds. The later is used in order to define the adaptive regularization. The bounds resulting from our technique are valid uniformly over all observation points and all time steps, and are compared against the literature with numerical experiments. Finally, the potential of these tools is illustrated by an application to kernelized bandits, where we revisit the Kernel UCB and Kernel Thompson Sampling procedures, and show the benefits of the novel adaptive kernel tuning strategy.

[1]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[2]  Odalric-Ambrym Maillard Self-normalization techniques for streaming confident regression , 2016 .

[3]  Massimiliano Pontil,et al.  Empirical Bernstein Bounds and Sample-Variance Penalization , 2009, COLT.

[4]  Alan Fern,et al.  Using trajectory data to improve bayesian optimization for reinforcement learning , 2014, J. Mach. Learn. Res..

[5]  Nathan Srebro,et al.  Explicit Approximations of the Gaussian Kernel , 2011, ArXiv.

[6]  W. R. Thompson ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .

[7]  Alessandro Lazaric,et al.  Linear Thompson Sampling Revisited , 2016, AISTATS.

[8]  Jasper Snoek,et al.  Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[9]  M. Abramowitz,et al.  Handbook of Mathematical Functions With Formulas, Graphs and Mathematical Tables (National Bureau of Standards Applied Mathematics Series No. 55) , 1965 .

[10]  Fabio Tozeto Ramos,et al.  Bayesian Optimisation for informative continuous path planning , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Csaba Szepesvári,et al.  Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.

[12]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[13]  Nello Cristianini,et al.  Finite-Time Analysis of Kernelised Contextual Bandits , 2013, UAI.

[14]  Andreas Krause,et al.  Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.

[15]  Nando de Freitas,et al.  Active Preference Learning with Discrete Choice Data , 2007, NIPS.

[16]  Wei Chu,et al.  A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.

[17]  Shipra Agrawal,et al.  Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.

[18]  Andreas Krause,et al.  Contextual Gaussian Process Bandit Optimization , 2011, NIPS.

[19]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[20]  Wei Chu,et al.  Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.

[21]  Nando de Freitas,et al.  Theoretical Analysis of Bayesian Optimisation with Unknown Gaussian Process Hyper-Parameters , 2014, ArXiv.

[22]  S. Loustau Penalized empirical risk minimization over Besov spaces , 2009 .

[23]  Milton Abramowitz,et al.  Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables , 1964 .

[24]  S. Canu,et al.  M L ] 6 O ct 2 00 9 Functional learning through kernel , 2009 .