Efficient Linear Bandits through Matrix Sketching

We prove that two popular linear contextual bandit algorithms, OFUL and Thompson Sampling, can be made efficient using Frequent Directions, a deterministic online sketching technique. More precisely, we show that a sketch of size $m$ allows an $\mathcal{O}(md)$ update time for both algorithms, as opposed to the $\Omega(d^2)$ required by their non-sketched versions in general (where $d$ is the dimension of the context vectors). This computational speedup is accompanied by regret bounds of order $(1+\varepsilon_m)^{3/2}d\sqrt{T}$ for OFUL and of order $\big((1+\varepsilon_m)d\big)^{3/2}\sqrt{T}$ for Thompson Sampling, where $\varepsilon_m$ is bounded by the sum of the tail eigenvalues not covered by the sketch. In particular, when the selected contexts span a subspace of dimension at most $m$, our algorithms have a regret bound matching that of their slower, non-sketched counterparts. Experiments on real-world datasets corroborate our theoretical results.
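The $\mathcal{O}(md)$ update time rests on the standard Frequent Directions buffering trick: incoming rows are appended to a $2m \times d$ buffer, and whenever the buffer fills, a single SVD shrinks it back to at most $m$ nonzero rows, so the $\mathcal{O}(m^2 d)$ SVD cost is amortized over the $m$ freed slots. The snippet below is a minimal NumPy illustration of this sketching primitive only, not of the paper's full sketched bandit algorithms; the class and method names are ours, and it assumes $d > m$.

```python
import numpy as np

class FrequentDirections:
    """Minimal Frequent Directions sketch (illustrative, not the paper's code).

    Keeps a 2m x d buffer; whenever it fills, an SVD shrinks it back to
    at most m nonzero rows, so the O(m^2 d) SVD is paid once per m
    insertions: O(md) amortized time per update. Assumes d > m.
    """

    def __init__(self, m: int, d: int):
        self.m = m
        self.B = np.zeros((2 * m, d))  # sketch buffer
        self.rows = 0                  # number of occupied rows

    def update(self, x: np.ndarray) -> None:
        """Insert one context vector x of shape (d,) into the sketch."""
        if self.rows == 2 * self.m:
            self._shrink()
        self.B[self.rows] = x
        self.rows += 1

    def _shrink(self) -> None:
        # Deflate every squared singular value by the (m+1)-th largest:
        # this zeroes rows m, m+1, ..., freeing m buffer slots at once.
        _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
        delta = s[self.m] ** 2
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        self.B = np.zeros_like(self.B)
        self.B[: len(s)] = s[:, None] * Vt
        self.rows = self.m

    def covariance(self) -> np.ndarray:
        """B^T B: the sketched approximation of the Gram matrix A^T A."""
        return self.B.T @ self.B
```

Consistent with the $\varepsilon_m$ term in the bounds above, the sketch's approximation error $\|A^\top A - B^\top B\|$ is controlled by the tail spectrum of the streamed contexts, and it vanishes when they span a subspace of dimension at most $m$, since the deflation step then subtracts zero.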
