Efficient Linear Bandits through Matrix Sketching

We prove that two popular linear contextual bandit algorithms, OFUL and Thompson Sampling, can be made efficient using Frequent Directions, a deterministic online sketching technique. More precisely, we show that a sketch of size $m$ allows an $\mathcal{O}(md)$ update time for both algorithms, as opposed to the $\Omega(d^2)$ required by their non-sketched versions in general (where $d$ is the dimension of the context vectors). This computational speedup is accompanied by regret bounds of order $(1+\varepsilon_m)^{3/2}d\sqrt{T}$ for OFUL and of order $\big((1+\varepsilon_m)d\big)^{3/2}\sqrt{T}$ for Thompson Sampling, where $\varepsilon_m$ is bounded by the sum of the tail eigenvalues not covered by the sketch. In particular, when the selected contexts span a subspace of dimension at most $m$, our algorithms have a regret bound matching that of their slower, non-sketched counterparts. Experiments on real-world datasets corroborate our theoretical results.
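The $\mathcal{O}(md)$ update time rests on the standard Frequent Directions buffering trick: incoming rows are appended to a $2m \times d$ buffer, and whenever the buffer fills, a single SVD shrinks it back to at most $m$ nonzero rows, so the $\mathcal{O}(m^2 d)$ SVD cost is amortized over the $m$ freed slots. The snippet below is a minimal NumPy illustration of this sketching primitive only, not of the paper's full sketched bandit algorithms; the class and method names are ours, and it assumes $d > m$.

```python
import numpy as np

class FrequentDirections:
    """Minimal Frequent Directions sketch (illustrative, not the paper's code).

    Keeps a 2m x d buffer; whenever it fills, an SVD shrinks it back to
    at most m nonzero rows, so the O(m^2 d) SVD is paid once per m
    insertions: O(md) amortized time per update. Assumes d > m.
    """

    def __init__(self, m: int, d: int):
        self.m = m
        self.B = np.zeros((2 * m, d))  # sketch buffer
        self.rows = 0                  # number of occupied rows

    def update(self, x: np.ndarray) -> None:
        """Insert one context vector x of shape (d,) into the sketch."""
        if self.rows == 2 * self.m:
            self._shrink()
        self.B[self.rows] = x
        self.rows += 1

    def _shrink(self) -> None:
        # Deflate every squared singular value by the (m+1)-th largest:
        # this zeroes rows m, m+1, ..., freeing m buffer slots at once.
        _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
        delta = s[self.m] ** 2
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        self.B = np.zeros_like(self.B)
        self.B[: len(s)] = s[:, None] * Vt
        self.rows = self.m

    def covariance(self) -> np.ndarray:
        """B^T B: the sketched approximation of the Gram matrix A^T A."""
        return self.B.T @ self.B
```

Consistent with the $\varepsilon_m$ term in the bounds above, the sketch's approximation error $\|A^\top A - B^\top B\|$ is controlled by the tail spectrum of the streamed contexts, and it vanishes when they span a subspace of dimension at most $m$, since the deflation step then subtracts zero.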
