High-Dimensional Sparse Linear Bandits

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising. We derive a novel $\Omega(n^{2/3})$ dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime where the horizon is smaller than the ambient dimension and where the feature vectors admit a well-conditioned exploration distribution. This is complemented by a nearly matching upper bound for an explore-then-commit algorithm showing that that $\Theta(n^{2/3})$ is the optimal rate in the data-poor regime. The results complement existing bounds for the data-rich regime and provide another example where carefully balancing the trade-off between information and regret is necessary. Finally, we prove a dimension-free $O(\sqrt{n})$ regret upper bound under an additional assumption on the magnitude of the signal for relevant features.

[1]  Xue Wang,et al.  Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Convariates , 2018, ICML.

[2]  Zhiwei Steven Wu,et al.  Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis , 2020, ICML.

[3]  Shipra Agrawal,et al.  Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.

[4]  Csaba Szepesvari,et al.  Bandit Algorithms , 2020 .

[5]  Koby Crammer,et al.  Linear Multi-Resource Allocation with Semi-Bandit Feedback , 2015, NIPS.

[6]  Clayton Scott,et al.  Simple Regret Minimization for Contextual Bandits , 2018, ArXiv.

[7]  P. Bickel,et al.  SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR , 2008, 0801.1095.

[8]  YuBin,et al.  Minimax Rates of Estimation for High-Dimensional Linear Regression Over $\ell_q$ -Balls , 2011 .

[9]  Mohsen Bayati,et al.  Online Decision-Making with High-Dimensional Covariates , 2015 .

[10]  Martin J. Wainwright,et al.  High-Dimensional Statistics , 2019 .

[11]  Benjamin Van Roy,et al.  Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res..

[12]  Csaba Szepesvári,et al.  Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits , 2012, AISTATS.

[13]  Sara van de Geer,et al.  Statistics for High-Dimensional Data: Methods, Theory and Applications , 2011 .

[14]  Alexandre B. Tsybakov,et al.  Introduction to Nonparametric Estimation , 2008, Springer series in statistics.

[15]  Stephen P. Boyd,et al.  CVXPY: A Python-Embedded Modeling Language for Convex Optimization , 2016, J. Mach. Learn. Res..

[16]  Csaba Szepesvári,et al.  Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.

[17]  Peter Auer,et al.  Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..

[18]  Tor Lattimore,et al.  Adaptive Exploration in Linear Contextual Bandit , 2020, AISTATS.

[19]  Csaba Szepesvári,et al.  Partial Monitoring - Classification, Regret Bounds, and Algorithms , 2014, Math. Oper. Res..

[20]  Adel Javanmard,et al.  Confidence intervals and hypothesis testing for high-dimensional regression , 2013, J. Mach. Learn. Res..

[21]  Alexandre Proutière,et al.  Minimal Exploration in Structured Stochastic Bandits , 2017, NIPS.

[22]  Tor Lattimore,et al.  The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits , 2016, AISTATS.

[23]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[24]  Gi-Soo Kim,et al.  Doubly-Robust Lasso Bandit , 2019, NeurIPS.

[25]  Tor Lattimore,et al.  Learning with Good Feature Representations in Bandits and in RL with a Generative Model , 2020, ICML.

[26]  John N. Tsitsiklis,et al.  Linearly Parameterized Bandits , 2008, Math. Oper. Res..

[27]  Shuheng Zhou,et al.  25th Annual Conference on Learning Theory Reconstruction from Anisotropic Random Measurements , 2022 .

[28]  Rémi Munos,et al.  Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit , 2012, AISTATS.

[29]  Andrea Montanari,et al.  Linear bandits in high dimension and recommendation systems , 2012, 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[30]  Roman Vershynin,et al.  Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.

[31]  Adel Javanmard,et al.  Debiasing the lasso: Optimal sample size for Gaussian designs , 2015, The Annals of Statistics.

[32]  A. Belloni,et al.  Least Squares After Model Selection in High-Dimensional Sparse Models , 2009, 1001.0188.

[33]  Martin J. Wainwright,et al.  Minimax Rates of Estimation for High-Dimensional Linear Regression Over $\ell_q$ -Balls , 2009, IEEE Transactions on Information Theory.

[34]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[35]  Wei Chu,et al.  Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.

[36]  Thomas P. Hayes,et al.  Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.