Sequential Experimental Design for Transductive Linear Bandits

In this paper we introduce the transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence level $\delta$, and an unknown vector $\theta^{\ast}\in \mathbb{R}^d$, the goal is to identify $\text{argmax}_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability at least $1-\delta$ using as few sequentially chosen noisy measurements of the form $x^\top\theta^{\ast}$, $x\in\mathcal{X}$, as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the set of standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, it generalizes combinatorial bandits. Such a transductive setting arises naturally whenever the set of measurement vectors is limited by availability or cost. For example, in drug discovery the compounds and dosages $\mathcal{X}$ that a practitioner is willing to evaluate in vitro, for cost or safety reasons, may differ vastly from the compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Likewise, in a book recommender system, the set of books $\mathcal{X}$ a user is queried about may be restricted to well-known best-sellers even though the goal is to recommend more esoteric titles $\mathcal{Z}$. We provide instance-dependent lower bounds for the transductive setting, an algorithm that matches them up to logarithmic factors, and an empirical evaluation. In particular, we give the first non-asymptotic algorithm for linear bandits that nearly achieves the information-theoretic lower bound.
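To make the setting concrete, the minimal sketch below simulates a small transductive instance in which measurements may only be taken along the standard basis vectors ($\mathcal{X}$), while the items to be compared ($\mathcal{Z}$) are arbitrary vectors. It uses a naive uniform allocation over $\mathcal{X}$ and an ordinary least-squares estimate of $\theta^\ast$ purely for illustration; the dimensions, unit-variance Gaussian noise model, and fixed sampling budget are assumptions, and this is not the adaptive experimental-design algorithm proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem instance: measurement vectors X, item vectors Z, unknown theta*.
# X is the standard basis (combinatorial-bandit-style measurements), while
# Z contains items that can never be measured directly.
d = 5
X = np.eye(d)                         # rows are the measurement vectors x in X
Z = rng.standard_normal((20, d))      # rows are the item vectors z in Z
theta_star = rng.standard_normal(d)   # unknown parameter (hidden from the learner)
best_item = int(np.argmax(Z @ theta_star))   # the quantity we want to identify

def pull(x, noise_sd=1.0):
    """Return a noisy measurement of x^T theta* (assumed Gaussian noise)."""
    return x @ theta_star + noise_sd * rng.standard_normal()

# Naive static design for illustration: sample every measurement vector equally
# often, then form the ordinary least-squares estimate of theta*.
n_per_arm = 200
A = np.zeros((d, d))   # accumulated design matrix  sum_x x x^T
b = np.zeros(d)        # accumulated response vector sum_x x * y
for x in X:
    for _ in range(n_per_arm):
        y = pull(x)
        A += np.outer(x, x)
        b += x * y

theta_hat = np.linalg.solve(A, b)
guess = int(np.argmax(Z @ theta_hat))
print("true best item:", best_item, "| estimated best item:", guess)
```

An adaptive procedure of the kind studied in the paper would instead choose which $x\in\mathcal{X}$ to measure next based on past observations, concentrating samples on the directions that best discriminate between candidate maximizers in $\mathcal{Z}$, and would stop once the best item is identified with confidence $1-\delta$.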
