Linear Multi-Resource Allocation with Semi-Bandit Feedback

We study an idealised sequential resource allocation problem. In each time step the learner chooses an allocation of several resource types among a number of tasks. Assigning more resources to a task increases the probability that it is completed. The problem is challenging because the alignment of the tasks to the resource types is unknown and the feedback is noisy. Our main contributions are the new setting and an algorithm with a nearly optimal regret analysis. Along the way we draw connections to the problem of minimising regret for stochastic linear bandits with heteroscedastic noise. We also present some new results for stochastic linear bandits on the hypercube that significantly improve on existing work, especially in the sparse case.
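
To make the interaction protocol concrete, the following is a minimal sketch of a single round. It rests on assumptions that the abstract does not state: the completion probability of a task is taken to be the inner product of its allocation with a hypothetical alignment vector, clipped to [0, 1], and each resource type is assumed to have a total budget of 1. The names `completion_probabilities`, `play_round`, and `alignment` are illustrative only. The learner observes one completion indicator per task (semi-bandit feedback) and never sees the alignment weights.

```python
import random


def completion_probabilities(allocation, alignment):
    """Per-task completion probability under an ASSUMED clipped linear model.

    allocation[k][d]: amount of resource type d assigned to task k.
    alignment[k][d]:  hypothetical weight of resource type d for task k
                      (unknown to the learner in the setting described above).
    """
    probs = []
    for alloc_k, align_k in zip(allocation, alignment):
        inner = sum(a * w for a, w in zip(alloc_k, align_k))
        probs.append(min(1.0, max(0.0, inner)))  # clip to a valid probability
    return probs


def play_round(allocation, alignment, rng):
    """Return one noisy completion indicator per task (semi-bandit feedback)."""
    probs = completion_probabilities(allocation, alignment)
    return [rng.random() < p for p in probs]


# Illustrative instance: 3 tasks, 2 resource types, made-up alignment weights.
alignment = [[0.8, 0.1], [0.2, 0.6], [0.5, 0.5]]

# Assumed budget of 1 per resource type, split evenly across the tasks.
allocation = [[1 / 3, 1 / 3] for _ in range(3)]

rng = random.Random(0)
probs = completion_probabilities(allocation, alignment)
feedback = play_round(allocation, alignment, rng)
```

In this sketch, moving more of a resource type to a task with a large weight for that type raises its completion probability, which is the trade-off the learner has to discover from the noisy indicators alone.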
