Parametric Bandits: The Generalized Linear Case

We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) framework of statistics. For these bandits, we propose a new algorithm, called GLM-UCB. We derive finite-time, high-probability bounds on the regret of the algorithm, extending previous analyses developed for linear bandits to the non-linear case. The analysis highlights a key difficulty in generalizing linear bandit algorithms to the non-linear case, which GLM-UCB resolves by focusing on the reward space rather than on the parameter space. Moreover, because current parameterized bandit algorithms often perform poorly in practice, we provide a tuning method based on asymptotic arguments, which leads to significantly better practical performance. We present two numerical experiments on real-world data that illustrate the potential of the GLM-UCB approach.
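To make the idea of working in the reward space concrete, the following is a minimal sketch of an upper-confidence-bound rule for a GLM bandit. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the logistic link, the ridge-regularized Newton estimator, the exploration parameter `rho`, and all function names are hypothetical choices made here. The optimism bonus is added to the predicted reward mu(m_a' theta_hat) of each arm rather than to a confidence region around the parameter estimate.

```python
import numpy as np

def sigmoid(z):
    # Logistic inverse link (assumed here); clipping avoids overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logistic(X, r, ridge=1.0, iters=25):
    """Ridge-regularized quasi-likelihood estimate via Newton steps.

    The regularization and fixed iteration count are assumptions of
    this sketch, chosen to keep the estimator well defined early on.
    """
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (r - p) - ridge * theta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(d)
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def select_arm(arms, X, r, rho, ridge=1.0):
    """Return the index maximizing predicted reward plus exploration bonus.

    arms: (K, d) array of arm feature vectors m_a.
    X, r: features and rewards of past plays.
    rho:  hypothetical exploration parameter scaling the bonus.
    """
    d = arms.shape[1]
    M = ridge * np.eye(d) + X.T @ X          # regularized design matrix
    theta = fit_logistic(X, r, ridge) if len(r) else np.zeros(d)
    M_inv = np.linalg.inv(M)
    # ||m_a||_{M^{-1}} for every arm at once.
    widths = np.sqrt(np.einsum("ad,de,ae->a", arms, M_inv, arms))
    scores = sigmoid(arms @ theta) + rho * widths  # bonus in reward space
    return int(np.argmax(scores))

def run(T=200, seed=0):
    """Simulate T rounds on a synthetic Bernoulli GLM bandit."""
    rng = np.random.default_rng(seed)
    arms = rng.normal(size=(5, 3))
    theta_star = np.array([1.0, -0.5, 0.5])  # synthetic ground truth
    X = np.empty((0, 3))
    r = np.empty(0)
    for t in range(1, T + 1):
        a = select_arm(arms, X, r, rho=np.sqrt(np.log(t + 1)))
        reward = rng.binomial(1, sigmoid(arms[a] @ theta_star))
        X = np.vstack([X, arms[a]])
        r = np.append(r, reward)
    return X, r
```

Calling `run(T=200)` plays 200 rounds on synthetic data and returns the played features and observed rewards. The schedule `rho = sqrt(log(t + 1))` is only a placeholder; the abstract's tuning method based on asymptotic arguments is not reproduced here.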
