Spectrum bandit optimization

We consider the problem of allocating radio channels to links in a wireless network. Links interact through interference, modelled as a conflict graph: two interfering links cannot be simultaneously active on the same channel. We aim to identify the channel allocation that maximizes the total network throughput over a finite time horizon. If the average radio conditions on each channel and each link were known, an optimal allocation could be obtained by solving an Integer Linear Program (ILP). When radio conditions are unknown a priori, we look for a sequential channel allocation policy that converges to the optimal allocation while minimizing the throughput loss, or regret, incurred by exploring suboptimal allocations. We formulate this problem as a generic linear bandit problem and analyze it in two settings: a stochastic setting, where radio conditions are driven by an i.i.d. stochastic process, and an adversarial setting, where radio conditions can evolve arbitrarily. In both settings, we provide algorithms whose regret upper bounds improve on those of existing algorithms.
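To make the linear structure concrete, here is a minimal Python sketch built on several assumptions that the text does not state: each link either uses one channel or stays idle, the conflict graph is given as a list of interfering link pairs, and theta[l][c] is a hypothetical mean throughput of link l on channel c. An allocation is then a binary vector x over (link, channel) pairs, and its expected throughput is the linear function theta^T x. Exhaustive enumeration stands in for the ILP solver (tractable only for tiny instances), and `ucb_allocation` is a generic UCB-style index policy with per-link feedback for the stochastic setting, not the algorithms proposed in this work. All function names are hypothetical.

```python
import itertools
import math
import random
from typing import Callable, Iterator, Optional, Sequence, Tuple

# Hypothetical encoding: one entry per link, a channel index or None (link idle).
Allocation = Tuple[Optional[int], ...]


def feasible_allocations(num_links: int, num_channels: int,
                         conflicts: Sequence[Tuple[int, int]]) -> Iterator[Allocation]:
    """Enumerate allocations in which no two conflicting links share a channel."""
    choices = list(range(num_channels)) + [None]
    for alloc in itertools.product(choices, repeat=num_links):
        if all(alloc[i] is None or alloc[i] != alloc[j] for i, j in conflicts):
            yield alloc


def throughput(alloc: Allocation, theta: Sequence[Sequence[float]]) -> float:
    """Linear reward theta^T x: sum of mean throughputs of active (link, channel) pairs."""
    return sum(theta[l][c] for l, c in enumerate(alloc) if c is not None)


def best_allocation(num_links, num_channels, conflicts, theta) -> Allocation:
    """Exhaustive stand-in for the ILP when theta is known (tiny instances only)."""
    return max(feasible_allocations(num_links, num_channels, conflicts),
               key=lambda a: throughput(a, theta))


def ucb_allocation(num_links, num_channels, conflicts,
                   sample: Callable[[int, int], float], horizon: int):
    """Generic UCB index policy; assumes each active link's throughput is observed."""
    allocs = list(feasible_allocations(num_links, num_channels, conflicts))
    counts = [[0] * num_channels for _ in range(num_links)]
    sums = [[0.0] * num_channels for _ in range(num_links)]
    total = 0.0
    for t in range(1, horizon + 1):
        def index(l: int, c: int) -> float:
            n = counts[l][c]
            if n == 0:
                return math.inf  # force every (link, channel) pair to be tried once
            return sums[l][c] / n + math.sqrt(1.5 * math.log(t) / n)

        # Play the feasible allocation with the largest sum of optimistic indices.
        alloc = max(allocs, key=lambda a: sum(index(l, c)
                                              for l, c in enumerate(a) if c is not None))
        for l, c in enumerate(alloc):
            if c is not None:
                r = sample(l, c)  # observed throughput of link l on channel c
                counts[l][c] += 1
                sums[l][c] += r
                total += r
    return total, counts


if __name__ == "__main__":
    theta = [[0.9, 0.2], [0.8, 0.5]]  # hypothetical mean throughputs
    conflicts = [(0, 1)]              # links 0 and 1 interfere
    rng = random.Random(0)

    def bernoulli(l: int, c: int) -> float:
        # Hypothetical i.i.d. radio conditions: success with probability theta[l][c].
        return float(rng.random() < theta[l][c])

    print(best_allocation(2, 2, conflicts, theta))  # (0, 1)
    print(ucb_allocation(2, 2, conflicts, bernoulli, horizon=1000))
```

The exploration constant 1.5 is an arbitrary illustrative choice. Enumeration grows as (K+1)^L for L links and K channels, which is why a practical implementation would hand each maximization to an ILP solver instead.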
