Joint Channel Selection and Power Control in Infrastructureless Wireless Networks: A Multiplayer Multiarmed Bandit Framework

This paper deals with efficient resource allocation in dynamic infrastructureless wireless networks. In a reactive, interference-limited scenario, at each transmission trial every transmitter selects both a frequency channel from a common pool and a power level. Consequently, for every transmitter, not only the fading gain but also the number and power of interfering transmissions vary over time. Owing to the absence of a central controller and the time-varying network characteristics, it is highly inefficient for transmitters to acquire global channel and network knowledge. Therefore, given no prior information, each transmitter selfishly aims to maximize its average reward, which is a function of the channel quality as well as the joint selection profile of all transmitters. We model this scenario as an adversarial multiplayer multiarmed bandit game in which each player attempts to minimize its regret while, at the network level, an equilibrium is reached in some sense. Based on this model, we develop two joint power-level and channel selection strategies to solve the resource allocation problem. We prove that the gap between the average reward achieved by each of our approaches and that of the best fixed strategy converges to zero asymptotically. Moreover, the empirical joint frequencies of play converge to the set of correlated equilibria, which we characterize for two relaxed versions of the designed game.
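For concreteness, the following is a minimal sketch of how a single transmitter could run a joint channel and power-level selection rule of this kind, using a standard Exp3-style exponential-weighting update over the cross product of channels and power levels. It is an illustration under stated assumptions, not the authors' algorithm: the class, its parameters, and the placeholder reward model are all assumed for the example.

import math
import random

# Sketch (assumption): each bandit "arm" is a (channel, power level) pair,
# and an Exp3-style update handles the adversarial, interference-limited
# reward. This is illustrative only, not the strategies proposed in the paper.

class JointChannelPowerBandit:
    def __init__(self, num_channels, num_power_levels, gamma=0.1):
        self.arms = [(c, p) for c in range(num_channels)
                            for p in range(num_power_levels)]
        self.gamma = gamma                      # exploration rate
        self.weights = [1.0] * len(self.arms)   # exponential weights

    def _probabilities(self):
        total = sum(self.weights)
        k = len(self.arms)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def select(self):
        # Sample a (channel, power) pair according to the current mixed strategy.
        probs = self._probabilities()
        idx = random.choices(range(len(self.arms)), weights=probs, k=1)[0]
        return idx, self.arms[idx]

    def update(self, idx, reward):
        # Importance-weighted (unbiased) estimate of the reward in [0, 1]
        # for the played arm; only that arm's weight is updated.
        probs = self._probabilities()
        estimated = reward / probs[idx]
        self.weights[idx] *= math.exp(self.gamma * estimated / len(self.arms))


# Usage: one transmitter over a few trials with a dummy reward signal.
learner = JointChannelPowerBandit(num_channels=4, num_power_levels=3)
for t in range(10):
    idx, (channel, power) = learner.select()
    reward = random.random()                    # placeholder for the observed reward
    learner.update(idx, reward)

In a multiplayer setting, every transmitter would run such a learner independently; the reward each one observes then depends on the fading gain and on the joint (channel, power) choices of the others, which is what makes the adversarial bandit formulation appropriate.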
