Multi-agent Q-learning for autonomous D2D communication

This paper addresses autonomous device-to-device (D2D) communication in cellular networks. Each D2D pair aims to maximize its throughput subject to minimum signal-to-interference-plus-noise ratio (SINR) constraints. We model this problem as a stochastic non-cooperative game in which the players (D2D pairs) have no prior information on the availability and quality of the channels they select. Each player therefore becomes a “learner” that explores its possible strategies based on its locally observed throughput and state (defined by the channel quality). We propose a multi-agent Q-learning algorithm based on the players' “beliefs” about the strategies of their counterparts and show its implementation in a Long Term Evolution-Advanced (LTE-A) network. Simulations show that the algorithm achieves near-optimal performance after a small number of iterations.
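
To make the idea of belief-based multi-agent Q-learning concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a toy setting with two D2D pairs, two channels, a single state, and a reward of 1 when the pairs pick different channels (taken here to mean the SINR constraint is met) and 0 otherwise. It also assumes that each pair can observe its counterpart's channel after every step so that beliefs can be formed from empirical counts, which is a simplification of the local-observation setting described above. The names `BeliefQLearner` and `simulate` and all parameter values are hypothetical.

```python
import random


class BeliefQLearner:
    """Toy belief-based Q-learner for one D2D pair (illustrative only)."""

    def __init__(self, n_states, n_actions, n_other_actions,
                 alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Q[s][a][b]: value of own action a when the counterpart plays b.
        self.q = [[[0.0] * n_other_actions for _ in range(n_actions)]
                  for _ in range(n_states)]
        # Counts of observed counterpart actions (start at 1: uniform belief).
        self.counts = [[1.0] * n_other_actions for _ in range(n_states)]

    def belief(self, s):
        """Empirical probability of each counterpart action in state s."""
        total = sum(self.counts[s])
        return [c / total for c in self.counts[s]]

    def expected_q(self, s, a):
        """Q-value of own action a averaged over the current belief."""
        return sum(p * q for p, q in zip(self.belief(s), self.q[s][a]))

    def act(self, s):
        """Epsilon-greedy choice with respect to belief-weighted Q-values."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = [self.expected_q(s, a) for a in range(self.n_actions)]
        return values.index(max(values))

    def update(self, s, a, other_a, reward, s_next):
        """Update the belief, then apply a standard Q-learning step."""
        self.counts[s][other_a] += 1.0
        best_next = max(self.expected_q(s_next, a2)
                        for a2 in range(self.n_actions))
        td_error = reward + self.gamma * best_next - self.q[s][a][other_a]
        self.q[s][a][other_a] += self.alpha * td_error


def simulate(steps=2000, seed=0):
    """Run two learners on two channels; return the collision-free
    fraction of the last 100 steps."""
    # Different seeds so the two pairs do not explore in lockstep.
    pairs = [BeliefQLearner(1, 2, 2, seed=seed + i) for i in range(2)]
    state = 0  # single-state toy problem (assumption)
    outcomes = []
    for _ in range(steps):
        a0, a1 = pairs[0].act(state), pairs[1].act(state)
        reward = 1.0 if a0 != a1 else 0.0  # assumed reward model
        pairs[0].update(state, a0, a1, reward, state)
        pairs[1].update(state, a1, a0, reward, state)
        outcomes.append(reward)
    tail = outcomes[-100:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    print("collision-free fraction (last 100 steps):", simulate())
```

In this sketch each pair scores its own actions by averaging `Q[s][a][b]` over its current belief about the counterpart's action `b`, so the learned policy depends on what the pair expects the other to do. A realistic study would replace the single state with quantized channel quality and the binary reward with measured throughput.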
