Paper information - Multi-agent Q-learning for autonomous D2D communication

Multi-agent Q-learning for autonomous D2D communication

This paper is devoted to autonomous device-to-device (D2D) communication in cellular networks. The aim of each D2D pair is to maximize its throughput subject to minimum signal-to-interference-plus-noise ratio (SINR) constraints. The problem is modeled as a stochastic non-cooperative game in which the players (D2D pairs) have no prior information on the availability and quality of the selected channels. Each player therefore becomes a “learner” that explores all of its possible strategies based on its locally observed throughput and state (defined by the channel quality). Accordingly, we propose a multi-agent Q-learning algorithm based on the players' “beliefs” about the strategies of their counterparts and show its implementation in a Long Term Evolution-Advanced (LTE-A) network. Simulations show that the algorithm achieves near-optimal performance after a small number of iterations.
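To make the learning step in the abstract concrete, the following is a minimal sketch of plain tabular Q-learning with ε-greedy exploration for a single D2D pair choosing a channel. It is not the paper's belief-based multi-agent algorithm: the function names (`q_update`, `epsilon_greedy`, `run_pair`), the two-state and three-channel setup, the reward table, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration, and other players are not modeled.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place and return the new value."""
    best_next = max(q[next_state])  # value of the greedy action in the next state
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random channel with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


def run_pair(steps=2000, epsilon=0.1, seed=0):
    """Simulate one learner; the environment below is purely hypothetical."""
    rng = random.Random(seed)
    # Assumed mean throughput for each (channel-quality state, channel) pair.
    mean_reward = [[0.2, 1.0, 0.5],
                   [0.8, 0.1, 0.3]]
    num_states = len(mean_reward)
    num_channels = len(mean_reward[0])
    q = [[0.0] * num_channels for _ in range(num_states)]
    state = 0
    for _ in range(steps):
        action = epsilon_greedy(q, state, epsilon, rng)
        # Locally observed throughput: assumed mean plus small bounded noise.
        reward = mean_reward[state][action] + rng.uniform(-0.05, 0.05)
        # Assumption: the channel-quality state changes uniformly at random.
        next_state = rng.randrange(num_states)
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q
```

Calling `run_pair()` returns the learned Q-table as a list of rows, one row per state and one entry per channel. A multi-agent version along the lines of the abstract would additionally have each player maintain estimates of the other players' strategies, which this sketch leaves out.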

Yoshikazu Miyanaga | Alia Asheralieva
