Delay-Optimal Dynamic Mode Selection and Resource Allocation in Device-to-Device Communications—Part II: Practical Algorithm

In Part I of this paper (“Delay-Optimal Dynamic Mode Selection and Resource Allocation in Device-to-Device Communications-Part I: Optimal Policy”), we investigated dynamic mode selection and subchannel allocation for an orthogonal frequency-division multiple access (OFDMA) cellular network with device-to-device (D2D) communications, with the aim of minimizing the average end-to-end delay subject to a dropping probability constraint. We formulated the optimal resource control problem as an infinite-horizon average-reward constrained Markov decision process (CMDP). However, the optimal control policy derived in Part I, which uses a brute-force offline value iteration algorithm based on the reduced-state equivalent Bellman equation, still faces the well-known curse of dimensionality, which limits its practical application in realistic scenarios with multiple D2D users and cellular users. In Part II of this paper, we use linear value approximation techniques to further reduce the state space. Moreover, an online stochastic learning algorithm with two timescales is applied to update the value functions and Lagrangian multipliers (LMs) based on real-time observations of channel state information (CSI) and queue state information (QSI). The combined online stochastic learning solution converges almost surely to a globally optimal solution under some realistic conditions. Simulation results show that the proposed approach achieves nearly the same performance as the offline value iteration algorithm and outperforms the conventional CSI-only scheme and throughput-optimal scheme in a stability sense.
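
The two-timescale idea mentioned above can be illustrated with a minimal sketch. It is not the algorithm of this paper: it is a generic relative-value Q-learning-style update on a hypothetical single-queue toy model, in which the value estimates are updated with a larger step size and a Lagrangian multiplier for a drop-rate constraint is updated with a smaller one. The queue model, the transmit cost `tx_cost`, the step-size exponents, the exploration rate, and all function and parameter names are assumptions made purely for illustration.

```python
import random


def step_sizes(t):
    """Return (fast, slow) step sizes for iteration t >= 1."""
    fast = 1.0 / t ** 0.6  # used for the value estimates
    slow = 1.0 / t         # used for the multiplier; slow / fast -> 0
    return fast, slow


def learn(num_steps=20000, buffer_size=4, arrival_prob=0.5,
          success_prob=0.7, tx_cost=1.5, drop_target=0.05,
          explore=0.1, seed=0):
    """Toy two-timescale learning loop (all model details are assumed)."""
    rng = random.Random(seed)
    # q_values[q][a]: relative cost estimate for queue length q, action a.
    q_values = [[0.0, 0.0] for _ in range(buffer_size + 1)]
    lam = 0.0   # Lagrangian multiplier for the drop constraint
    queue = 0
    for t in range(1, num_steps + 1):
        fast, slow = step_sizes(t)
        # Epsilon-greedy choice: 1 means transmit, 0 means stay idle.
        if rng.random() < explore:
            action = rng.randrange(2)
        else:
            action = 0 if q_values[queue][0] <= q_values[queue][1] else 1
        # Toy queue dynamics: possible departure, then possible arrival.
        nxt = queue
        if action == 1 and nxt > 0 and rng.random() < success_prob:
            nxt -= 1
        dropped = 0
        if rng.random() < arrival_prob:
            if nxt < buffer_size:
                nxt += 1
            else:
                dropped = 1  # buffer full, the arriving packet is lost
        # Lagrangian per-stage cost: delay proxy + assumed transmit cost
        # + multiplier-weighted drop indicator.
        cost = queue + tx_cost * action + lam * dropped
        # Fast timescale: relative-value update, with q_values[0][0]
        # serving as the reference entry.
        target = cost + min(q_values[nxt]) - q_values[0][0]
        q_values[queue][action] += fast * (target - q_values[queue][action])
        # Slow timescale: projected ascent on the constraint violation.
        lam = max(0.0, lam + slow * (dropped - drop_target))
        queue = nxt
    return q_values, lam
```

Calling `learn()` returns the learned table and the final multiplier. Because the multiplier step size decays faster than the value step size, the value estimates see an almost constant multiplier, which is the usual intuition behind two-timescale stochastic approximation.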
