Actor-critic reinforcement learning-based resource management in mobile edge computing systems

Reinforcement learning (RL) has recently attracted considerable attention in the wireless communications field as an effective decision-making tool. In this paper, we investigate the offloading decision and resource allocation problem in mobile edge computing (MEC) systems using RL methods. Unlike existing work, our research focuses on improving mobile operators' revenue by maximizing the amount of offloaded tasks while reducing energy expenditure and time delays. To capture the dynamics of the wireless environment, this problem is modeled as a Markov decision process (MDP). Because the action space of the MDP mixes multidimensional continuous variables with discrete ones, traditional RL algorithms are ill-suited, so we propose an actor-critic (AC) algorithm with eligibility traces. The actor introduces a parameterized normal distribution to generate continuous stochastic actions, while the critic employs a linear approximator to estimate state values, based on which the actor updates its policy parameters in the direction of performance improvement. Furthermore, an advantage function is designed to reduce the variance of the learning process. Simulation results indicate that the proposed algorithm finds a strategy that maximizes the number of tasks executed by the MEC server while reducing energy consumption and time delays.
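To make the algorithm's structure concrete, the following Python sketch outlines a one-step actor-critic update with eligibility traces: a Gaussian (normal) policy for the continuous actions, a linear state-value critic, and the TD error as the advantage estimate. It is a minimal illustration under assumed state/action dimensions, hyperparameters, and a placeholder reward, not the authors' implementation; the discrete offloading decision mentioned in the abstract is omitted here.

import numpy as np

# Assumed, illustrative dimensions (not paper values):
STATE_DIM = 4    # e.g., channel conditions and task-queue features
ACTION_DIM = 2   # e.g., transmit power and CPU frequency

class ActorCriticET:
    def __init__(self, alpha_w=0.01, alpha_theta=0.001, gamma=0.99, lam=0.9):
        self.w = np.zeros(STATE_DIM)                        # critic weights, V(s) = w . s
        self.theta_mu = np.zeros((ACTION_DIM, STATE_DIM))   # policy mean parameters
        self.log_sigma = np.zeros(ACTION_DIM)               # log of policy std deviations
        self.z_w = np.zeros_like(self.w)                    # critic eligibility trace
        self.z_mu = np.zeros_like(self.theta_mu)            # actor trace (mean)
        self.z_sigma = np.zeros_like(self.log_sigma)        # actor trace (std)
        self.alpha_w, self.alpha_theta = alpha_w, alpha_theta
        self.gamma, self.lam = gamma, lam

    def value(self, s):
        return self.w @ s                                   # linear state-value approximator

    def act(self, s):
        mu = self.theta_mu @ s
        sigma = np.exp(self.log_sigma)
        return np.random.normal(mu, sigma)                  # sample a continuous action

    def update(self, s, a, r, s_next, done):
        # One-step TD error serves as the advantage estimate:
        # delta = r + gamma * V(s') - V(s)
        delta = r + (0.0 if done else self.gamma * self.value(s_next)) - self.value(s)

        # Score function: gradient of the log Gaussian density
        mu = self.theta_mu @ s
        sigma = np.exp(self.log_sigma)
        grad_mu = np.outer((a - mu) / sigma**2, s)
        grad_log_sigma = (a - mu)**2 / sigma**2 - 1.0

        # Decay and accumulate the eligibility traces
        self.z_w = self.gamma * self.lam * self.z_w + s
        self.z_mu = self.gamma * self.lam * self.z_mu + grad_mu
        self.z_sigma = self.gamma * self.lam * self.z_sigma + grad_log_sigma

        # Move critic and actor along the traced gradients, scaled by delta
        self.w += self.alpha_w * delta * self.z_w
        self.theta_mu += self.alpha_theta * delta * self.z_mu
        self.log_sigma += self.alpha_theta * delta * self.z_sigma

# Toy usage with random transitions, only to show the interaction loop;
# the paper's reward trades off offloaded tasks, energy, and delay.
agent = ActorCriticET()
s = np.random.rand(STATE_DIM)
for _ in range(100):
    a = agent.act(s)
    r = -float(np.sum(a**2))            # placeholder reward (assumption)
    s_next = np.random.rand(STATE_DIM)
    agent.update(s, a, r, s_next, done=False)
    s = s_next

Using the TD error as the advantage keeps the actor update low-variance without learning a separate action-value function, which matches the variance-reduction role the abstract assigns to the advantage function; the discrete part of the action space would be handled analogously, for instance with a softmax policy over the discrete offloading choices.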
