Bandit Learning-based Service Placement and Resource Allocation for Mobile Edge Computing

Service placement is a significant issue in mobile edge computing (MEC) systems. Many works have proposed efficient offline approaches to service placement in MEC systems. However, because mobile networks are random and uncertain, these offline approaches are impractical to implement. To cope with this uncertainty, we propose an online service placement scheme for MEC systems that does not require service demand or network states to be known in advance. To maximize the long-term accumulated reward obtained by service placement under limited resource constraints, we analyse the problem within a combinatorial multi-armed bandit (MAB) framework. In addition, because we jointly consider service placement and resource allocation among services, the decision in each time slot can be formulated as a multiple-choice knapsack problem (MCKP). To solve this long-term reward maximization problem, we first propose a combinatorial upper confidence bound (CUCB)-based online service placement and resource allocation scheme. We then analyse the performance of this algorithm theoretically. Finally, simulation results show the efficiency of the algorithm.
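The per-slot structure described above (optimistic reward estimates fed into an MCKP over services and resource levels) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the names `solve_mckp` and `cucb` are hypothetical, and it assumes Bernoulli rewards, integer resource costs shared by all services, an exact dynamic-programming oracle, optimistic initialization of unplayed arms, and the exploration bonus sqrt(3 ln t / (2 T)) commonly used in the general CUCB framework.

```python
import math
import random


def solve_mckp(values, costs, budget):
    """Exact DP for a multiple-choice knapsack (illustrative oracle).

    values[i][j] is the estimated reward of giving service i resource level j,
    costs[j] is the integer cost of level j. Returns, for each service, the
    chosen level index or None if the service is not placed.
    """
    # best[b] = (value, choices) for the services seen so far with cost <= b
    best = [(0.0, [])] * (budget + 1)
    for i in range(len(values)):
        new = []
        for b in range(budget + 1):
            # Default: service i is not placed in this slot.
            cand_val, cand_choice = best[b][0], best[b][1] + [None]
            for j, c in enumerate(costs):
                if c <= b:
                    v = best[b - c][0] + values[i][j]
                    if v > cand_val:
                        cand_val, cand_choice = v, best[b - c][1] + [j]
            new.append((cand_val, cand_choice))
        best = new
    return best[budget][1]


def cucb(true_means, costs, budget, horizon, seed=0):
    """CUCB-style loop: build optimistic indices, solve MCKP, observe, update.

    true_means[i][j] is the (hidden) Bernoulli mean of arm (service i, level j);
    it is used only to simulate feedback, never by the learner itself.
    """
    rng = random.Random(seed)
    n, m = len(true_means), len(costs)
    counts = [[0] * m for _ in range(n)]    # times each arm was played
    means = [[0.0] * m for _ in range(n)]   # empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        # Optimistic index per arm; unplayed arms get the maximum value 1.0.
        ucb = [[1.0 if counts[i][j] == 0 else
                min(1.0, means[i][j]
                    + math.sqrt(3 * math.log(t) / (2 * counts[i][j])))
                for j in range(m)] for i in range(n)]
        choice = solve_mckp(ucb, costs, budget)
        for i, j in enumerate(choice):
            if j is None:
                continue
            # Simulated semi-bandit feedback for each placed service.
            r = 1.0 if rng.random() < true_means[i][j] else 0.0
            counts[i][j] += 1
            means[i][j] += (r - means[i][j]) / counts[i][j]
            total += r
    return total, counts
```

For instance, `solve_mckp([[0.2, 0.5], [0.3, 0.9]], [1, 2], 3)` returns `[0, 1]`: with a budget of 3, giving the second service the larger allocation and the first service the smaller one yields the highest total value. Calling `cucb` with the same matrix as `true_means` runs the learning loop against simulated feedback; an approximation oracle could replace the exact DP when the budget is large.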
