Transfer restless multi-armed bandit policy for energy-efficient heterogeneous cellular network

This paper proposes a learning policy to improve the energy efficiency (EE) of heterogeneous cellular networks. Finding the combination of active and inactive base stations (BSs) that maximizes EE is a combinatorial optimization problem that incurs high computational complexity and a large signaling overhead. This paper presents a learning policy that dynamically switches each BS ON or OFF in order to follow the traffic load variation during the day. The network traffic load is modeled as a Markov decision process, and we propose a modified upper confidence bound algorithm based on the restless Markov multi-armed bandit framework for the BS switching operation. Moreover, to cope with the initial reward loss and to speed up the convergence of the learning algorithm, the transfer learning concept is adapted to our algorithm so that it benefits from knowledge observed in historical periods in the same region. Building on our previous work, a convergence theorem is provided for the proposed policy. Extensive simulations demonstrate that the proposed algorithms follow the traffic load variation during the day and deliver a performance jump-start in EE improvement under various practical traffic load profiles. The results also show that the proposed schemes can significantly reduce the total energy consumption of a cellular network, with up to 70% potential energy savings based on a real traffic profile.
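To make the transfer idea concrete, the sketch below shows a minimal UCB1-style bandit over BS switching actions (arms) whose statistics are warm-started with historical observations, which is how transferred knowledge yields the jump-start described above. This is an illustrative simplification, not the paper's algorithm: rewards are drawn i.i.d. Bernoulli rather than from a restless Markov chain, and the class and parameter names (`TransferUCB`, `prior_means`, `prior_counts`) are hypothetical.

```python
import math
import random

class TransferUCB:
    """UCB1-style policy over BS switching actions (arms).

    Historical (transferred) observations from the same region seed the
    play counts and empirical means, so the learner starts from informed
    estimates instead of a cold start.
    """

    def __init__(self, n_arms, prior_means=None, prior_counts=None):
        self.counts = list(prior_counts) if prior_counts else [0] * n_arms
        self.means = list(prior_means) if prior_means else [0.0] * n_arms
        self.t = sum(self.counts)  # transferred samples count toward time

    def select(self):
        """Pick the arm with the largest UCB index."""
        self.t += 1
        for arm, c in enumerate(self.counts):
            if c == 0:  # play each unseen arm once first
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a]
            + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        """Incrementally update the empirical mean EE reward of `arm`."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

# Toy run: three switching configurations with unknown EE reward rates.
random.seed(0)
true_ee = [0.2, 0.5, 0.8]  # arm 2 is the most energy-efficient choice
policy = TransferUCB(3, prior_means=[0.25, 0.45, 0.75],
                     prior_counts=[20, 20, 20])
for _ in range(500):
    arm = policy.select()
    reward = 1.0 if random.random() < true_ee[arm] else 0.0
    policy.update(arm, reward)
```

With the warm start, the exploration bonus for the clearly inferior arms shrinks quickly relative to their mean gap, so the policy concentrates its plays on the best configuration far sooner than an uninitialized learner would.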
