Pure-Exploration Bandits for Channel Selection in Mission-Critical Wireless Communications

In emergency communications, guaranteeing ultrareliable and low-latency communication is challenging yet crucial for saving human lives and coordinating the operations of first responders. To address this problem, we introduce a general approach for channel selection in mission-critical communications, i.e., selecting the highest-quality channels quickly and accurately via channel probing. Since channel conditions are dynamic and initially unknown to wireless users, choosing the channels with the best conditions is nontrivial. We therefore adopt online learning methods that let users probe channels and predict channel conditions within a restricted observation interval. We formulate this problem as an emerging branch of the classic multiarmed bandit (MAB) problem, namely the pure-exploration bandit problem, to achieve a tradeoff between the sampling time/resource budget and the channel selection accuracy (i.e., the probability of selecting the optimal channels). The goal of the learning process is to choose the "optimal subset" of channels after a limited period of channel probing. We propose and evaluate one learning policy for the single-user case and three learning policies for the distributed multiuser case. We take communication and interference costs into account, and we analyze the tradeoff between these costs and channel selection accuracy. Extensive simulations show that the proposed algorithms achieve considerably higher channel selection accuracy than previous pure-exploration bandit approaches and classic MAB methods.
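The paper's four learning policies are not reproduced here, but the fixed-budget pure-exploration idea behind the single-user case can be illustrated with a minimal successive-rejects-style sketch: probe all surviving channels in phases and permanently reject the empirically worst one per phase until the desired subset of size m remains. This is a generic sketch, not the paper's algorithm; the function name `successive_rejects_top_m`, the `probe` callback, and the Bernoulli channel-availability model in the usage example are all assumptions for illustration.

```python
import math
import random


def successive_rejects_top_m(probe, n_channels, m, budget):
    """Select the m empirically best channels under a fixed probing budget.

    A pure-exploration sketch in the spirit of successive rejects:
    each phase probes every surviving channel equally, then permanently
    rejects the channel with the lowest empirical mean, until m remain.
    `probe(k)` must return one noisy quality sample for channel k.
    """
    # Normalization constant from the classic successive-rejects schedule.
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_channels + 1))
    counts = [0] * n_channels      # probes taken per channel
    sums = [0.0] * n_channels      # accumulated quality samples
    active = set(range(n_channels))
    n_prev = 0
    for phase in range(1, n_channels - m + 1):
        # Cumulative per-channel samples allotted by the end of this phase.
        n_k = max(1, math.ceil((budget - n_channels) /
                               (log_bar * (n_channels + 1 - phase))))
        for k in active:
            for _ in range(n_k - n_prev):
                sums[k] += probe(k)
                counts[k] += 1
        n_prev = n_k
        # Reject the surviving channel with the worst empirical mean.
        worst = min(active, key=lambda k: sums[k] / counts[k])
        active.remove(worst)
    return sorted(active)


# Toy usage (hypothetical setup): 8 Bernoulli-availability channels,
# select the best 3 within a budget of 400 probes.
true_p = [0.9, 0.85, 0.8, 0.5, 0.45, 0.4, 0.3, 0.2]
probe = lambda k: 1.0 if random.random() < true_p[k] else 0.0
print(successive_rejects_top_m(probe, 8, 3, 400))
```

The key design point the sketch captures is the pure-exploration tradeoff stated above: the probing budget is fixed in advance and spent entirely on exploration, and accuracy is measured by whether the returned subset matches the truly best channels, rather than by cumulative reward as in classic MAB policies.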
