Distributed learning algorithm with synchronized epochs for dynamic spectrum access in unknown environment using multi-user restless multi-armed bandit

Abstract: Dynamic spectrum access using cognitive radio has many application areas, such as smart grids, the Internet of Things, and various other device-to-device communication paradigms. In dynamic spectrum access, a user picks one of N channels to transmit on during each time-slot. The user then receives a random reward drawn from a finite set of reward states, and the selected channel is termed the active channel. The reward state of the active channel evolves according to an unknown Markov chain, whereas the reward state of each passive channel evolves as an arbitrary unknown random process. The objective of a channel selection strategy is to minimize regret by selecting the best channel in terms of mean availability. To this end, a strategy based on consecutive selections (epochs) of channels, dubbed Adaptive Sequencing of Exploration and Exploitation for Channel Selection in Unknown Environment (ASEE-CSUE), is proposed. By suitably planning the sequencing of epochs, ASEE-CSUE achieves regret of logarithmic order in time. Furthermore, extensive simulation results indicate that collisions and switching costs are below 7% and 2%, respectively, and that the best channels are selected in more than 90% of the total time-slots.
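The idea of sequencing exploration and exploitation epochs can be illustrated with a minimal sketch. It is not the ASEE-CSUE algorithm itself, whose details the abstract does not give. It assumes a single user (so no collisions), two-state channels that all evolve as independent Markov chains in every slot, and epoch lengths that grow geometrically (2^k slots per channel for exploration, 4^k slots for exploitation). The function name simulate_epoch_policy and the parameters p01 and p11 are hypothetical and chosen only for illustration.

```python
import random


def simulate_epoch_policy(p01, p11, horizon, seed=0):
    """Single-user epoch-based channel selection (illustrative sketch).

    p01[i]: assumed probability that channel i goes from busy to free.
    p11[i]: assumed probability that channel i stays free.
    Returns the number of slots spent on each channel.
    """
    rng = random.Random(seed)
    n = len(p01)
    state = [0] * n          # 0 = busy (reward 0), 1 = free (reward 1)
    reward_sum = [0.0] * n   # accumulated reward per channel
    plays = [0] * n          # number of slots each channel was selected
    t = 0

    def play(ch):
        nonlocal t
        # Restless setting: every channel evolves in every slot,
        # whether or not it is the one being selected.
        for i in range(n):
            p_free = p11[i] if state[i] == 1 else p01[i]
            state[i] = 1 if rng.random() < p_free else 0
        reward_sum[ch] += state[ch]
        plays[ch] += 1
        t += 1

    k = 0
    while t < horizon:
        explore_len = 2 ** k   # assumed growth rate, not from the paper
        exploit_len = 4 ** k   # exploitation grows faster than exploration
        # Exploration epoch: stay on each channel for consecutive slots.
        for ch in range(n):
            for _ in range(explore_len):
                if t >= horizon:
                    break
                play(ch)
        if t >= horizon:
            break
        # Exploitation epoch: stay on the channel with the best sample mean.
        best = max(range(n), key=lambda i: reward_sum[i] / plays[i])
        for _ in range(exploit_len):
            if t >= horizon:
                break
            play(best)
        k += 1
    return plays
```

For example, calling simulate_epoch_policy([0.1, 0.5, 0.8], [0.2, 0.5, 0.9], horizon=5000) returns per-channel selection counts. Because exploitation epochs lengthen faster than exploration epochs, the share of slots spent exploring shrinks over time, which is the intuition behind sublinear regret. Staying on one channel for a whole epoch also keeps channel switching infrequent.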
