Distributed Competitive Decision Making Using Multi-Armed Bandit Algorithms

This paper tackles the problem of Opportunistic Spectrum Access (OSA) in Cognitive Radio (CR) networks. The main challenge for a Secondary User (SU) in OSA is to learn the availability of the existing channels in order to select and access the one with the highest vacancy probability. To reach this goal, we propose a novel Multi-Armed Bandit (MAB) algorithm, called $\epsilon$-UCB, that enhances the spectrum learning of a SU and decreases the regret, i.e., the reward lost by selecting suboptimal channels. We corroborate by simulation that the regret of the proposed algorithm grows logarithmically over time; in other words, within a finite number of time slots the SU can estimate the vacancy probabilities of the targeted channels and select the best one for transmission. We then extend $\epsilon$-UCB to the case of multiple users with different priorities, where each SU selfishly estimates and accesses the channels according to its priority rank. Simulation results show the superiority of the proposed algorithms over existing MAB algorithms in both the single-user and multi-user cases.
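To make the learning loop concrete, below is a minimal Python sketch of a UCB1-style single-user channel-selection loop in the spirit of $\epsilon$-UCB. The paper's exact $\epsilon$-UCB index is not reproduced here; treating $\epsilon$ as an exploration-bias constant inside the confidence bonus, and the names `ucb_channel_selection` and `vacancy_probs`, are illustrative assumptions.

```python
import math
import random

def ucb_channel_selection(vacancy_probs, horizon, epsilon=0.1, seed=0):
    """UCB-style channel selection for a single SU.

    vacancy_probs: true (unknown to the SU) probability that each channel
                   is vacant in a given slot.
    epsilon: exploration-bias constant; its exact role in the paper's
             epsilon-UCB index is an assumption in this sketch.
    Returns the cumulative regret after each time slot.
    """
    rng = random.Random(seed)
    k = len(vacancy_probs)
    counts = [0] * k           # number of times each channel was sensed
    means = [0.0] * k          # empirical vacancy estimates
    best = max(vacancy_probs)  # vacancy probability of the best channel
    regret, cum = [], 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1        # sense each channel once to initialize
        else:
            # Index = empirical mean + exploration bonus (epsilon-inflated).
            arm = max(
                range(k),
                key=lambda i: means[i]
                + math.sqrt((1 + epsilon) * math.log(t) / counts[i]),
            )
        # Reward is 1 if the sensed channel happens to be vacant.
        reward = 1.0 if rng.random() < vacancy_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        cum += best - vacancy_probs[arm]
        regret.append(cum)
    return regret

# Example: 4 channels; cumulative regret should grow roughly logarithmically.
print(ucb_channel_selection([0.2, 0.5, 0.7, 0.9], horizon=5000)[-1])
```

Running the sketch and plotting the returned list against the slot index illustrates the logarithmic regret behavior claimed in the abstract.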
