Distributed algorithm under cooperative or competitive priority users in cognitive networks

In cognitive radio (CR) networks, opportunistic spectrum access (OSA) allows a secondary (unlicensed) user (SU) to access a vacant channel allocated to a primary (licensed) user (PU). By identifying the best channel, i.e., the channel with the highest availability probability, an SU can increase its transmission time and rate. Various learning algorithms have been proposed to maximize the transmission opportunities of an SU, including Thompson sampling (TS), the upper confidence bound (UCB) algorithm, and ε-greedy. In this study, we propose a modified version of UCB called AUCB (Arctan-UCB) that achieves a logarithmic regret similar to that of TS or UCB while further reducing the total regret, defined as the reward loss resulting from the selection of non-optimal channels. To evaluate the performance of AUCB in the multi-user case, we propose a novel uncooperative policy for priority access, in which the k-th user should access the k-th best channel. This manuscript theoretically establishes an upper bound on the sum regret of AUCB in both the single-user and multi-user cases; the users may thus converge to their dedicated channels after a finite number of time slots. It also studies a Quality of Service variant of AUCB (QoS-AUCB) under the proposed priority-access policy. Our simulations corroborate the performance of AUCB compared with TS and UCB.
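
The single-user channel selection problem and the regret it is measured by can be illustrated with a small simulation. This is a minimal sketch, assuming each channel is vacant in a slot independently with a fixed availability probability (a Bernoulli model), and using the standard UCB1 index (empirical mean plus sqrt(2 ln t / n)) rather than AUCB, whose arctan-based index is not specified in this abstract. Regret is computed as the expected reward loss, i.e., the sum over slots of μ* − μ_k for the chosen channel k, where μ* is the highest availability probability. The function name `ucb1_channel_selection` and the availability values are hypothetical.

```python
import math
import random


def ucb1_channel_selection(mu, horizon, seed=0):
    """Simulate one SU selecting a channel per slot with the UCB1 index.

    mu: availability probability of each channel (hypothetical values).
    Returns the cumulative expected regret and the per-channel selection counts.
    """
    rng = random.Random(seed)
    n = len(mu)
    counts = [0] * n      # number of times each channel was sensed
    sums = [0.0] * n      # number of slots each channel was found vacant
    best = max(mu)        # availability of the best channel (mu*)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            k = t - 1     # sense every channel once so every count is >= 1
        else:
            # UCB1: empirical availability plus an exploration bonus
            k = max(range(n), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < mu[k] else 0.0  # 1.0 = channel vacant
        counts[k] += 1
        sums[k] += reward
        regret += best - mu[k]  # expected loss of not choosing the best channel
    return regret, counts


# Usage with hypothetical availability probabilities.
mu = [0.9, 0.6, 0.4, 0.2]
regret, counts = ucb1_channel_selection(mu, horizon=10_000)
print(f"expected regret: {regret:.1f}, selections per channel: {counts}")
```

Sensing each channel once before applying the index avoids a division by zero. With well-separated availabilities, the best channel should dominate the selection counts while the other channels are sensed only a logarithmic number of times, which is what keeps the regret logarithmic.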

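The priority-access rule, in which the k-th user accesses the k-th best channel, can be sketched as a decentralized simulation. This is a minimal sketch under assumptions not stated in the abstract: each user runs its own UCB1 index (not AUCB), ranks the channels by that index, and senses the channel with the k-th largest index; users exchange no information; all users see the same channel states in a slot; sensing reveals a channel's state even when two users collide; and a staggered round-robin phase lets every user sense every channel once. The function `priority_access` and the availability values are hypothetical.

```python
import math
import random
from collections import Counter


def priority_access(mu, n_users, horizon, seed=0):
    """Uncooperative rank-based access sketch (hypothetical, UCB1 not AUCB).

    User k (0-based) senses the channel whose UCB1 index is the (k+1)-th
    largest in its own ranking. Returns per-user selection counters and the
    number of slots in which at least two users chose the same channel.
    """
    if n_users > len(mu):
        raise ValueError("need at least as many channels as users")
    rng = random.Random(seed)
    n_ch = len(mu)
    counts = [[0] * n_ch for _ in range(n_users)]
    sums = [[0.0] * n_ch for _ in range(n_users)]
    picks = [Counter() for _ in range(n_users)]
    collisions = 0
    for t in range(1, horizon + 1):
        # One PU occupancy draw per channel, shared by all users in this slot.
        free = [rng.random() < p for p in mu]
        choice = []
        for k in range(n_users):
            if t <= n_ch:
                c = (t - 1 + k) % n_ch  # staggered start: no initial collisions
            else:
                idx = [sums[k][i] / counts[k][i]
                       + math.sqrt(2.0 * math.log(t) / counts[k][i])
                       for i in range(n_ch)]
                ranked = sorted(range(n_ch), key=lambda i: idx[i], reverse=True)
                c = ranked[k]           # (k+1)-th highest index for priority k+1
            choice.append(c)
        if len(set(choice)) < n_users:
            collisions += 1
        for k, c in enumerate(choice):
            # Assumption: sensing reveals the channel state despite collisions.
            counts[k][c] += 1
            sums[k][c] += 1.0 if free[c] else 0.0
            picks[k][c] += 1
    return picks, collisions


# Channels listed from best to worst (hypothetical availability probabilities).
mu = [0.9, 0.6, 0.3, 0.1]
picks, collisions = priority_access(mu, n_users=3, horizon=10_000)
for k, p in enumerate(picks):
    print(f"user {k + 1}: most sensed channel {p.most_common(1)[0][0]}")
print("collision slots:", collisions)
```

If each user's ranking settles, user k's most frequently sensed channel should be the k-th best one, and collisions should become increasingly rare as the exploration bonus shrinks relative to the gaps between channel availabilities.
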