Opportunistic Spectrum Access with Multiple Users: Learning under Competition

The problem of cooperative allocation among multiple secondary users to maximize cognitive system throughput is considered. The channel availability statistics are initially unknown to the secondary users and are learned from sensing samples. Two distributed learning and allocation schemes are proposed that maximize the cognitive system throughput or, equivalently, minimize the total regret in distributed learning and allocation. The first scheme assumes minimal prior information in the form of pre-allocated ranks for the secondary users, while the second scheme is fully distributed and assumes no such prior information. Both schemes have sum regret that is provably logarithmic in the number of sensing time slots. A lower bound on the regret of any learning scheme is also derived, and this bound is asymptotically logarithmic in the number of slots. Hence, our schemes are asymptotically order-optimal in terms of regret in distributed learning and allocation.
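To make the notion of sum regret under pre-allocated ranks concrete, the following is a minimal simulation sketch, not the schemes proposed here. It assumes Bernoulli channel availability, a standard UCB1-style index (sample mean plus exploration bonus), a rule in which the user with rank j senses the channel with its j-th largest index, zero throughput on any channel chosen by more than one user, and sensing samples that are observed regardless of collisions. All names (`ucb_index`, `simulate`, `means`, `num_users`, `horizon`) are hypothetical and chosen for illustration only.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """UCB1-style index: sample mean plus an exploration bonus (assumed form)."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def simulate(means, num_users, horizon, seed=0):
    """Return the cumulative expected sum regret after each slot.

    means     -- assumed true availability probability of each channel
    num_users -- number of secondary users (must not exceed len(means))
    horizon   -- number of sensing time slots
    """
    rng = random.Random(seed)
    num_channels = len(means)
    # Benchmark: users occupy the num_users best channels without collisions.
    best = sum(sorted(means, reverse=True)[:num_users])
    # Each user keeps its own statistics; nothing is shared between users.
    counts = [[0] * num_channels for _ in range(num_users)]
    sums = [[0.0] * num_channels for _ in range(num_users)]
    regret, total = [], 0.0
    for t in range(1, horizon + 1):
        free = [rng.random() < mu for mu in means]  # availability in this slot
        choices = []
        for u in range(num_users):
            if t <= num_channels:
                # Staggered round-robin so every user senses every channel once.
                ch = (t - 1 + u) % num_channels
            else:
                idx = [ucb_index(sums[u][c] / counts[u][c], counts[u][c], t)
                       for c in range(num_channels)]
                order = sorted(range(num_channels), key=lambda c: idx[c],
                               reverse=True)
                ch = order[u]  # user with rank u+1 takes its (u+1)-th best channel
            choices.append(ch)
        for u, ch in enumerate(choices):
            # Assumption: the sensing sample is observed even if users collide.
            counts[u][ch] += 1
            sums[u][ch] += 1.0 if free[ch] else 0.0
        # Expected throughput counts only channels chosen by exactly one user.
        expected = sum(means[ch] for ch in set(choices)
                       if choices.count(ch) == 1)
        total += max(0.0, best - expected)  # guard against float round-off
        regret.append(total)
    return regret


if __name__ == "__main__":
    curve = simulate([0.2, 0.5, 0.8, 0.9], num_users=2, horizon=2000, seed=1)
    for t in (10, 100, 1000, 2000):
        print(t, round(curve[t - 1], 2))
```

Plotting the returned curve against log t is one way to check empirically whether regret growth looks logarithmic under these assumptions; the sketch itself proves nothing about the bounds stated above.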
