Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret

We consider the problem of distributed learning and channel access in a cognitive network with multiple secondary users. The availability statistics of the channels are initially unknown to the secondary users and are estimated from their sensing decisions. There is no explicit information exchange or prior agreement among the secondary users; each user makes its sensing and access decisions in a completely distributed manner. We propose policies for distributed learning and access that achieve order-optimal cognitive system throughput (number of successful secondary transmissions) under self play, i.e., when implemented at all the secondary users. Equivalently, our policies minimize the sum regret in distributed learning and access, which is the loss in secondary throughput due to learning and distributed access. For the scenario in which the number of secondary users is known to the policy, we prove that the total regret is logarithmic in the number of transmission slots. This policy is order-optimal: its regret matches, in order, a logarithmic lower bound that holds for any uniformly good learning and access policy. We then consider the case in which the number of secondary users is fixed but unknown and is estimated at each user through feedback. For this case, we propose a policy whose sum regret grows only slightly faster than logarithmic in the number of transmission slots.
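
To make the notion of sum regret concrete, the following minimal sketch computes it for a toy setting. It rests on an assumption not spelled out in the abstract: the benchmark throughput is taken to be what the secondary users would obtain if the channel availabilities were known and the users were perfectly coordinated on the best channels, one user per channel. The function and parameter names are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def ideal_throughput(mean_availability: Sequence[float],
                     num_users: int,
                     num_slots: int) -> float:
    """Expected successful transmissions under the assumed benchmark.

    Assumption: with known statistics and perfect coordination, the
    num_users secondary users occupy the num_users channels with the
    highest mean availability, one user per channel.
    """
    best = sorted(mean_availability, reverse=True)[:num_users]
    return num_slots * sum(best)


def sum_regret(mean_availability: Sequence[float],
               num_users: int,
               num_slots: int,
               successful_transmissions: float) -> float:
    """Loss in secondary throughput relative to the assumed benchmark."""
    benchmark = ideal_throughput(mean_availability, num_users, num_slots)
    return benchmark - successful_transmissions


if __name__ == "__main__":
    # Three channels, two secondary users, 100 transmission slots.
    means = [0.9, 0.5, 0.2]
    print(sum_regret(means, num_users=2, num_slots=100,
                     successful_transmissions=120))
```

In this picture, a policy with logarithmic regret is one for which the gap returned by `sum_regret` grows no faster than a constant times the logarithm of the number of slots.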
