Cost-Aware Learning and Optimization for Opportunistic Spectrum Access

In this paper, we investigate cost-aware joint learning and optimization for multi-channel opportunistic spectrum access in a cognitive radio system. We consider a discrete-time model in which the time axis is partitioned into frames, each consisting of a sensing phase followed by a transmission phase. During the sensing phase, the user can sense a subset of channels sequentially before deciding to use one of them in the subsequent transmission phase. We assume that the channel states alternate between busy and idle from frame to frame according to independent Bernoulli random processes. To capture the inherent uncertainty in channel sensing, we assume that the reward of each transmission when the channel is idle is a random variable. We also associate random costs with sensing and transmission actions. Our objective is to understand how the costs and rewards of the actions affect the optimal behavior of the user in both offline and online settings, and to design corresponding opportunistic spectrum access strategies that maximize the expected cumulative net reward (i.e., reward minus cost). We start with an offline setting in which the statistics of the channel status, costs, and reward are known beforehand. We show that the optimal policy exhibits a recursive double-threshold structure, and that the user needs to compare the channel statistics with those thresholds sequentially in order to decide its actions. Building on these insights, we then study the online setting, in which the statistics of the channels, costs, and reward are unknown a priori. We judiciously balance exploration and exploitation, and show that the cumulative regret scales as O(log T). We also establish a matching lower bound, which implies that our online algorithm is order-optimal. Simulation results corroborate our theoretical analysis.
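To make the frame model concrete, the following is a minimal Python sketch of the per-frame net reward under one simple, hypothetical policy: sense channels in a fixed order and transmit on the first one found idle. It is not the double-threshold policy described above, whose thresholds are not given here. All names (`idle_prob`, `sense_cost`, `tx_cost`, `reward`) and the numerical values are illustrative assumptions, and the random costs and reward are replaced by their means, which suffices for computing an expectation if they are independent of the channel states.

```python
import random


def expected_net_reward(order, idle_prob, sense_cost, tx_cost, reward):
    """Expected net reward of one frame for the illustrative policy
    'sense channels in `order`, transmit on the first idle one'.
    All cost and reward arguments are mean values (an assumption)."""
    total = 0.0
    p_reach = 1.0  # probability that every earlier channel was sensed busy
    for ch in order:
        # The sensing cost is paid whenever this channel is reached.
        total -= p_reach * sense_cost[ch]
        # If the channel is idle, the user transmits and stops sensing.
        total += p_reach * idle_prob[ch] * (reward[ch] - tx_cost[ch])
        p_reach *= 1.0 - idle_prob[ch]
    return total


def simulate_frame(order, idle_prob, sense_cost, tx_cost, reward, rng):
    """Draw independent Bernoulli channel states for one frame and
    return the realized net reward of the same illustrative policy."""
    net = 0.0
    for ch in order:
        net -= sense_cost[ch]
        if rng.random() < idle_prob[ch]:  # channel found idle
            net += reward[ch] - tx_cost[ch]
            break
    return net


# Hypothetical two-channel example.
idle_prob = [0.8, 0.5]
sense_cost = [0.1, 0.1]
tx_cost = [0.2, 0.2]
reward = [1.0, 1.0]
order = [0, 1]

analytic = expected_net_reward(order, idle_prob, sense_cost, tx_cost, reward)

rng = random.Random(0)
num_frames = 20000
empirical = sum(
    simulate_frame(order, idle_prob, sense_cost, tx_cost, reward, rng)
    for _ in range(num_frames)
) / num_frames
```

In this sketch, `analytic` and `empirical` should agree closely, and evaluating `expected_net_reward` for different sensing orders shows how the sensing and transmission costs change which order is preferable.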
