Contention-Driven Feature Extraction for Low-Regret Contextual Bandit-Based Channel Selection Dedicated to Wireless LANs

To achieve low-regret learning in radio channel selection for wireless local area networks (WLANs), we propose a contention-driven feature extraction (FE) scheme for a contextual multi-armed bandit (CMAB) algorithm. This study aims to learn the optimal WLAN channel online in a manner that scales with the number of access points (APs) and channels, which is accomplished by leveraging the context, i.e., channel allocation information. The proposed FE focuses on contention with neighboring and same-channel APs; the key idea is to consolidate contexts by ignoring APs that are not connected to the target AP on the contention graph. The simulation results confirm that, with the CMAB algorithm, contention-driven FE enables a target AP to learn the optimal channel with low regret and in a manner that scales with the number of APs and available channels.
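
The consolidation idea can be illustrated with a minimal Python sketch. It is not the paper's exact FE: the abstract does not give the feature definition, so the per-channel neighbor count used here is an assumption, as are the names `contention_features`, `neighbors`, and `allocation`. The sketch only shows how two channel allocations that differ in a non-neighboring AP collapse to the same feature vector, which is what keeps the number of distinct contexts from growing with the total number of APs.

```python
from typing import Dict, Set, Tuple


def contention_features(
    target: str,
    neighbors: Set[str],
    allocation: Dict[str, int],
    num_channels: int,
) -> Tuple[int, ...]:
    """Hypothetical feature map: for each channel, count the APs that are
    adjacent to the target on the contention graph and use that channel.
    APs outside `neighbors` are ignored entirely (assumed consolidation rule).
    """
    counts = [0] * num_channels
    for ap, channel in allocation.items():
        if ap != target and ap in neighbors:
            counts[channel] += 1
    return tuple(counts)


# "ap3" is not connected to the target "ap0" on the contention graph.
neighbors = {"ap1", "ap2"}
alloc_a = {"ap1": 0, "ap2": 1, "ap3": 2}
alloc_b = {"ap1": 0, "ap2": 1, "ap3": 0}  # differs only in ap3's channel

features_a = contention_features("ap0", neighbors, alloc_a, num_channels=3)
features_b = contention_features("ap0", neighbors, alloc_b, num_channels=3)
print(features_a)  # (1, 1, 0)
print(features_b)  # (1, 1, 0)
```

Both raw contexts map to the same features, so a CMAB learner fed these features would treat them as one context; how the features enter the bandit's reward model is left open here.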
