Sequential Learning for Multi-Channel Wireless Network Monitoring With Channel Switching Costs

We consider the problem of optimally assigning p sniffers to K channels to monitor the transmission activities in a multichannel wireless network with switching costs. The activity of users is initially unknown to the sniffers and must be learned while channel assignment decisions are being made to maximize the benefit of the assignment, which gives rise to the fundamental tradeoff between exploration and exploitation. Switching costs are incurred when sniffers change their channel assignments, so frequent changes are undesirable. We formulate the sniffer-channel assignment with switching costs as a linear partial monitoring problem, a superclass of multiarmed bandits. Because the number of arms (sniffer-channel assignments) is exponential, novel techniques are needed to allow efficient learning. We use the linear bandit model to capture the dependency among the arms and develop a policy that takes advantage of this dependency. We prove that the proposed Upper Confidence Bound (UCB)-based policy enjoys a logarithmic regret bound in time t that depends sublinearly on the number of arms, while its total switching cost grows on the order of O(log log(t)).
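To make the size of the arm set concrete, the following is a minimal sketch and not the paper's construction. It assumes each of the p sniffers independently tunes to exactly one of the K channels, which gives K^p joint assignments. The one-hot encoding in `one_hot_features` is a hypothetical illustration of how a linear model could describe every arm with only p*K coordinates; the function names and the encoding are assumptions introduced here.

```python
from itertools import product


def enumerate_assignments(p, K):
    """Return every way to assign each of p sniffers to one of K channels.

    Assumption: sniffers choose channels independently, so there are K**p arms.
    """
    return list(product(range(K), repeat=p))


def one_hot_features(assignment, K):
    """Hypothetical linear encoding of one arm.

    Produces a 0/1 vector of length p*K with a single 1 per sniffer,
    placed at index sniffer * K + channel.
    """
    x = [0] * (len(assignment) * K)
    for sniffer, channel in enumerate(assignment):
        x[sniffer * K + channel] = 1
    return x


if __name__ == "__main__":
    p, K = 3, 4
    arms = enumerate_assignments(p, K)
    # The number of arms grows exponentially in p ...
    print("number of arms:", len(arms))
    # ... while the assumed feature dimension grows only linearly in p.
    print("feature dimension:", len(one_hot_features(arms[0], K)))
```

Under these assumptions, a policy that learns one weight per feature coordinate has to estimate p*K quantities instead of one mean per arm, which is the kind of dependency among arms that a linear bandit model exploits.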
