Sequential learning for optimal monitoring of multi-channel wireless networks

We consider the problem of optimally assigning p sniffers to K channels to monitor the transmission activities in a multi-channel wireless network. The activity of the users is initially unknown to the sniffers. It must be learned while channel-assignment decisions are being made and the benefit of those assignments is being maximized, which gives rise to the fundamental trade-off between exploration and exploitation. We formulate the problem as a linear partial monitoring problem, a super-class of multi-armed bandits. Because the number of arms (sniffer-channel assignments) is exponential in the number of sniffers, novel techniques are needed for efficient learning. We use the linear bandit model to capture the dependency among the arms and develop two policies that take advantage of this dependency. Both policies enjoy a regret bound that is logarithmic in the number of time slots, with a term that is sub-linear in the number of arms.
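The following minimal Python sketch makes the size of the arm set and the linear structure concrete. It rests on assumptions of our own, none taken from the paper:

- each sniffer tunes to exactly one channel;
- each user transmits on a single fixed channel;
- each sniffer overhears a fixed subset of users;
- a user is captured in a slot if it is active and some sniffer that covers it is tuned to its channel.

Under these assumptions the arms are the K**p channel tuples. The expected reward of an arm is the inner product of a binary coverage vector with the unknown vector of user activity probabilities. The learner therefore only has to estimate one parameter per user rather than one per arm.

Everything else here is hypothetical: the toy instance (`user_channel`, `coverage`, `true_activity`) and all helper names. `optimistic_policy` is a generic UCB-style index over per-user estimates. It assumes semi-bandit feedback, meaning the activity of every covered user is observed. It is not either of the two policies developed in the paper.

```python
import itertools
import math
import random

# Hypothetical toy instance (illustrative only, not from the paper).
K = 3                                  # number of channels
P = 2                                  # number of sniffers (p in the text)
user_channel = [0, 0, 1, 2]            # channel on which each user transmits
coverage = [{0, 1, 2}, {1, 2, 3}]      # users each sniffer can overhear
true_activity = [0.9, 0.2, 0.5, 0.6]   # activity probabilities, unknown to the learner


def all_assignments(num_sniffers, num_channels):
    """Every arm: one channel per sniffer, num_channels**num_sniffers tuples."""
    return list(itertools.product(range(num_channels), repeat=num_sniffers))


def feature(assignment):
    """Binary vector x(a): x[u] = 1 iff some sniffer covering u is tuned to u's channel."""
    x = [0] * len(user_channel)
    for sniffer, channel in enumerate(assignment):
        for u in coverage[sniffer]:
            if user_channel[u] == channel:
                x[u] = 1
    return x


def expected_reward(assignment, theta):
    """Expected number of captured active users: the linear form <x(a), theta>."""
    return sum(xi * ti for xi, ti in zip(feature(assignment), theta))


def best_assignment(theta):
    """Brute-force maximization over all K**P arms (feasible only for tiny instances)."""
    return max(all_assignments(P, K), key=lambda a: expected_reward(a, theta))


def optimistic_policy(horizon, seed=0):
    """Generic UCB-style learner over per-user activity estimates (an assumption,
    not the paper's policy). Assumes the activity of every covered user is observed."""
    rng = random.Random(seed)
    n_users = len(user_channel)
    counts = [0] * n_users      # number of observations of each user
    means = [0.0] * n_users     # empirical activity rate of each user
    arms = all_assignments(P, K)
    picks = []
    for t in range(1, horizon + 1):
        # Optimistic per-user index; never-observed users get the maximum value 1.0.
        index = [
            1.0 if counts[u] == 0
            else min(1.0, means[u] + math.sqrt(2.0 * math.log(t) / counts[u]))
            for u in range(n_users)
        ]
        # Coverage vectors are nonnegative, so maximizing <x(a), index> is optimistic.
        arm = max(arms, key=lambda a: expected_reward(a, index))
        picks.append(arm)
        # Semi-bandit feedback: sample the activity of every covered user this slot.
        for u, covered in enumerate(feature(arm)):
            if covered:
                active = 1 if rng.random() < true_activity[u] else 0
                counts[u] += 1
                means[u] += (active - means[u]) / counts[u]  # running mean update
    return picks
```

For this instance `best_assignment(true_activity)` returns `(0, 2)`. Sniffer 0 on channel 0 captures users 0 and 1, and sniffer 1 on channel 2 captures user 3, for an expected 0.9 + 0.2 + 0.6 = 1.7 captured users.

The learner keeps only four per-user estimates, while the arm set already holds 3**2 = 9 tuples. The brute-force `max` stands in for the maximum-coverage-style optimization step, which is the part that grows exponentially with the number of sniffers.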
