Cooperation Speeds Surfing: Use Co-Bandit!

In this paper, we explore the benefit of cooperation in adversarial bandit settings. As a motivating example, we consider the problem of wireless network selection: mobile devices must choose the right network to associate with for optimal performance, which is a non-trivial task. The excellent theoretical properties of EXP3, a leading multi-armed bandit algorithm, suggest that it should work well for this type of problem. Yet, it performs poorly in practice. A major limitation is its slow rate of stabilization. Bandit-style algorithms perform better when global knowledge is available, i.e., when devices receive feedback about all networks after each selection. Unfortunately, communicating full information to all devices is expensive. We therefore address the question of how much information is adequate to achieve better performance. We propose Co-Bandit, a novel cooperative bandit approach that allows devices to occasionally share their observations and to forward feedback received from neighbors; hence, feedback may arrive with a delay. Devices perform network selection based on their own observations and feedback from neighbors, thereby speeding up each other's rate of learning. We prove that Co-Bandit is regret-minimizing and retains the convergence property of multiplicative weight update algorithms with full information. Through simulation, we show that a very small amount of information, even when delayed, is adequate to nudge devices toward the right network and yields significantly faster stabilization at the optimal state (about 630x faster than EXP3).
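For context, the EXP3 baseline discussed above maintains one weight per arm (here, per network) and updates the weight of the chosen arm multiplicatively using an importance-weighted reward estimate. The following is a minimal sketch of standard EXP3, not of Co-Bandit itself; the `rewards` callback, the exploration rate `gamma`, and the horizon are illustrative assumptions, not the paper's simulation setup.

```python
import math
import random

def exp3(num_arms, rewards, gamma=0.1, horizon=1000):
    """Minimal EXP3: adversarial multi-armed bandit with
    multiplicative weight updates on importance-weighted rewards."""
    weights = [1.0] * num_arms
    total_reward = 0.0
    for t in range(horizon):
        total_w = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / num_arms
                 for w in weights]
        arm = random.choices(range(num_arms), weights=probs)[0]
        # Only the chosen arm's reward (in [0, 1]) is observed.
        reward = rewards(arm, t)
        total_reward += reward
        # Importance weighting keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
    return total_reward, weights
```

Co-Bandit's departure from this scheme is that each device also folds in (possibly delayed) reward observations shared by its neighbors, rather than relying solely on its own single observation per round.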
