An Online Learning Approach to Network Application Optimization with Guarantee

Network application optimization is essential for improving both application performance and user experience. Proper optimization decisions depend on the network application parameters; however, many existing works are impractical because they assume a priori knowledge of parameters that are usually unknown and must be estimated. Some studies optimize network applications in an online learning setting using multi-armed bandit models, but existing frameworks are problematic: they focus only on finding the optimal decisions that minimize regret, while neglecting constraint (or guarantee) requirements, which may consequently be excessively violated. In this paper, we propose a novel online learning framework for network application optimization with guarantees. To the best of our knowledge, we are the first to formulate a stochastic constrained multi-armed bandit model with time-varying “multi-level rewards” that takes both “regret” and “violation” into consideration. We are also the first to design a constrained bandit policy, Learning with Minimum Guarantee (LMG), with provably sub-linear regret and violation bounds. We illustrate how our framework can be applied to several emerging network application optimizations, namely, (1) opportunistic multichannel selection, (2) data-guaranteed crowdsensing, and (3) stability-guaranteed crowdsourced transcoding. To demonstrate the effectiveness of LMG in optimizing these applications under different minimum requirements, we conduct extensive simulations comparing LMG with existing state-of-the-art policies.
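To make the regret/violation trade-off concrete, the following is a minimal sketch of a UCB1-style bandit loop that additionally tracks cumulative violation of a minimum average-reward guarantee. This is a generic illustration under assumed Bernoulli rewards; it is NOT the paper's LMG policy, whose index and update rules are not specified in this abstract, and the function name and parameters are hypothetical.

```python
import math
import random

def ucb1_with_min_guarantee(means, horizon, min_avg, seed=0):
    """Illustrative UCB1 play over Bernoulli arms, reporting both
    regret and violation of a minimum-guarantee constraint.
    (Generic sketch only; not the paper's LMG policy.)"""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # times each arm has been played
    sums = [0.0] * k          # cumulative observed reward per arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # play each arm once to initialize
        else:
            # UCB1 index: empirical mean plus confidence bonus
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    # Regret: shortfall vs. always playing the best fixed arm.
    regret = horizon * max(means) - total_reward
    # Violation: shortfall of total reward below the minimum guarantee.
    violation = max(0.0, horizon * min_avg - total_reward)
    return regret, violation
```

A sub-linear bound means both quantities grow slower than the horizon; a plain UCB1 policy like this one controls regret but offers no violation guarantee when the constraint threshold exceeds what the best arm can deliver, which is exactly the gap a constrained policy such as LMG is designed to close.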
