Near Optimal Adaptive Shortest Path Routing with Stochastic Links States under Adversarial Attack

We consider shortest path routing (SPR) in a network with stochastically time-varying link metrics under potential adversarial attacks. Because of potential denial-of-service attacks, link states may follow stochastic (benign) distributions or be adversarially controlled, and which case holds can vary across time and across network locations. Without any \emph{a priori} knowledge of which situation holds, designing an adaptive SPR protocol that copes optimally with all of them in practice is very challenging. In this paper, we present the first solution to this problem by formulating it as a multi-armed bandit (MAB) problem. By introducing a novel control parameter into the exploration phase of each link, we are able to apply a martingale inequality within our combinatorial adversarial MAB framework. As a result, our proposed algorithms can automatically detect the features of the environment within a unified framework and learn the optimal SPR strategies over time, with almost optimal learning performance in all possible cases. Moreover, we study important issues related to the practical implementation of our algorithm, such as decoupling route selection from multi-path route probing, cooperative learning among multiple sources, the "cold-start" issue, and delayed feedback. In addition, the proposed SPR algorithms can be implemented with low complexity, and they are proved to scale well with network size. Compared with existing approaches in a typical network scenario under jamming attacks, our algorithm achieves a 65.3\% improvement in network delay for a given learning period and an 81.5\% improvement in learning duration for a specified network delay.
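To make the MAB formulation concrete, the sketch below treats each link as a base arm and each source-destination path as a combinatorial action. It assumes semi-bandit feedback, meaning the delay of every traversed link is observed. The sketch uses a CUCB-style rule: each link gets a lower confidence bound on its mean delay, and Dijkstra's algorithm then picks the optimistic shortest path. This is a standard technique for the purely stochastic case only. It is not the paper's algorithm, which additionally handles adversarial links through a per-link exploration control parameter. The function names `dijkstra` and `cucb_routing`, the delay range [0, 1], and the constant 1.5 in the confidence bonus are illustrative assumptions.

```python
import heapq
import math
import random


def dijkstra(nodes, edges, weight, src, dst):
    """Return the edges of a minimum-weight src->dst path (weights must be >= 0)."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    dist = {n: math.inf for n in nodes}
    prev = {}
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + weight[(u, v)]
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from dst (assumes dst is reachable from src).
    path, node = [], dst
    while node != src:
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]


def cucb_routing(nodes, edges, sample_delay, src, dst, rounds, seed=0):
    """Hypothetical CUCB-style shortest-path learner with per-link (semi-bandit) feedback.

    sample_delay(edge, rng) returns the observed delay of `edge`, assumed to lie in [0, 1].
    Only meant for the stochastic (benign) case; no adversarial guarantee is implied.
    """
    rng = random.Random(seed)
    count = {e: 0 for e in edges}
    mean = {e: 0.0 for e in edges}
    total_delay = 0.0
    for t in range(1, rounds + 1):
        weight = {}
        for e in edges:
            if count[e] == 0:
                weight[e] = 0.0  # unexplored link: most optimistic possible delay
            else:
                bonus = math.sqrt(1.5 * math.log(t) / count[e])
                # Lower confidence bound on the mean delay, clipped at 0 so Dijkstra stays valid.
                weight[e] = max(0.0, mean[e] - bonus)
        path = dijkstra(nodes, edges, weight, src, dst)
        for e in path:
            d = sample_delay(e, rng)  # observe each traversed link's delay
            count[e] += 1
            mean[e] += (d - mean[e]) / count[e]  # incremental sample mean
            total_delay += d
    return mean, count, total_delay
```

On a two-path toy graph where one path has a much lower mean delay, a learner of this kind should send most rounds through that path while occasionally probing the other. Under adversarial link states, confidence bounds of this form carry no guarantee, which is the gap that a unified stochastic/adversarial framework is meant to close.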
