Shortest Path Routing in Unknown Environments: Is the Adaptive Optimal Strategy Available?

We consider the shortest path routing (SPR) problem in a network with time-varying link metrics in unknown environments. Due to potential denial-of-service attacks, the distributions of link states could be stochastic (benign or i.i.d.), contaminated, or adversarial (non-i.i.d.) at different temporal and spatial locations. Without any a priori knowledge, designing an adaptive SPR protocol that copes optimally with all possible situations in practice is very challenging. In this paper, we present the first solution by formulating the problem as a multi-armed bandit problem. By introducing novel control parameters to explore link conditions, our proposed algorithms automatically detect features of the environment within a unified framework and find the optimal SPR strategies with almost optimal learning performance in all possible cases over time. Moreover, we study important issues related to practical implementation, such as decoupling route selection from multi-path route probing, cooperative learning among multiple sources, the cold-start issue, and delayed feedback. In addition, the proposed SPR algorithms can be implemented with low complexity and are proved to scale very well with the network size. The efficacy of the proposed solutions is verified by trace-driven simulations on real datasets. Compared to existing approaches in a typical network scenario, our algorithm achieves a 65.3 percent improvement in network delay for a given learning period and an 81.5 percent improvement in learning duration under a specified network delay.
