Toward Optimal Adaptive Online Shortest Path Routing With Acceleration Under Jamming Attack

We consider online shortest path routing (SPR) in a network with stochastically time-varying link states under potential adversarial attacks. Because of denial-of-service (DoS) attacks, the distributions of link states may be stochastic (benign) or adversarial at different temporal and spatial locations. Without any a priori knowledge, designing an adaptive and optimal DoS-proof SPR protocol that thwarts all possible adversarial attacks is very challenging. In this paper, we present the first such integral solution based on multi-armed bandit (MAB) theory, where jamming is the adversarial strategy. By introducing a novel control parameter into the exploration phase for each link, we apply a martingale inequality in our combinatorial adversarial MAB framework. The proposed algorithm automatically detects the specific jammed and unjammed links within a unified framework. As a result, we obtain adaptive online SPR strategies with near-optimal learning performance in all possible regimes. Moreover, we propose accelerated algorithms based on multi-path route probing and cooperative learning among multiple sources, and study their implementation issues. Compared with existing work, our algorithm improves network delay by 30.3% under oblivious jamming and by 87.1% under adaptive jamming for a typical learning period, and improves learning duration by 81.5% on average for a specified network delay, while achieving almost the same performance without jamming. Lastly, the accelerated algorithms achieve a maximum improvement of 150.2% in network delay and a 431.3% improvement in learning duration.
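To make the bandit view of routing concrete, the following is a minimal sketch of the standard EXP3 algorithm applied to a handful of explicitly enumerated source-destination paths, where only the delay of the chosen path is observed in each round. It is not the algorithm proposed in this paper, which works per link with a control parameter and detects jammed links; it only illustrates the adversarial MAB setting. All names here (`exp3_route`, `make_delay`, `PATHS`, the link labels, and the parameters `eta` and `gamma`) are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy topology: three candidate paths from s to t.
PATHS = [("s-a", "a-t"), ("s-b", "b-t"), ("s-t",)]

def make_delay(means, jammed):
    """Return a delay function; a jammed link always shows delay 1.0."""
    def link_delay(link, rng):
        if link in jammed:
            return 1.0
        noisy = means[link] + rng.uniform(-0.05, 0.05)
        return min(1.0, max(0.0, noisy))  # keep delays in [0, 1]
    return link_delay

def exp3_route(paths, link_delay, rounds, eta=0.1, gamma=0.05, seed=0):
    """Run EXP3 over whole paths and return how often each was chosen."""
    rng = random.Random(seed)
    k = len(paths)
    max_len = max(len(p) for p in paths)
    weights = [1.0] * k
    counts = [0] * k
    for _ in range(rounds):
        total = sum(weights)
        # Mix in uniform exploration so every probability is >= gamma / k.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        i = rng.choices(range(k), weights=probs)[0]
        counts[i] += 1
        # Bandit feedback: only the chosen path's delay, scaled to [0, 1].
        loss = sum(link_delay(link, rng) for link in paths[i]) / max_len
        # Importance-weighted loss estimate for the chosen path only.
        weights[i] *= math.exp(-eta * loss / probs[i])
        # Rescale so the largest weight is 1, avoiding numeric underflow.
        top = max(weights)
        weights = [w / top for w in weights]
    return counts
```

With, for example, mean delays of 0.1 on the links of the first two paths and 0.8 on the direct link, and the link `"s-a"` jammed in every round, calling `exp3_route(PATHS, make_delay(means, {"s-a"}), 5000)` should concentrate its choices on the second path, since the jammed first path and the slow direct link both incur larger losses.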
