DeepADMR: A Deep Learning based Anomaly Detection for MANET Routing

We developed DeepADMR, a novel neural anomaly detector for the deep reinforcement learning (DRL)-based DeepCQ+ MANET routing policy. The performance of DRL-based algorithms such as DeepCQ+ has been verified only within the environments in which they were trained and tested, so deploying them in the tactical domain poses high risks. DeepADMR monitors the DeepCQ+ policy for unexpected behavior in real time using its temporal difference errors (TD-errors), and detects anomalous scenarios with empirical, non-parametric cumulative-sum statistics. The DeepCQ+ design, which is based on multi-agent weight-sharing proximal policy optimization (PPO), is slightly modified to enable real-time estimation of the TD-errors. We report DeepADMR's performance under channel disruptions, high mobility levels, and network sizes beyond those of the training environments; the results show its effectiveness.
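
The abstract names the ingredients (TD-errors computed online, then empirical non-parametric cumulative-sum statistics) without spelling out how they combine. The Python sketch below shows one possible way to wire them together; it is not the DeepADMR implementation. It assumes one-step TD-errors delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) from a PPO critic, a nominal sample of TD-errors recorded in the training environment, and a p-value-based CUSUM update S_t = max(0, S_{t-1} + log(alpha / p_t)), where p_t is the empirical tail probability of the current |delta_t| under the nominal sample. The names `td_errors` and `EmpiricalCusum` and the parameters `alpha` and `threshold` are hypothetical.

```python
import math
from bisect import bisect_left
from typing import List, Optional, Sequence


def td_errors(rewards: Sequence[float], values: Sequence[float],
              next_values: Sequence[float], gamma: float = 0.99,
              dones: Optional[Sequence[bool]] = None) -> List[float]:
    """One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)."""
    if dones is None:
        dones = [False] * len(rewards)
    return [r + gamma * v_next * (0.0 if d else 1.0) - v
            for r, v, v_next, d in zip(rewards, values, next_values, dones)]


class EmpiricalCusum:
    """Hypothetical non-parametric CUSUM detector on TD-error magnitudes.

    Nominal behavior is represented only by a sample of TD errors recorded
    in the training environment; no parametric distribution is fitted.
    """

    def __init__(self, nominal_errors: Sequence[float],
                 alpha: float = 0.05, threshold: float = 10.0):
        self.nominal = sorted(abs(e) for e in nominal_errors)
        self.alpha = alpha          # assumed per-sample significance level
        self.threshold = threshold  # assumed alarm threshold h
        self.stat = 0.0             # running CUSUM statistic S_t

    def p_value(self, error: float) -> float:
        # Empirical tail probability P(|delta| >= |error|) under the nominal
        # sample, with +1 smoothing so it is never zero.
        n = len(self.nominal)
        n_at_least = n - bisect_left(self.nominal, abs(error))
        return (n_at_least + 1) / (n + 1)

    def update(self, error: float) -> bool:
        # Rare errors (p < alpha) push S_t up; typical errors pull it toward 0.
        self.stat = max(0.0, self.stat + math.log(self.alpha / self.p_value(error)))
        return self.stat >= self.threshold


# Usage with synthetic numbers (not results from the paper).
nominal = [(i % 21 - 10) / 100 for i in range(1000)]  # stand-in nominal TD errors
detector = EmpiricalCusum(nominal, alpha=0.05, threshold=10.0)
stream = [0.01, -0.02, 0.05, 2.0, -2.5, 3.0]
print([detector.update(e) for e in stream])
# → [False, False, False, False, False, True]
```

Because the statistic only needs a sorted nominal sample and a running sum, each update costs a single binary search, which suits per-step monitoring; the +1 smoothing keeps log(alpha / p_t) finite when a TD-error exceeds everything seen in training. In this toy run a single out-of-range error is not enough to raise an alarm; three consecutive ones are.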
