Distributed reinforcement learning for adaptive and robust network intrusion response

Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.

[1]  Daniel Kudenko,et al.  Distributed response to network intrusions using multiagent reinforcement learning , 2015, Eng. Appl. Artif. Intell..

[2]  Aikaterini Mitrokotsa,et al.  DDoS attacks and defense mechanisms: classification and state-of-the-art , 2004, Comput. Networks.

[3]  Eduardo F. Morales,et al.  An Introduction to Reinforcement Learning , 2011 .

[4]  Angelos D. Keromytis,et al.  SOS: secure overlay services , 2002, SIGCOMM 2002.

[5]  Daniel Kudenko,et al.  Multiagent Router Throttling: Decentralized Coordinated Response Against DDoS Attacks , 2013, IAAI.

[6]  Ross J. Anderson,et al.  The XenoService { A Distributed Defeat for Distributed Denial of Service , 2000 .

[7]  Kagan Tumer,et al.  Multiagent reinforcement learning in a distributed sensor network with indirect feedback , 2013, AAMAS.

[8]  Kagan Tumer,et al.  Analyzing and visualizing multiagent rewards in dynamic and stochastic domains , 2008, Autonomous Agents and Multi-Agent Systems.

[9]  Ratul Mahajan,et al.  Controlling high bandwidth aggregates in the network , 2002, CCRV.

[10]  Peter Reiher,et al.  A taxonomy of DDoS attack and DDoS defense mechanisms , 2004, CCRV.

[11]  David K. Y. Yau,et al.  Defending against distributed denial-of-service attacks with max-min fair server-centric router throttles , 2005, IEEE/ACM Transactions on Networking.

[12]  Kagan Tumer,et al.  Collective Intelligence for Control of Distributed Dynamical Systems , 1999, ArXiv.

[13]  Craig Boutilier,et al.  The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[14]  Kagan Tumer,et al.  Distributed agent-based air traffic flow management , 2007, AAMAS '07.

[15]  Sam Devlin,et al.  Potential-based difference rewards for multiagent reinforcement learning , 2014, AAMAS.

[16]  Anna R. Karlin,et al.  Practical network support for IP traceback , 2000, SIGCOMM.

[17]  Balachander Krishnamurthy,et al.  Flash crowds and denial of service attacks: characterization and implications for CDNs and web sites , 2002, WWW '02.

[18]  Ramesh Govindan,et al.  COSSACK: Coordinated Suppression of Simultaneous Attacks , 2003, Proceedings DARPA Information Survivability Conference and Exposition.

[19]  Angelos D. Keromytis,et al.  SOS: secure overlay services , 2002, SIGCOMM '02.

[20]  John S. Heidemann,et al.  A framework for classifying denial of service attacks , 2003, SIGCOMM '03.

[21]  Kagan Tumer,et al.  Combining reward shaping and hierarchies for scaling to large multiagent systems , 2016, The Knowledge Engineering Review.