Distributed reinforcement learning for network intrusion response

The increasing adoption of technologies and the exponential growth of networks has made the area of information technology an integral part of our lives, where network security plays a vital role. One of the most serious threats in the current Internet is posed by distributed denial of service (DDoS) attacks, which target the availability of the victim system. Such an attack is designed to exhaust a server's resources or congest a network's infrastructure, and therefore renders the victim incapable of providing services to its legitimate users or customers. To tackle the distributed nature of these attacks, a distributed and coordinated defence mechanism is necessary, where many defensive nodes, across different locations cooperate in order to stop or reduce the flood. This thesis investigates the applicability of distributed reinforcement learning to intrusion response, specifically, DDoS response. We propose a novel approach to respond to DDoS attacks called Multiagent Router Throttling. Multiagent Router Throttling provides an agent-based distributed response to the DDoS problem, where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. One of the novel characteristics of the proposed approach is that it has a decentralised architecture and provides a decentralised coordinated response to the DDoS problem, thus being resilient to the attacks themselves. Scalability constitutes a critical aspect of a defence system since a non-scalable mechanism will never be considered, let alone adopted, for wide deployment by a company or organisation. We propose Coordinated Team Learning (CTL) which is a novel design to the original Multiagent Router Throttling approach based on the divide-and-conquer paradigm, that uses task decomposition and coordinated team rewards. To better scale-up CTL is combined with a form of reward shaping. The scalability of the proposed system is successfully demonstrated in experiments involving up to 1000 reinforcement learning agents. The significant improvements on scalability and learning speed lay the foundations for a potential real-world deployment.

[1]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[2]  Hervé Debar,et al.  A neural network component for an intrusion detection system , 1992, Proceedings 1992 IEEE Computer Society Symposium on Research in Security and Privacy.

[3]  Sam Devlin,et al.  Dynamic potential-based reward shaping , 2012, AAMAS.

[4]  Peter Stone,et al.  Reinforcement Learning for RoboCup Soccer Keepaway , 2005, Adapt. Behav..

[5]  Peter Stone,et al.  Interactively shaping agents via human reinforcement: the TAMER framework , 2009, K-CAP '09.

[6]  Kagan Tumer,et al.  Multiagent reinforcement learning in a distributed sensor network with indirect feedback , 2013, AAMAS.

[7]  Shari Lawrence Pfleeger,et al.  Security in Computing, 4th Edition , 2006 .

[8]  Xin Xu,et al.  Defending DDoS Attacks Using Hidden Markov Models and Cooperative Reinforcement Learning , 2007, PAISI.

[9]  Daniel Kudenko,et al.  Multi-Agent Reinforcement Learning for Intrusion Detection: A case study and evaluation , 2008, ECAI.

[10]  Ramesh Govindan,et al.  Cossack: coordinated suppression of simultaneous attacks , 2003, Proceedings DARPA Information Survivability Conference and Exposition.

[11]  Michael P. Wellman,et al.  Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..

[12]  Randy Goebel,et al.  Computational intelligence - a logical approach , 1998 .

[13]  Stephanie Forrest,et al.  Computer immunology , 1997, CACM.

[14]  John S. Heidemann,et al.  A framework for classifying denial of service attacks , 2003, SIGCOMM '03.

[15]  Petr Jan Horn,et al.  Autonomic Computing: IBM's Perspective on the State of Information Technology , 2001 .

[16]  Peter Vrancx,et al.  Solving delayed coordination problems in MAS , 2011, AAMAS.

[17]  William Stallings,et al.  Cryptography and Network Security: Principles and Practice , 1998 .

[18]  Kagan Tumer,et al.  Combining reward shaping and hierarchies for scaling to large multiagent systems , 2016, The Knowledge Engineering Review.

[19]  Kihong Park,et al.  On the relationship between file sizes, transport protocols, and self-similar network traffic , 1996, Proceedings of 1996 International Conference on Network Protocols (ICNP-96).

[20]  Ratul Mahajan,et al.  Controlling High Bandwidth Aggregates in the Network (Extended Version) , 2001 .

[21]  Leslie Pack Kaelbling,et al.  Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[22]  Xiaofeng Wang,et al.  Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games , 2002, NIPS.

[23]  Angelos D. Keromytis,et al.  SOS: secure overlay services , 2002, SIGCOMM '02.

[24]  Andrew W. Moore,et al.  Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.

[25]  Randy H. Katz,et al.  A view of cloud computing , 2010, CACM.

[26]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[27]  Jonathan Timmis,et al.  Application Areas of AIS: The Past, The Present and The Future , 2005, ICARIS.

[28]  Sam Devlin,et al.  Potential-based difference rewards for multiagent reinforcement learning , 2014, AAMAS.

[29]  D. Sculley,et al.  Detecting adversarial advertisements in the wild , 2011, KDD.

[30]  Ming Tan,et al.  Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents , 1997, ICML.

[31]  Christopher Leckie,et al.  A survey of coordinated attacks and collaborative intrusion detection , 2010, Comput. Secur..

[32]  Aiko Pras,et al.  An Overview of IP Flow-Based Intrusion Detection , 2010, IEEE Communications Surveys & Tutorials.

[33]  Paul Ferguson,et al.  Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing , 1998, RFC.

[34]  Ratul Mahajan,et al.  Controlling high bandwidth aggregates in the network , 2002, CCRV.

[35]  Peter Reiher,et al.  A taxonomy of DDoS attack and DDoS defense mechanisms , 2004, CCRV.

[36]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[37]  Teerawat Issariyakul,et al.  Introduction to Network Simulator NS2 , 2008 .

[38]  Kagan Tumer,et al.  Collective Intelligence for Control of Distributed Dynamical Systems , 1999, ArXiv.

[39]  David Moore,et al.  Code-Red: a case study on the spread and victims of an internet worm , 2002, IMW '02.

[40]  Steven M. Bellovin,et al.  Implementing Pushback: Router-Based Defense Against DDoS Attacks , 2002, NDSS.

[41]  Ray Hunt,et al.  A taxonomy of network and computer attacks , 2005, Comput. Secur..

[42]  Jaideep Srivastava,et al.  Detection of Novel Network Attacks Using Data Mining , 2003 .

[43]  C. Douligeris,et al.  Detecting denial of service attacks using emergent self-organizing maps , 2005, Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005..

[44]  Balachander Krishnamurthy,et al.  Flash crowds and denial of service attacks: characterization and implications for CDNs and web sites , 2002, WWW '02.

[45]  David K. Y. Yau,et al.  On Defending Against Distribtued Denial- of-Service Attacks with Server-Centric Router Throttles , 2001 .

[46]  Aikaterini Mitrokotsa,et al.  DDoS attacks and defense mechanisms: classification and state-of-the-art , 2004, Comput. Networks.

[47]  Anna R. Karlin,et al.  Practical network support for IP traceback , 2000, SIGCOMM.

[48]  Ratul Mahajan,et al.  Measuring ISP topologies with rocketfuel , 2002, TNET.

[49]  Ruby B. Lee,et al.  Distributed Denial of Service: Taxonomies of Attacks, Tools, and Countermeasures , 2004, PDCS.

[50]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[51]  Kotagiri Ramamohanarao,et al.  Protection from distributed denial of service attacks using history-based IP filtering , 2003, IEEE International Conference on Communications, 2003. ICC '03..

[52]  Jelena Mirkovic,et al.  Alliance formation for DDoS defense , 2003, NSPW '03.

[53]  Martin Lauer,et al.  An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems , 2000, ICML.

[54]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[55]  Cannady,et al.  Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks , 2000 .

[56]  Jelena Mirkovic,et al.  A Framework for a Collaborative DDoS Defense , 2006, 2006 22nd Annual Computer Security Applications Conference (ACSAC'06).

[57]  Martin Roesch,et al.  Snort - Lightweight Intrusion Detection for Networks , 1999 .

[58]  Maja J. Mataric,et al.  Reward Functions for Accelerated Learning , 1994, ICML.

[59]  Craig Boutilier,et al.  The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[60]  Kagan Tumer,et al.  Distributed agent-based air traffic flow management , 2007, AAMAS '07.

[61]  Michael Wooldridge,et al.  An Introduction to MultiAgent Systems, Second Edition , 2009 .

[62]  Preben Alstrøm,et al.  Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.

[63]  Gerald Tesauro,et al.  TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[64]  Ali A. Ghorbani,et al.  A Feature Classification Scheme For Network Intrusion Detection , 2007, Int. J. Netw. Secur..

[65]  S. Russel and P. Norvig,et al.  “Artificial Intelligence – A Modern Approach”, Second Edition, Pearson Education, 2003. , 2015 .

[66]  Vern Paxson,et al.  Outside the Closed World: On Using Machine Learning for Network Intrusion Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[67]  Michael Wooldridge,et al.  Introduction to multiagent systems , 2001 .

[68]  Sam Devlin,et al.  Theoretical considerations of potential-based reward shaping for multi-agent systems , 2011, AAMAS.

[69]  Deborah Estrin,et al.  Advances in network simulation , 2000, Computer.

[70]  Ross J. Anderson,et al.  The XenoService { A Distributed Defeat for Distributed Denial of Service , 2000 .

[71]  Maja J. Mataric,et al.  Using communication to reduce locality in distributed multiagent learning , 1997, J. Exp. Theor. Artif. Intell..

[72]  Kagan Tumer,et al.  Unifying temporal and structural credit assignment problems , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[73]  Ali A. Ghorbani,et al.  Network Intrusion Detection and Prevention - Concepts and Techniques , 2010, Advances in Information Security.

[74]  Bart De Schutter,et al.  Multiagent Reinforcement Learning with Adaptive State Focus , 2005, BNAIC.

[75]  David K. Y. Yau,et al.  Defending against distributed denial-of-service attacks with max-min fair server-centric router throttles , 2005, IEEE/ACM Transactions on Networking.

[76]  Jelena Mirkovic,et al.  Attacking DDoS at the source , 2002, 10th IEEE International Conference on Network Protocols, 2002. Proceedings..

[77]  Eric Wiewiora,et al.  Potential-Based Shaping and Q-Value Initialization are Equivalent , 2003, J. Artif. Intell. Res..

[78]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[79]  Anup K. Ghosh,et al.  Detecting anomalous and unknown intrusions against programs , 1998, Proceedings 14th Annual Computer Security Applications Conference (Cat. No.98EX217).

[80]  Peter Stone,et al.  Learning and Multiagent Reasoning for Autonomous Agents , 2007, IJCAI.

[81]  Jeffrey O. Kephart,et al.  Measuring and modeling computer virus prevalence , 1993, Proceedings 1993 IEEE Computer Society Symposium on Research in Security and Privacy.

[82]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[83]  Heejo Lee,et al.  On the effectiveness of route-based packet filtering for distributed DoS attack prevention in power-law internets , 2001, SIGCOMM '01.

[84]  Daniel Kudenko,et al.  Multi-agent Reinforcement Learning for Intrusion Detection , 2007, Adaptive Agents and Multi-Agents Systems.

[85]  L. Buşoniu,et al.  A comprehensive survey of multi-agent reinforcement learning , 2011 .

[86]  Ling Huang,et al.  Approaches to adversarial drift , 2013, AISec.

[87]  Natalia Akchurina,et al.  Multiagent reinforcement learning: algorithm converging to Nash equilibrium in general-sum discounted stochastic games , 2009, AAMAS.

[88]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.