Dynamic Scheduling of Cybersecurity Analysts for Minimizing Risk Using Reinforcement Learning

An important component of the cyber-defense mechanism is the adequate staffing levels of its cybersecurity analyst workforce and their optimal assignment to sensors for investigating the dynamic alert traffic. The ever-increasing cybersecurity threats faced by today’s digital systems require a strong cyber-defense mechanism that is both reactive in its response to mitigate the known risk and proactive in being prepared for handling the unknown risks. In order to be proactive for handling the unknown risks, the above workforce must be scheduled dynamically so the system is adaptive to meet the day-to-day stochastic demands on its workforce (both size and expertise mix). The stochastic demands on the workforce stem from the varying alert generation and their significance rate, which causes an uncertainty for the cybersecurity analyst scheduler that is attempting to schedule analysts for work and allocate sensors to analysts. Sensor data are analyzed by automatic processing systems, and alerts are generated. A portion of these alerts is categorized to be significant, which requires thorough examination by a cybersecurity analyst. Risk, in this article, is defined as the percentage of significant alerts that are not thoroughly analyzed by analysts. In order to minimize risk, it is imperative that the cyber-defense system accurately estimates the future significant alert generation rate and dynamically schedules its workforce to meet the stochastic workload demand to analyze them. The article presents a reinforcement learning-based stochastic dynamic programming optimization model that incorporates the above estimates of future alert rates and responds by dynamically scheduling cybersecurity analysts to minimize risk (i.e., maximize significant alert coverage by analysts) and maintain the risk under a pre-determined upper bound. The article tests the dynamic optimization model and compares the results to an integer programming model that optimizes the static staffing needs based on a daily-average alert generation rate with no estimation of future alert rates (static workforce model). Results indicate that over a finite planning horizon, the learning-based optimization model, through a dynamic (on-call) workforce in addition to the static workforce, (a) is capable of balancing risk between days and reducing overall risk better than the static model, (b) is scalable and capable of identifying the quantity and the right mix of analyst expertise in an organization, and (c) is able to determine their dynamic (on-call) schedule and their sensor-to-analyst allocation in order to maintain risk below a given upper bound. Several meta-principles are presented, which are derived from the optimization model, and they further serve as guiding principles for hiring and scheduling cybersecurity analysts. Days-off scheduling was performed to determine analyst weekly work schedules that met the cybersecurity system’s workforce constraints and requirements.

[1]  L. Goddard,et al.  Operations Research (OR) , 2007 .

[2]  Sushil Jajodia,et al.  Optimal Scheduling of Cybersecurity Analysts for Minimizing Risk , 2017, ACM Trans. Intell. Syst. Technol..

[3]  Ted K. Ralphs,et al.  Integer and Combinatorial Optimization , 2013 .

[4]  Tudor Dumitras,et al.  The Global Cyber-Vulnerability Report , 2015, Terrorism, Security, and Computation.

[5]  Yves Nobert,et al.  Freight Handling Personnel Scheduling at Air Cargo Terminals , 1998, Transp. Sci..

[6]  Dorothy E. Denning,et al.  An Intrusion-Detection Model , 1987, IEEE Transactions on Software Engineering.

[7]  Abhijit Gosavi,et al.  Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning , 2003 .

[8]  Peter Stone,et al.  Reinforcement learning , 2019, Scholarpedia.

[9]  David Lesaint,et al.  Field workforce scheduling , 2003 .

[10]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[11]  Mehmet Emin Aydin,et al.  Dynamic job-shop scheduling using reinforcement learning agents , 2000, Robotics Auton. Syst..

[12]  Warren B. Powell,et al.  Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics) , 2007 .

[13]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[14]  Fuqing Zhao,et al.  A Dynamic Rescheduling Model with Multi-Agent System and Its Solution Method , 2012 .

[15]  Sean R Eddy,et al.  What is dynamic programming? , 2004, Nature Biotechnology.

[16]  U. Rieder,et al.  Markov Decision Processes , 2010 .

[17]  Roberto Di Pietro,et al.  Intrusion Detection Systems , 2008 .

[18]  Tapas K. Das,et al.  A multi-agent reinforcement learning approach to obtaining dynamic control policies for stochastic lot scheduling problem , 2005, Simul. Model. Pract. Theory.

[19]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[20]  Todd L. Heberlein,et al.  Network intrusion detection , 1994, IEEE Network.

[21]  Robert F. Erbacher,et al.  Extending Case-Based Reasoning to Network Alert Reporting , 2012, 2012 International Conference on Cyber Security.

[22]  Nuno J. Mamede,et al.  Multi-Agent Dynamic Scheduling and Re-Scheduling with Global Temporal Constraints , 2001, ICEIS.

[23]  J. Pearson Linear multivariable control, a geometric approach , 1977 .

[24]  Hai Yang,et al.  ACM Transactions on Intelligent Systems and Technology - Special Section on Urban Computing , 2014 .

[25]  Warren B. Powell,et al.  Approximate Dynamic Programming - Solving the Curses of Dimensionality , 2007 .

[26]  Fabio Persia,et al.  Discovering the Top-k Unexplained Sequences in Time-Stamped Observation Data , 2014, IEEE Transactions on Knowledge and Data Engineering.

[27]  Uwe Fink,et al.  Planning And Scheduling In Manufacturing And Services , 2016 .

[28]  Sushil Jajodia,et al.  Applications of Data Mining in Computer Security , 2002, Advances in Information Security.

[29]  野村総合研究所 最新図解CIOハンドブック : Chief information officer , 2000 .

[30]  Panos M. Pardalos,et al.  Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..

[31]  Vern Paxson,et al.  Outside the Closed World: On Using Machine Learning for Network Intrusion Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[32]  A. N. Zincir-Heywood,et al.  Intrusion Detection Systems , 2008 .

[33]  Der-San Chen,et al.  Applied Integer Programming: Modeling and Solution , 2010 .