A methodology to measure and monitor level of operational effectiveness of a CSOC

In a cybersecurity operations center (CSOC), under normal operating conditions, a sufficient number of analysts is available each day to analyze the alert workload generated by intrusion detection systems (IDSs). For the purpose of this paper, this means that the cybersecurity analysts can fully investigate every alert generated by the IDSs in a reasonable amount of time. However, a number of disruptive factors can adversely impact the normal operating conditions, such as (1) higher alert generation rates from a few IDSs, (2) new alert patterns that decrease the throughput of the alert analysis process, and (3) analyst absenteeism. The impact of these factors is that alerts wait for a long duration before being analyzed, which degrades the readiness of the CSOC. It is imperative that the readiness of the CSOC be quantified; in this paper, it is defined as the level of operational effectiveness (LOE) of a CSOC. LOE can be quantified and monitored by knowing the exact deviation of the CSOC conditions from normal and how long it takes for the condition to return to normal. In this paper, we quantify LOE by defining a new metric called total time for alert investigation (TTA), which is the sum of the waiting time in the queue and the analyst investigation time of an alert after its arrival in the CSOC database. A dynamic TTA monitoring framework is developed in which a nominal average TTA per hour (avgTTA/hr) is established as the baseline for the normal operating condition, using the individual TTA of alerts that were investigated in that hour. At the baseline value of avgTTA/hr, LOE is considered to be ideal. In addition, an upper-bound (threshold) value for avgTTA/hr is established, below which the LOE is considered to be optimal. Several case studies illustrate the impact of the above disruptive factors on the dynamic behavior of avgTTA/hr, which provides useful insights about the current LOE of the system.
The effect of actions taken to return the CSOC to its normal operating condition is also studied by varying both the amount and the timing of the action, which in turn affects the dynamic behavior of avgTTA/hr. Results indicate that, by using the insights gained from measuring, monitoring, and controlling the dynamic behavior of avgTTA/hr, a manager can quantify and color-code the LOE of the CSOC. Furthermore, these insights allow for a deeper understanding of acceptable downtime for the IDS, acceptable levels of absenteeism, and the recovery time and effort needed to return the CSOC to its ideal LOE.
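The TTA and avgTTA/hr definitions above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Alert` record, the choice to assign an alert to the hour in which its investigation finished, the sample timestamps, the baseline and threshold values, and the green/yellow/red color mapping are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; all times are in hours since monitoring began."""
    arrival: float  # alert arrives in the CSOC database
    start: float    # analyst begins the investigation
    end: float      # analyst finishes the investigation

    @property
    def tta(self) -> float:
        # TTA = waiting time in the queue + analyst investigation time
        waiting = self.start - self.arrival
        investigation = self.end - self.start
        return waiting + investigation


def avg_tta_per_hour(alerts):
    """Average TTA of the alerts investigated in each hour.

    Assumption: an alert counts toward the hour in which its
    investigation finished.
    """
    buckets = defaultdict(list)
    for alert in alerts:
        buckets[int(alert.end)].append(alert.tta)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


def loe_color(avg_tta, baseline, threshold):
    """Map an avgTTA/hr value to an assumed color code for the LOE."""
    if avg_tta <= baseline:
        return "green"   # at or below the baseline: ideal LOE
    if avg_tta <= threshold:
        return "yellow"  # between baseline and threshold: still below the upper bound
    return "red"         # above the threshold: LOE has degraded


# Illustrative values only; none of these numbers come from the paper.
alerts = [
    Alert(arrival=0.0, start=0.1, end=0.6),
    Alert(arrival=0.2, start=0.6, end=0.9),
    Alert(arrival=0.5, start=1.0, end=1.5),
    Alert(arrival=0.9, start=1.5, end=1.9),
]
BASELINE_HOURS = 0.7   # hypothetical nominal avgTTA/hr
THRESHOLD_HOURS = 1.2  # hypothetical upper bound on avgTTA/hr

for hour, value in avg_tta_per_hour(alerts).items():
    print(hour, round(value, 2), loe_color(value, BASELINE_HOURS, THRESHOLD_HOURS))
```

Tracking how many consecutive hours the value stays above the baseline would give a rough estimate of the recovery time mentioned in the abstract, again under these assumptions.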
