Diagnosing Advanced Persistent Threats: A Position Paper

When a computer system is compromised, analyzing the root cause (for example, the entry point of the penetration) is a diagnostic process. An audit trail, as defined in the National Information Assurance Glossary, is a security-relevant chronological (set of) record(s), and/or destination and source of records, that provides evidence of the sequence of activities that have affected, at any time, a specific operation, procedure, or event. After detecting an intrusion, system administrators manually analyze audit trails both to isolate the root cause and to assess the damage caused by the attack. Because audit trails are voluminous and record only low-level activities, this task is cumbersome and time-consuming. In this position paper, we discuss our ideas for automating the analysis of audit trails using machine learning and model-based reasoning techniques. Our approach classifies audit trails into the high-level activities they represent, and then reasons about those activities and their threat potential, both in real time and forensically. We argue that using the outcome of this reasoning to explain complex evidence of malicious behavior equips system administrators with the tools they need to react promptly to attacks, stop them, and mitigate their effects.
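As a rough illustration of the two-stage idea (classify low-level audit records into high-level activities, then reason about their threat potential), the following minimal Python sketch may help. It is not the authors' method: a hand-written lookup table stands in for the machine-learning classifier, a capped sum of weights stands in for model-based reasoning, and the record fields (`syscall`, `target`), the activity names, and the weights are all hypothetical.

```python
from collections import Counter

# Hypothetical rules mapping a low-level (syscall, target) pair to a
# high-level activity. A learned classifier would replace this table.
ACTIVITY_RULES = {
    ("open", "/etc/shadow"): "credential_access",
    ("connect", "203.0.113.7:4444"): "outbound_connection",
    ("exec", "/bin/sh"): "shell_spawn",
}

# Hypothetical integer threat weights per activity (0 = harmless).
THREAT_WEIGHTS = {
    "credential_access": 6,
    "outbound_connection": 3,
    "shell_spawn": 4,
}

MAX_SCORE = 10  # assumed cap for the illustrative score


def classify(record):
    """Map one audit record to a high-level activity label."""
    return ACTIVITY_RULES.get((record["syscall"], record["target"]), "benign")


def summarize(trail):
    """Return per-activity counts and a capped threat score for a trail."""
    activities = [classify(record) for record in trail]
    counts = Counter(activities)
    # Each distinct activity contributes its weight once.
    raw = sum(THREAT_WEIGHTS.get(activity, 0) for activity in set(activities))
    return counts, min(MAX_SCORE, raw)


# A tiny, made-up audit trail.
trail = [
    {"syscall": "open", "target": "/etc/shadow"},
    {"syscall": "exec", "target": "/bin/sh"},
    {"syscall": "read", "target": "/home/user/notes.txt"},
]

counts, score = summarize(trail)
print(dict(counts), score)
```

The point of the sketch is only the shape of the pipeline: an administrator would see a handful of labeled activities and an aggregate score rather than the raw records.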
