MCI : Modeling-based Causality Inference in Audit Logging for Attack Investigation

In this paper, we develop a model based causality inference technique for audit logging that does not require any application instrumentation or kernel modification. It leverages a recent dynamic analysis, dual execution (LDX), that can infer precise causality between system calls but unfortunately requires doubling the resource consumption such as CPU time and memory consumption. For each application, we use LDX to acquire precise causal models for a set of primitive operations. Each model is a sequence of system calls that have inter-dependences, some of them caused by memory operations and hence implicit at the system call level. These models are described by a language that supports various complexity such as regular, context-free, and even context-sensitive. In production run, a novel parser is deployed to parse audit logs (without any enhancement) to model instances and hence derive causality. Our evaluation on a set of real-world programs shows that the technique is highly effective. The generated models can recover causality with 0% false-positives (FP) and false-negatives (FN) for most programs and only 8.3% FP and 5.2% FN in the worst cases. The models also feature excellent composibility, meaning that the models derived from primitive operations can be composed together to describe causality for large and complex real world missions. Applying our technique to attack investigation shows that the system-wide attack causal graphs are highly precise and concise, having better quality than the state-of-the-art.

[1]  Noam Chomsky,et al.  On Certain Formal Properties of Grammars , 1959, Inf. Control..

[2]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[3]  Steven A. Hofmeyr,et al.  Intrusion Detection via System Call Traces , 1997, IEEE Softw..

[4]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[5]  R. Sekar,et al.  A fast automaton-based method for detecting anomalous program behaviors , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[6]  David A. Wagner,et al.  Intrusion detection via static analysis , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[7]  Samuel T. King,et al.  Backtracking intrusions , 2003, SOSP '03.

[8]  Weibo Gong,et al.  Anomaly detection using call stack information , 2003, 2003 Symposium on Security and Privacy, 2003..

[9]  Eyal de Lara,et al.  The taser intrusion recovery system , 2005, SOSP '05.

[10]  Samuel T. King,et al.  Enriching Intrusion Alerts Through Multi-Host Causality , 2005, NDSS.

[11]  James Newsome,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and SignatureGeneration of Exploits on Commodity Software , 2005, NDSS.

[12]  Patrick Crowley,et al.  Algorithms to accelerate multiple regular expressions matching for deep packet inspection , 2006, SIGCOMM 2006.

[13]  Alaa A. Kharbouch,et al.  Three models for the description of language , 1956, IRE Trans. Inf. Theory.

[14]  Ion Stoica,et al.  ODR: output-deterministic replay for multicore debugging , 2009, SOSP '09.

[15]  Xi Wang,et al.  Intrusion Recovery Using Selective Re-execution , 2010, OSDI.

[16]  Fabian Monrose,et al.  Trail of bytes: efficient support for forensic analysis , 2010, CCS '10.

[17]  Angelos D. Keromytis,et al.  A General Approach for Efficiently Accelerating Software-based Dynamic Data Flow Tracking on Commodity Hardware , 2012, NDSS.

[18]  Angelos D. Keromytis,et al.  libdft: practical dynamic data flow tracking for commodity systems , 2012, VEE '12.

[19]  Xiangyu Zhang,et al.  High Accuracy Attack Provenance via Binary-based Execution Partition , 2013, NDSS.

[20]  Xiangyu Zhang,et al.  LogGC: garbage collecting audit log , 2013, CCS.

[21]  Angelos D. Keromytis,et al.  ShadowReplica: efficient parallelization of dynamic data flow tracking , 2013, CCS.

[22]  John K. Ousterhout,et al.  In Search of an Understandable Consensus Algorithm , 2014, USENIX ATC.

[23]  Thomas Moyer,et al.  Trustworthy Whole-System Provenance for the Linux Kernel , 2015, USENIX Security Symposium.

[24]  Jun Wang,et al.  TaintPipe: Pipelined Symbolic Taint Analysis , 2015, USENIX Security Symposium.

[25]  Xiangyu Zhang,et al.  Accurate, Low Cost and Instrumentation-Free Security Audit Logging for Windows , 2015, ACSAC.

[26]  Naren Ramakrishnan,et al.  Unearthing Stealthy Program Attacks Buried in Extremely Long Execution Paths , 2015, CCS.

[27]  Xiangyu Zhang,et al.  Dual Execution for On the Fly Fine Grained Execution Comparison , 2015, ASPLOS.

[28]  Xiangyu Zhang,et al.  ProTracer: Towards Practical Provenance Tracing by Alternating Between Logging and Tainting , 2016, NDSS.

[29]  Zhou Li,et al.  Operational Security Log Analytics for Enterprise Breach Detection , 2016, 2016 IEEE Cybersecurity Development (SecDev).

[30]  Xiangyu Zhang,et al.  LDX: Causality Inference by Lightweight Dual Execution , 2016, ASPLOS.

[31]  Barbara G. Ryder,et al.  A Sharper Sense of Self: Probabilistic Reasoning of Program Behaviors for Anomaly Detection with Context Sensitivity , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[32]  Mike O'Leary Snort , 2019, Cyber Operations.

[33]  About Event Tracing for Windows , 2020 .