Dependence-Preserving Data Compaction for Scalable Forensic Analysis

Large organizations are increasingly targeted in long-running attack campaigns lasting months or years. When a break-in is eventually discovered, forensic analysis begins. System audit logs provide crucial information that underpins such analysis. Unfortunately, audit data collected over months or years can grow to enormous sizes. Large data size is not only a storage concern: forensic analysis tasks can become very slow when they must sift through billions of records. In this paper, we first present two powerful event reduction techniques that reduce the number of records by a factor of 4.6 to 19 in our experiments. An important benefit of our techniques is that they provably preserve the accuracy of forensic analysis tasks such as backtracking and impact analysis. While providing this guarantee, our techniques reduce on-disk file sizes by an average of 35× across our data sets. On average, our in-memory dependence graph uses just 5 bytes per event in the original data. Our system is able to consume and analyze nearly a million events per second.
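To make the accuracy guarantee concrete, the sketch below shows what it means for a reduction to leave a backtracking result unchanged. It is only an illustration under stated assumptions, not the reduction technique of the paper: the `(source, destination, timestamp)` event model, the `backtrack` function, and the sample entity names are all hypothetical. The second write from `browser` to `dropper.exe` carries no new dependence because `browser` received no new input between the two writes, so dropping it does not change the set of ancestors found.

```python
from collections import defaultdict

def backtrack(events, target, at_time):
    """Return entities that may have influenced `target` by `at_time`.

    Hypothetical event model: (src, dst, t) means information flowed
    from src to dst at time t.
    """
    incoming = defaultdict(list)
    for src, dst, t in events:
        incoming[dst].append((src, t))

    # latest[n] = latest time at which n's state can still reach the target
    latest = {target: at_time}
    worklist = [target]
    while worklist:
        node = worklist.pop()
        bound = latest[node]
        for src, t in incoming[node]:
            # Only flows that happened no later than `bound` matter; revisit
            # src if this flow raises the time bound recorded for it.
            if t <= bound and t > latest.get(src, float("-inf")):
                latest[src] = t
                worklist.append(src)
    return set(latest) - {target}

events = [
    ("evil.com", "browser", 1),
    ("browser", "dropper.exe", 2),
    ("browser", "dropper.exe", 3),  # repeat write, no new input to browser since t=2
    ("dropper.exe", "malware", 4),
    ("notes.txt", "browser", 5),    # too late to have influenced malware
]

# Illustrative reduction: drop the repeated write at t=3.
reduced = [e for e in events if e != ("browser", "dropper.exe", 3)]

full_result = backtrack(events, "malware", 4)
reduced_result = backtrack(reduced, "malware", 4)
print(sorted(full_result), full_result == reduced_result)
```

Impact analysis is the mirror image: the same traversal run forward in time over outgoing edges. A reduction that preserves both results for every possible starting point is the kind of guarantee the abstract refers to.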
