TRACE: Enterprise-Wide Provenance Tracking for Real-Time APT Detection

We present TRACE, a comprehensive provenance tracking system for scalable, real-time, enterprise-wide advanced persistent threat (APT) detection. TRACE uses static analysis to identify program unit structures and inter-unit dependences, such that the provenance of an output event includes the input events within the same unit. Provenance collected from individual hosts is integrated to construct a distributed, enterprise-wide causal graph. We describe the evolution of TRACE over a four-year period, during which our improvements focused on performance, scalability, and fidelity. Over this period, system call coverage increased from 47 to 66 system calls, while time and space overheads were reduced by more than one and two orders of magnitude, respectively. We also report results from five adversarial engagements in which an independent team of evaluators conducted APT attacks and assessed system performance. Three other teams used the provenance data produced by TRACE to implement real-time APT detection logic. Retrospective analysis showed that TRACE provided sufficient evidence to detect over 80% of the attack stages across all evaluations. By the last engagement, time and space overheads had been reduced to 18% and 10%, respectively.
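
The abstract does not specify how units, dependences, or cross-host links are represented, so the following Python sketch is only an illustration under assumptions of our own. Each event carries a hypothetical `unit` id (for example, one iteration of a server's request-handling loop). `unit_deps` stands in for the inter-unit dependences that static analysis would supply. An output's provenance is taken to be the earlier inputs in its own unit plus those in units it transitively depends on, and event ids double as timestamps. The `merge_host_graphs` helper is likewise hypothetical: it joins per-host graphs by letting network-flow nodes be shared across hosts. None of these names come from TRACE itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical audit event; eid order is used as time order."""
    eid: int
    unit: int       # id of the program unit the event occurred in (assumed)
    is_input: bool  # True for reads/receives, False for writes/sends
    obj: str        # file, socket, etc.


def process_provenance(output, events):
    """Coarse baseline: every earlier input of the process is an ancestor."""
    return {e.eid for e in events if e.is_input and e.eid < output.eid}


def unit_provenance(output, events, unit_deps):
    """Inputs in the output's unit plus units it transitively depends on.

    unit_deps maps a unit id to the unit ids it depends on; here it is an
    assumed stand-in for dependences found by static analysis.
    """
    relevant, stack = set(), [output.unit]
    while stack:  # depth-first walk; the visited set handles cycles
        u = stack.pop()
        if u in relevant:
            continue
        relevant.add(u)
        stack.extend(unit_deps.get(u, ()))
    return {e.eid for e in events
            if e.is_input and e.unit in relevant and e.eid < output.eid}


def merge_host_graphs(host_edges):
    """Hypothetical cross-host integration of per-host edge lists.

    Local nodes are prefixed with their host name; nodes named 'flow:...'
    (a network connection) are left unprefixed, so a send on one host and
    the matching receive on another become the same node.
    """
    def qualify(host, node):
        return node if node.startswith("flow:") else f"{host}:{node}"

    return {(qualify(h, s), qualify(h, d))
            for h, edges in host_edges.items() for s, d in edges}


# Usage: a server handling three requests, one unit per request.
events = [
    Event(eid=1, unit=1, is_input=True, obj="sock:clientA"),
    Event(eid=2, unit=1, is_input=False, obj="sock:clientA"),
    Event(eid=3, unit=2, is_input=True, obj="sock:clientB"),
    Event(eid=4, unit=2, is_input=False, obj="sock:clientB"),
    Event(eid=5, unit=3, is_input=True, obj="sock:clientC"),
    Event(eid=6, unit=3, is_input=False, obj="/var/log/app"),
]
unit_deps = {3: [1]}  # assume unit 3 reuses state produced in unit 1

print(sorted(unit_provenance(events[3], events, unit_deps)))  # [3]
print(sorted(unit_provenance(events[5], events, unit_deps)))  # [1, 5]
print(sorted(process_provenance(events[5], events)))          # [1, 3, 5]

flow = "flow:10.0.0.5:51000->10.0.0.9:5432"
merged = merge_host_graphs({
    "web01": [("proc:nginx", flow)],
    "db01": [(flow, "proc:postgres")],
})
```

The contrast with `process_provenance` illustrates the motivation for unit partitioning: without units, the log write in unit 3 would be attributed to all three client requests rather than only the two it plausibly depends on.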
