Enabling Refinable Cross-Host Attack Investigation with Efficient Data Flow Tagging and Tracking

Investigating attacks across multiple hosts is challenging. The true dependencies between security-sensitive files, network endpoints, or memory objects from different hosts can be easily concealed by dependency explosion or undefined program behavior (e.g., memory corruption). Dynamic information flow tracking (DIFT) is a potential solution to this problem, but, existing DIFT techniques only track information flow within a single host and lack an efficient mechanism to maintain and synchronize the data flow tags globally across multiple hosts. In this paper, we propose RTAG, an efficient data flow tagging and tracking mechanism that enables practical cross-host attack investigations. RTAG is based on three novel techniques. First, by using a record-and-replay technique, it decouples the dependencies between different data flow tags from the analysis, enabling lazy synchronization between independent and parallel DIFT instances of different hosts. Second, it takes advantage of systemcall-level provenance information to calculate and allocate the optimal tag map in terms of memory consumption. Third, it embeds tag information into network packets to track cross-host data flows with less than 0.05% network bandwidth overhead. Evaluation results show that RTAG is able to recover the true data flows of realistic cross-host attack scenarios. Performance wise, RTAG reduces the memory consumption of DIFT-based analysis by up to 90% and decreases the overall analysis time by 60%-90% compared with previous investigation systems.

[1]  Jon Postel,et al.  User Datagram Protocol , 1980, RFC.

[2]  Seth Copen Goldstein,et al.  Hardware-assisted replay of multiprocessor programs , 1991, PADD '91.

[3]  Yogesh L. Simmhan,et al.  A survey of data provenance in e-science , 2005, SGMD.

[4]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[5]  Bogdan M. Wilamowski,et al.  The Transmission Control Protocol , 2005, The Industrial Information Technology Handbook.

[6]  Yasushi Saito,et al.  Jockey: a user-space library for record-replay debugging , 2005, AADEBUG'05.

[7]  James Newsom,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software, Network and Distributed System Security Symposium Conference Proceedings : 2005 , 2005 .

[8]  Margo I. Seltzer,et al.  Provenance-Aware Storage Systems , 2006, USENIX ATC, General Track.

[9]  Heng Yin,et al.  Panorama: capturing system-wide information flow for malware detection and analysis , 2007, CCS '07.

[10]  Alessandro Orso,et al.  Dytan: a generic dynamic taint analysis framework , 2007, ISSTA '07.

[11]  Olatunji Ruwase,et al.  Parallelizing dynamic information flow tracking , 2008, SPAA '08.

[12]  Tal Garfinkel,et al.  VMwareDecoupling Dynamic Program Analysis from Execution in Virtual Environments , 2008, USENIX Annual Technical Conference.

[13]  Brandon Lucia,et al.  DMP: deterministic shared memory multiprocessing , 2009, IEEE Micro.

[14]  Byung-Gon Chun,et al.  TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones , 2010, OSDI.

[15]  Jason Nieh,et al.  Transparent, lightweight application execution replay on commodity multiprocessor operating systems , 2010, SIGMETRICS '10.

[16]  Xiaozhou Li,et al.  Efficient querying and maintenance of network provenance at internet-scale , 2010, SIGMOD Conference.

[17]  James Cownie,et al.  PinPlay: a framework for deterministic replay and reproducible analysis of parallel programs , 2010, CGO '10.

[18]  Angelos D. Keromytis,et al.  Taint-Exchange: A Generic System for Cross-Process and Cross-Host Taint Tracking , 2011, IWSEC.

[19]  Andreas Haeberlen,et al.  Secure network provenance , 2011, SOSP.

[20]  Ashish Gehani,et al.  SPADE: Support for Provenance Auditing in Distributed Environments , 2012, Middleware.

[21]  Nickolai Zeldovich,et al.  Recovering from intrusions in distributed systems with DARE , 2012, APSys.

[22]  Aarti Gupta,et al.  DTAM: dynamic taint analysis of multi-threaded programs for relevancy , 2012, SIGSOFT FSE.

[23]  Patrick D. McDaniel,et al.  Hi-Fi: collecting high-fidelity whole-system provenance , 2012, ACSAC '12.

[24]  Angelos D. Keromytis,et al.  libdft: practical dynamic data flow tracking for commodity systems , 2012, VEE '12.

[25]  Todd D. Millstein,et al.  RERAN: Timing- and touch-sensitive record and replay for Android , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[26]  Xiangyu Zhang,et al.  High Accuracy Attack Provenance via Binary-based Execution Partition , 2013, NDSS.

[27]  Angelos D. Keromytis,et al.  CloudFence: Data Flow Tracking as a Cloud Service , 2013, RAID.

[28]  Koushik Sen,et al.  Jalangi: a selective record-replay and dynamic analysis framework for JavaScript , 2013, ESEC/FSE 2013.

[29]  Angelos D. Keromytis,et al.  ShadowReplica: efficient parallelization of dynamic data flow tracking , 2013, CCS.

[30]  Brendan Dolan-Gavitt,et al.  Tappan Zee (north) bridge: mining memory accesses for introspection , 2013, CCS.

[31]  Fan Long,et al.  Automatic runtime error repair and containment via recovery shepherding , 2014, PLDI.

[32]  Andreas Haeberlen,et al.  Detecting Covert Timing Channels with Time-Deterministic Replay , 2014, OSDI.

[33]  Michael Chow,et al.  Eidetic Systems , 2014, OSDI.

[34]  B. T. Loo,et al.  Diagnosing missing events in distributed systems with negative provenance , 2014, SIGCOMM.

[35]  Ping Chen,et al.  A Study on Advanced Persistent Threats , 2014, Communications and Multimedia Security.

[36]  Paul T. Groth,et al.  Looking Inside the Black-Box: Capturing Data Provenance Using Dynamic Instrumentation , 2014, IPAW.

[37]  Andreas Haeberlen,et al.  Let SDN Be Your Eyes: Secure Forensics in Data Center Networks , 2014 .

[38]  Brendan Dolan-Gavitt,et al.  Repeatable Reverse Engineering with PANDA , 2015, PPREW@ACSAC.

[39]  Thomas Moyer,et al.  Trustworthy Whole-System Provenance for the Linux Kernel , 2015, USENIX Security Symposium.

[40]  Jun Wang,et al.  TaintPipe: Pipelined Symbolic Taint Analysis , 2015, USENIX Security Symposium.

[41]  Mike Hibler,et al.  Abstractions for Practical Virtual Machine Replay , 2016, VEE.

[42]  Wenke Lee,et al.  RecProv: Towards Provenance-Aware User Space Record and Replay , 2016, IPAW.

[43]  John C. S. Lui,et al.  TaintART: A Practical Multi-level Information-Flow Tracking System for Android RunTime , 2016, CCS.

[44]  Jun Wang,et al.  StraightTaint: Decoupled offline symbolic taint analysis , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[45]  Xiangyu Zhang,et al.  ProTracer: Towards Practical Provenance Tracing by Alternating Between Logging and Tainting , 2016, NDSS.

[46]  Jiang Ming Pipelined Symbolic Taint Analysis , 2016 .

[47]  Xiangyu Zhang,et al.  LDX: Causality Inference by Lightweight Dual Execution , 2016, ASPLOS.

[48]  Christian T. Jacobs,et al.  Git-RDM: A research data management plugin for the Git version control system , 2016, J. Open Source Softw..

[49]  Jason Flinn,et al.  JetStream: Cluster-Scale Parallelization of Information Flow Queries , 2016, OSDI.

[50]  Zhen Xiao,et al.  Samsara: Efficient Deterministic Replay in Multiprocessor Environments with Hardware Virtualization Extensions , 2016, USENIX Annual Technical Conference.

[51]  Josep Torrellas,et al.  ReplayConfusion: Detecting cache-based covert channel attacks using record and replay , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[52]  Cong Li,et al.  Kernel-based Virtual Machine , 2017 .

[53]  Alessandro Orso,et al.  RAIN: Refinable Attack Investigation with On-demand Inter-Process Information Flow Tracking , 2017, CCS.

[54]  Qi Wang,et al.  Fear and Logging in the Internet of Things , 2018, NDSS.

[55]  Somesh Jha,et al.  MCI : Modeling-based Causality Inference in Audit Logging for Attack Investigation , 2018, NDSS.