THREATRACE: Detecting and Tracing Host-Based Threats in Node Level Through Provenance Graph Learning

Host-based threats such as program attacks, malware implantation, and advanced persistent threats (APTs) are commonly adopted by modern attackers. Recent studies propose leveraging the rich contextual information in data provenance to detect threats in a host. Data provenance is a directed acyclic graph constructed from system audit data: nodes represent system entities (e.g., processes and files), and edges represent system calls, oriented in the direction of information flow. However, previous studies extract features of the whole provenance graph, so they are insensitive to the small number of threat-related entities and perform poorly when hunting stealthy threats. We present THREATRACE, an anomaly-based detector that detects host-based threats at the system entity level without prior knowledge of attack patterns. We tailor GraphSAGE, an inductive graph neural network, to learn the role of every benign entity in a provenance graph. THREATRACE is a real-time system that scales to monitoring a long-running host and can detect host-based intrusions in their early phases. We evaluate THREATRACE on three public datasets, and the results show that it outperforms three state-of-the-art host intrusion detection systems.
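To make the node-level view concrete, here is a minimal Python sketch, not THREATRACE's implementation. It assumes audit records arrive as (subject, syscall, object) triples, uses a hypothetical syscall-to-direction table (`FLOW_DIRECTION`) to orient each edge along the information flow, builds illustrative node features from counts of incoming and outgoing edge types (an assumption), and applies one GraphSAGE layer with the mean aggregator using fixed random weights in place of trained ones. All function names here are hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical syscall -> information-flow direction table (an assumption).
# "in":  data flows from the object into the subject process (e.g. read).
# "out": data flows from the subject process to the object (e.g. write).
FLOW_DIRECTION = {"read": "in", "recvfrom": "in", "execve": "in",
                  "write": "out", "sendto": "out", "clone": "out"}
EDGE_TYPES = sorted(FLOW_DIRECTION)


def build_provenance_graph(events):
    """Map (subject, syscall, object) audit records to directed edges
    (src, dst, syscall) that point along the information flow."""
    edges = []
    for subject, syscall, obj in events:
        if FLOW_DIRECTION[syscall] == "in":
            edges.append((obj, subject, syscall))
        else:
            edges.append((subject, obj, syscall))
    return edges


def edge_type_features(edges):
    """Initial node features: per-type counts of incoming edges
    followed by per-type counts of outgoing edges."""
    k = len(EDGE_TYPES)
    feats = defaultdict(lambda: [0.0] * (2 * k))
    for src, dst, etype in edges:
        i = EDGE_TYPES.index(etype)
        feats[dst][i] += 1.0      # incoming slot of the destination
        feats[src][k + i] += 1.0  # outgoing slot of the source
    return dict(feats)


def graphsage_mean_layer(feats, edges, weight):
    """One GraphSAGE layer with the mean aggregator:
    h_v' = normalize(ReLU(W . [h_v || mean of h_u over neighbours u])).
    Uses the full neighbourhood in both edge directions (no sampling)."""
    neigh = defaultdict(set)
    for src, dst, _ in edges:
        neigh[src].add(dst)
        neigh[dst].add(src)
    dim = len(next(iter(feats.values())))
    out = {}
    for v, h_v in feats.items():
        ns = neigh[v]
        mean = [sum(feats[u][j] for u in ns) / len(ns) if ns else 0.0
                for j in range(dim)]
        x = h_v + mean  # concatenation [h_v || h_N(v)]
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weight]
        norm = math.sqrt(sum(z * z for z in h))
        out[v] = [z / norm for z in h] if norm > 0 else h  # L2 normalization
    return out


events = [
    ("bash", "clone", "wget"),
    ("wget", "recvfrom", "10.0.0.5:80"),
    ("wget", "write", "/tmp/payload"),
    ("bash", "read", "/etc/passwd"),
]
edges = build_provenance_graph(events)
feats = edge_type_features(edges)

rng = random.Random(0)  # fixed untrained weights, for illustration only
in_dim = 2 * len(next(iter(feats.values())))
W = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(4)]
embeddings = graphsage_mean_layer(feats, edges, W)
for node, h in embeddings.items():
    print(node, [round(z, 3) for z in h])
```

Because GraphSAGE is inductive, the same learned weights apply to entities that first appear after training, which is what lets a detector of this kind keep up with a long-running host. The sketch stops before the detection step: a real detector would learn `W` from benign provenance graphs and then flag entities whose behavior deviates from the benign roles it has learned.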
