ALchemist: Fusing Application and Audit Logs for Precise Attack Provenance without Instrumentation

Cyber-attacks are becoming more persistent and complex. Most state-of-the-art attack forensics techniques either require annotating and instrumenting software applications or rely on high-quality execution profiling to serve as the basis for anomaly detection. We propose ALchemist, a novel attack forensics technique. It is based on two observations: built-in application logs provide critical high-level semantics while audit logs provide low-level, fine-grained information, and the two kinds of log share many common elements. ALchemist is hence a log fusion technique that couples application logs and audit logs to derive critical attack information that is invisible in either log alone. It is built on Datalog, a relational reasoning engine, and can infer new relations such as the task structure of an execution (e.g., tabs in Firefox), especially in the presence of complex asynchronous execution models, as well as high-level dependencies between log events. Our evaluation on 15 popular applications, including Firefox, Chromium, and OpenOffice, and on 14 APT attacks from the literature demonstrates that, although ALchemist does not require instrumentation, it is highly effective in partitioning execution into autonomous tasks (in order to avoid bogus dependencies) and in deriving precise attack provenance graphs, with very small overhead. It also outperforms NoDoze and OmegaLog, two state-of-the-art techniques that do not require instrumentation.
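
The log fusion idea can be illustrated with a minimal sketch in Python. It is not ALchemist's actual rule set, which the abstract says is expressed in Datalog; the record types `AppEvent` and `AuditEvent`, their fields, the function `fuse`, and the one-second matching window are all hypothetical and chosen only for illustration. The sketch assumes that an application log entry names a task (such as a browser tab) and a resource, that an audit log entry names a system call and a resource, and that the shared resource plus a nearby timestamp is enough to attribute the audit event to a task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppEvent:
    """Hypothetical application log entry: a task touching a resource."""
    time: float
    task: str
    resource: str


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit log entry: a system call on a resource."""
    time: float
    pid: int
    syscall: str
    resource: str


def fuse(app_log, audit_log, window=1.0):
    """Attribute audit events to tasks via shared resources.

    Roughly mirrors a relational rule of the form
        task_of(E, T) :- app_event(T, t1, R), audit_event(E, t2, R),
                         |t1 - t2| <= window.
    When several application events match, the closest in time wins.
    """
    # Index application events by the resource they mention.
    by_resource = {}
    for a in app_log:
        by_resource.setdefault(a.resource, []).append(a)

    labels = {}
    for e in audit_log:
        candidates = [
            a for a in by_resource.get(e.resource, [])
            if abs(a.time - e.time) <= window
        ]
        if candidates:
            nearest = min(candidates, key=lambda a: abs(a.time - e.time))
            labels[e] = nearest.task
    return labels


# Illustrative (made-up) records: two tabs each download one file.
app_log = [
    AppEvent(10.0, "tab1", "/tmp/a.pdf"),
    AppEvent(10.2, "tab2", "/tmp/b.exe"),
]
audit_log = [
    AuditEvent(10.1, 4242, "write", "/tmp/a.pdf"),
    AuditEvent(10.3, 4242, "write", "/tmp/b.exe"),
    AuditEvent(10.2, 4242, "read", "/etc/passwd"),
]
labels = fuse(app_log, audit_log)
for event, task in labels.items():
    print(task, event.syscall, event.resource)
```

In this toy input both writes come from the same process, so the audit log alone cannot separate them; the join with the application log assigns each write to its own tab, while the unmatched read stays unlabeled. A real system would need more than a single join, for example recursive rules to propagate task labels across asynchronous execution.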
