Paper information - UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats

Advanced Persistent Threats (APTs) are difficult to detect due to their "low-and-slow" attack patterns and frequent use of zero-day exploits. We present UNICORN, an anomaly-based APT detector that effectively leverages data provenance analysis. From modeling to detection, UNICORN tailors its design specifically for the unique characteristics of APTs. Through extensive yet time-efficient graph analysis, UNICORN explores provenance graphs that provide rich contextual and historical information to identify stealthy anomalous activities without pre-defined attack signatures. Using a graph sketching technique, it summarizes long-running system execution with space efficiency to combat slow-acting attacks that take place over a long time span. UNICORN further improves its detection capability using a novel modeling approach to understand long-term behavior as the system evolves. Our evaluation shows that UNICORN outperforms an existing state-of-the-art APT detection system and detects real-life APT scenarios with high accuracy.
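The abstract does not spell out the sketching algorithm, so the following is only a minimal illustration of the general idea of summarizing a labeled provenance graph in a fixed-size sketch. It assumes a histogram of neighborhood labels built by a Weisfeiler-Lehman-style relabeling, followed by a simple replicated-element weighted min-hash; both choices, and every function name (`wl_histogram`, `sketch`, `similarity`), are hypothetical and are not taken from UNICORN's implementation.

```python
import hashlib
from collections import Counter


def wl_histogram(labels, edges, rounds=2):
    """Count neighborhood labels of a directed, labeled graph.

    labels: dict mapping node id -> node label (e.g. "process", "file").
    edges:  list of (src, dst, edge_label) tuples.
    Each round extends a node's label with the sorted labels of its
    incoming edges and their source nodes (a WL-style relabeling).
    """
    hist = Counter(labels.values())
    current = dict(labels)
    for _ in range(rounds):
        incoming = {node: [] for node in current}
        for src, dst, edge_label in edges:
            incoming[dst].append(edge_label + ":" + current[src])
        relabeled = {}
        for node, label in current.items():
            relabeled[node] = label + "(" + ",".join(sorted(incoming[node])) + ")"
        current = relabeled
        hist.update(current.values())
    return hist


def _hash(seed, label, copy):
    """Deterministic 64-bit hash of (seed, label, copy index)."""
    data = f"{seed}|{label}|{copy}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(hist, size=64):
    """Fixed-size sketch of an integer-weighted histogram.

    A label with count c is treated as c distinct elements; for each of
    `size` seeds the label owning the minimum hash is kept. Two sketches
    agree at a position with probability equal to the generalized
    Jaccard similarity of the two histograms.
    """
    if not hist:
        raise ValueError("cannot sketch an empty histogram")
    return [
        min(
            (_hash(seed, label, copy), label)
            for label, count in hist.items()
            for copy in range(count)
        )[1]
        for seed in range(size)
    ]


def similarity(sketch_a, sketch_b):
    """Fraction of positions where two equal-length sketches agree."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


# Hypothetical toy graph: a process reads a file, then sends to a socket.
labels = {"f": "file", "p": "process", "s": "socket"}
edges = [("f", "p", "read"), ("p", "s", "send")]
baseline = sketch(wl_histogram(labels, edges, rounds=1))
```

The sketch has the same length regardless of how large the graph grows, which is the property that makes this family of techniques attractive for long-running executions. An anomaly detector built on it would compare sketches of the running system against sketches of known-benign behavior with `similarity`.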
