UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats

Advanced Persistent Threats (APTs) are difficult to detect due to their "low-and-slow" attack patterns and frequent use of zero-day exploits. We present UNICORN, an anomaly-based APT detector that effectively leverages data provenance analysis. From modeling to detection, UNICORN tailors its design specifically for the unique characteristics of APTs. Through extensive yet time-efficient graph analysis, UNICORN explores provenance graphs that provide rich contextual and historical information to identify stealthy anomalous activities without pre-defined attack signatures. Using a graph sketching technique, it summarizes long-running system execution with space efficiency to combat slow-acting attacks that take place over a long time span. UNICORN further improves its detection capability using a novel modeling approach to understand long-term behavior as the system evolves. Our evaluation shows that UNICORN outperforms an existing state-of-the-art APT detection system and detects real-life APT scenarios with high accuracy.
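
To make the idea of a space-efficient, similarity-preserving graph sketch concrete, the following is a minimal sketch under stated assumptions, not UNICORN's actual algorithm, which the abstract does not specify. It assumes a provenance graph has already been reduced to a histogram of substructure labels (the label strings are hypothetical) and swaps in a simple integer-weighted MinHash to produce a fixed-size summary. The names `sketch` and `similarity` are illustrative only.

```python
import hashlib
from collections import Counter


def _hash(seed: int, label: str, index: int) -> int:
    """Deterministic 64-bit hash of one (label, index) unit under a given seed."""
    data = f"{seed}|{label}|{index}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(hist: Counter, size: int = 64) -> list:
    """Fixed-size weighted MinHash sketch of a histogram with integer counts.

    Each label with count c is expanded into c units (label, 0..c-1); for every
    seed the unit with the smallest hash is kept. Two sketches agree at a given
    position with probability sum(min counts) / sum(max counts).
    """
    if sum(c for c in hist.values() if c > 0) == 0:
        raise ValueError("histogram must contain at least one positive count")
    result = []
    for seed in range(size):
        _, label, index = min(
            (_hash(seed, label, i), label, i)
            for label, count in hist.items()
            for i in range(count)
        )
        result.append((label, index))
    return result


def similarity(a: list, b: list) -> float:
    """Estimate histogram similarity as the fraction of matching sketch slots."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same size")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical label histograms for two snapshots of a running system.
baseline = Counter({"process->file:read": 40, "process->socket:send": 5})
snapshot = Counter({"process->file:read": 38, "process->socket:send": 30})
score = similarity(sketch(baseline), sketch(snapshot))
```

The sketch size stays constant no matter how large the histogram grows, which is the property that makes long-running executions cheap to compare. An anomaly detector could then flag snapshots whose sketches fall far from those seen during normal operation; the choice of threshold is left open here.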
