Leveraging Data Provenance to Enhance Cyber Resilience

Building secure systems used to mean securing the perimeter, but that is no longer sufficient. Today's systems are ill-equipped to deal with attackers who pierce perimeter defenses. Data provenance is a critical technology for building resilient systems that can recover from attackers who overcome the "hard-shell" defenses. In this paper, we provide background information on data provenance and discuss techniques and challenges in provenance collection, analysis, and storage. Data provenance is well situated to address the challenging problem of allowing a system to "fight through" an attack, and we identify the work needed to ensure that future systems are resilient.
