A Core Calculus for Provenance

Provenance is an increasing concern due to the ongoing revolution in sharing and processing scientific data on the Web and in other computer systems. It has been proposed that many computer systems will need to become provenance-aware in order to provide satisfactory accountability, reproducibility, and trust for scientific or other high-value data. To date, there is no consensus on appropriate formal models or security properties for provenance. In previous work, we introduced a formal framework for provenance security and proposed formal definitions of properties called disclosure and obfuscation. In this article, we study refined notions of positive and negative disclosure and obfuscation in a concrete setting, that of a general-purpose programming language. Previous models of provenance have focused on special-purpose languages such as workflows and database queries. We consider a higher-order, functional language with sums, products, and recursive types and functions, and equip it with a tracing semantics in which traces themselves can be replayed as computations. We present an annotation-propagation framework that supports many provenance views over traces, including standard forms of provenance studied previously. We investigate some relationships among provenance views and develop some partial solutions to the disclosure and obfuscation problems, including correct algorithms for disclosure and positive obfuscation based on trace slicing.
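As a rough illustration of what a tracing semantics with replayable traces can look like, the following is a minimal sketch for a toy first-order expression language. It is not the calculus defined in this article: the constructors `Num`, `Var`, `Add`, and `IfZero`, the trace forms `TAdd` and `TIfZero`, and the functions `evaluate` and `replay` are all hypothetical names chosen for the example. The sketch assumes that a trace records only the branch actually taken, and that replay on a changed input fails when that input would have taken a different branch.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Expressions of a toy language (hypothetical; far smaller than the
# higher-order language with sums, products, and recursion in the article).
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class IfZero:
    cond: "Expr"
    then: "Expr"
    other: "Expr"

Expr = Union[Num, Var, Add, IfZero]

# Traces mirror expressions, but a conditional keeps only the branch taken.
@dataclass(frozen=True)
class TAdd:
    left: "Trace"
    right: "Trace"

@dataclass(frozen=True)
class TIfZero:
    cond: "Trace"
    took_then: bool
    branch: "Trace"

Trace = Union[Num, Var, TAdd, TIfZero]
Env = Dict[str, int]

class ReplayError(Exception):
    """Raised when a trace no longer matches the control flow of the input."""

def evaluate(e: Expr, env: Env) -> Tuple[int, Trace]:
    """Evaluate an expression, returning its value and a trace of the run."""
    if isinstance(e, Num):
        return e.value, e
    if isinstance(e, Var):
        return env[e.name], e
    if isinstance(e, Add):
        lv, lt = evaluate(e.left, env)
        rv, rt = evaluate(e.right, env)
        return lv + rv, TAdd(lt, rt)
    if isinstance(e, IfZero):
        cv, ct = evaluate(e.cond, env)
        took_then = cv == 0
        bv, bt = evaluate(e.then if took_then else e.other, env)
        return bv, TIfZero(ct, took_then, bt)
    raise TypeError(f"unknown expression: {e!r}")

def replay(t: Trace, env: Env) -> int:
    """Re-run a trace as a computation on a possibly different environment."""
    if isinstance(t, Num):
        return t.value
    if isinstance(t, Var):
        return env[t.name]
    if isinstance(t, TAdd):
        return replay(t.left, env) + replay(t.right, env)
    if isinstance(t, TIfZero):
        cv = replay(t.cond, env)
        if (cv == 0) != t.took_then:
            raise ReplayError("input takes a branch the trace did not record")
        return replay(t.branch, env)
    raise TypeError(f"unknown trace: {t!r}")

# Usage: if x == 0 then y + 1 else 0
program = IfZero(Var("x"), Add(Var("y"), Num(1)), Num(0))
value, trace = evaluate(program, {"x": 0, "y": 41})
replayed = replay(trace, {"x": 0, "y": 9})  # same branch, different data
```

In this sketch, replaying the trace with a different `y` recomputes the result without consulting the original program, while changing `x` to a nonzero value raises `ReplayError` because the recorded branch no longer applies. A trace-slicing step, which the article uses for disclosure and obfuscation, would additionally discard parts of the trace that are irrelevant to a chosen part of the output; that step is not modeled here.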
