Provenance-based Belief

Provenance has been touted as a basis for establishing trust in data. Intuitively, belief in a hypothesis should depend on how much one trusts the relevant data. However, current proposals that assess trust solely from provenance are insufficient for rigorous decision making. We describe a model of provenance and belief that is necessary and sufficient to incorporate "trust in the data" in a way that supports normative inference. The model rests on the observation that provenance can be viewed as a causal structure, which can be used to compute belief from assessments of the accuracy of the sources and transformations that produced the relevant data. In our model, data sources act like sensors with associated conditional probability tables, and provenance identifies the dependencies among those sensors. Together, this information allows the construction of causal networks that compute belief in a state of the world from observed data. The model formalizes the role of source accuracy and provides a method for formally assessing belief that uses only information in the provenance store, not the contents of the data.
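To make the sensor analogy concrete, the following is a minimal sketch, not the authors' implementation, of how provenance dependencies could change a computed belief. It assumes a single binary hypothesis `H`, binary reports, and brute-force enumeration over a tiny network. The function `posterior`, the node names `A` and `B`, and all probability values are hypothetical and chosen only for illustration.

```python
from itertools import product


def posterior(prior, nodes, observed):
    """Return P(H=True | observed) by enumerating a small causal network.

    prior:    P(H=True) for the hypothesis about the state of the world.
    nodes:    list of (name, parent, cpt) in topological order, where parent
              is "H" or the name of an earlier node, and cpt[parent_value]
              is P(node=True | parent=parent_value).
    observed: dict mapping node name -> observed boolean report.
    """
    names = [name for name, _, _ in nodes]
    weight = {True: 0.0, False: 0.0}
    for h in (True, False):
        for values in product((True, False), repeat=len(nodes)):
            assignment = dict(zip(names, values))
            # Skip assignments that contradict the observed reports.
            if any(assignment[k] != v for k, v in observed.items()):
                continue
            p = prior if h else 1.0 - prior
            for name, parent, cpt in nodes:
                parent_value = h if parent == "H" else assignment[parent]
                p_true = cpt[parent_value]
                p *= p_true if assignment[name] else 1.0 - p_true
            weight[h] += p
    return weight[True] / (weight[True] + weight[False])


# Hypothetical accuracy table for a source that observes the world directly.
sensor = {True: 0.9, False: 0.2}
# Hypothetical lossless transformation: the output always equals its input.
copy = {True: 1.0, False: 0.0}

reports = {"A": True, "B": True}

# Provenance says A and B were produced independently from the world.
p_independent = posterior(0.5, [("A", "H", sensor), ("B", "H", sensor)], reports)

# Provenance says B was derived from A, so B carries no new evidence.
p_derived = posterior(0.5, [("A", "H", sensor), ("B", "A", copy)], reports)

print(round(p_independent, 4), round(p_derived, 4))
```

Under these assumed numbers, two agreeing independent sources yield a higher belief than a source plus a copy of it, even though the observed data are identical in both cases. Only the dependency structure recorded in the provenance differs.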
