Reasoning about the Appropriate Use of Private Data through Computational Workflows

While many mechanisms exist to ensure lawful access to privacy-protected data, additional research is needed to assure individuals that their personal data is used only for the purposes to which they consented. This is particularly important for new data mining approaches, such as those used in biomedical research and in commercial applications. We argue for the use of computational workflows to ensure and enforce the appropriate use of sensitive personal data. Computational workflows declaratively describe the data processing steps and the expected results of complex data analysis processes such as data mining (Gil et al. 2007b; Taylor et al. 2006). We view workflows as artifacts that capture, among other things, how data is used and for what purpose. Existing computational workflow frameworks therefore need to be extended to incorporate privacy policies that govern the use of data.
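The abstract does not define a policy language, so the following is only a minimal sketch of the idea, under stated assumptions. It assumes a hypothetical representation in which each workflow step declares its inputs, outputs, and the purpose it serves, and each source dataset carries the set of purposes its data subjects consented to. The names `Step` and `check_workflow` are illustrative and are not part of Wings, Pegasus, or any other existing workflow system. Restrictions follow the data through the workflow: a derived dataset is allowed only the purposes permitted for all of its inputs (the intersection), so a later step cannot "launder" personal data into an unconsented use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One declarative workflow step (hypothetical representation, not a real API)."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    purpose: str  # the purpose this step serves, e.g. "cancer-research"


def check_workflow(steps, consent):
    """Return (step, dataset, purpose) triples that violate consent.

    `consent` maps each source dataset to the purposes its data subjects
    agreed to. Steps must be listed in execution order. Each output of a
    step inherits the intersection of the purposes allowed for the step's
    inputs; datasets with no known consent allow no purposes at all.
    """
    allowed = {ds: set(purposes) for ds, purposes in consent.items()}
    violations = []
    for step in steps:
        inherited = None
        for ds in step.inputs:
            purposes = allowed.get(ds, set())  # unknown data: nothing allowed
            if step.purpose not in purposes:
                violations.append((step.name, ds, step.purpose))
            inherited = set(purposes) if inherited is None else inherited & purposes
        # Conservative default: outputs of a step without inputs allow nothing.
        for out in step.outputs:
            allowed[out] = inherited if inherited is not None else set()
    return violations


# Usage with hypothetical dataset and purpose names.
consent = {"patient_records": {"cancer-research"}}
workflow = [
    Step("deidentify", ("patient_records",), ("deid_records",), "cancer-research"),
    Step("cluster", ("deid_records",), ("clusters",), "cancer-research"),
    Step("target_ads", ("clusters",), ("ad_segments",), "marketing"),
]
print(check_workflow(workflow, consent))
# [('target_ads', 'clusters', 'marketing')]
```

Intersecting purposes is one simple, conservative choice for propagating restrictions along the dataflow; a fuller treatment would also have to account for steps such as de-identification that might legitimately relax constraints.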

[1] Latanya Sweeney, et al. k-Anonymity: A Model for Protecting Privacy, 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[2] James A. Hendler, et al. Information accountability, 2008, CACM.

[3] James A. Hendler, et al. A Framework for Web Science, 2006, Found. Trends Web Sci.

[4] Yolanda Gil, et al. Wings for Pegasus: Creating Large-Scale Scientific Applications Using Semantic Representations of Computational Workflows, 2007, AAAI.

[5] Paul T. Groth, et al. Wings: Intelligent Workflow-Based Design of Computational Experiments, 2011, IEEE Intelligent Systems.

[6] Qi Wang, et al. Random-data perturbation techniques and privacy-preserving data mining, 2005, Knowledge and Information Systems.

[7] Martin Dugas, et al. Impact of integrating clinical and genetic information, 2001, German Conference on Bioinformatics.

[8] Yolanda Gil, et al. Privacy enforcement in data analysis workflows, 2007.

[9] Dennis Gannon, et al. Workflows for e-Science, Scientific Workflows for Grids, 2014.

[10] James A. Hendler, et al. A Framework for Web Science (Foundations and Trends in Web Science), 2006.

[11] I. Kohane, et al. Public standards and patients' control: how to keep electronic medical records accessible but private, 2001, BMJ: British Medical Journal.

[12] Yolanda Gil, et al. Towards privacy aware data analysis workflows for e-science, 2007.

[13] Wei Jiang, et al. A secure distributed framework for achieving k-anonymity, 2006, VLDB 2006.

[14] Daniel J. Weitzner. Beyond Secrecy: New Privacy Protection Strategies for Open Information Spaces, 2007, IEEE Internet Computing.

[15] Daniel S. Katz, et al. Pegasus: A framework for mapping complex scientific workflows onto distributed systems, 2005, Sci. Program.

[16] Chris Clifton, et al. A secure distributed framework for achieving k-anonymity, 2006, The VLDB Journal.

[17] James A. Hendler, et al. Transparent Accountable Data Mining: New Strategies for Privacy Protection, 2006, AAAI Spring Symposium: Semantic Web Meets eGovernment.

[18] Russ B. Altman, et al. Health-information altruists--a potentially critical resource, 2005, The New England Journal of Medicine.

[19] Fahiem Bacchus, et al. Planning for temporally extended goals, 1996, Annals of Mathematics and Artificial Intelligence.

[20] Paul Helman, et al. Protecting data privacy through hard-to-reverse negative databases, 2007, International Journal of Information Security.

[21] Rakesh Agrawal, et al. Privacy-preserving data mining, 2000, SIGMOD 2000.

[22] Peter Szolovits, et al. Evaluating the state-of-the-art in automatic de-identification, 2007, Journal of the American Medical Informatics Association: JAMIA.

[23] Latanya Sweeney, et al. Finding lists of people on the web, 2004, CSOC.

[24] Yehuda Lindell, et al. Privacy Preserving Data Mining, 2000, Journal of Cryptology.