Personal Data Management with the Databox: What's Inside the Box?

We are all increasingly the subjects of data collection and processing systems that use data generated both about and by us to provide and optimise a wide range of services. The means for others to collect and process data that concerns each of us, often referred to possessively as "your data", are only increasing, with the long-heralded advent of the Internet of Things just the latest example. As a result, providing the means to enable personal data management is generally recognised as a pressing societal issue. We have previously proposed that one such means might be realised by the Databox, a collection of physical and cloud-hosted software components that allow an individual data subject to manage, log and audit access to their data by other parties. In this paper we elaborate on this proposal, describing the software architecture we are developing and the current status of a prototype implementation. We conclude with a brief discussion of the Databox's limitations.
