Tracking Personal Data Use: Provenance and Trust

In the era of Big Data, every individual is the target of intensive data collection by parties ranging from governments to grocery store chains. Anecdotal evidence suggests that opting out of the data collection process is effectively impossible [3]. A recent report commissioned by the White House revealed broad public concern about the collection and use of personal data by untrusted agencies and businesses [1]. As a result, there has been an effort to improve the transparency of data collection and use. Under legislative and public pressure, many data collectors now publish privacy policies that explain what personal data is stored and how it is processed. For example, Google’s policy [2] states that “We may combine personal information from one service with information, including personal information, from other Google services [...]. We will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent.” Such policies are useful but have shortcomings: as English-language documents, they are both too confusing for novice users and too vague for experts, and they require human effort to create and maintain. A better solution is to create technological tools that empower individuals to track what happens to their data. The same problem has been addressed in scientific data processing through abstractions and algorithms for workflow provenance [5]. It is time to apply these techniques to the problem of personal data use: just as scientists can trace what happens to individual data points in a dataset, individuals should have access to a “Personal Data Use Workbench” where they can browse how a company or government agency is using their data.
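
To make the idea of tracing a data point concrete, the following is a minimal sketch of how a data collector might log processing steps and how an individual might then query them. It is an illustration only, not the design of the proposed workbench or of any system cited here: the class `ProvenanceLog`, its methods, and the step and item names are all hypothetical.

```python
from collections import defaultdict, deque


class ProvenanceLog:
    """Hypothetical log of processing steps (an assumption for illustration)."""

    def __init__(self):
        # Maps each input item to the (step, output) pairs that consumed it.
        self._uses = defaultdict(list)

    def record(self, step, inputs, output):
        """Record that `step` read every item in `inputs` and produced `output`."""
        for item in inputs:
            self._uses[item].append((step, output))

    def trace(self, item):
        """Return all (source, step, output) edges reachable from `item`.

        Edges are listed in breadth-first order, so direct uses of the item
        come before uses of anything derived from it.
        """
        seen = {item}
        queue = deque([item])
        edges = []
        while queue:
            current = queue.popleft()
            for step, output in self._uses.get(current, []):
                edges.append((current, step, output))
                if output not in seen:
                    seen.add(output)
                    queue.append(output)
        return edges


# Hypothetical processing steps that a data collector might log.
log = ProvenanceLog()
log.record("build_ad_profile",
           ["alice.search_history", "alice.location"], "alice.ad_profile")
log.record("select_ads", ["alice.ad_profile"], "alice.shown_ads")
log.record("aggregate_traffic",
           ["alice.location", "bob.location"], "traffic_report")

# An individual asks: what happened to my location data?
for source, step, output in log.trace("alice.location"):
    print(f"{source} -> [{step}] -> {output}")
```

In this sketch the query follows derived items transitively, so the individual sees not only the steps that read the location data directly but also later steps that consumed a profile built from it. A real workbench would additionally need to address trust in the log itself, which this example does not attempt.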