Our digital world relies upon data, but its data practices are increasingly opaque: to users who want to know whether these practices are consistent with their preferences and needs, to those who develop digital technologies and need to understand their workings and effects, and to the regulators who seek to ensure compliance with laws. We need to make data practices more transparent, and we need to enlist technology itself in this effort by developing stronger transparency-enhancing technologies (TETs).
In this project, funded by the Office of the Privacy Commissioner of Canada (OPC), we created a prototype of such a TET, AppTrans (Transparency for Android Applications). It compares the declared data practices of mobile apps to their actual (or potential) practices in order to flag potential discrepancies. This involves three steps. The first uses AI techniques to automate the reading of privacy policies, which are currently the main means through which apps declare their data practices. The second scans and statically analyzes an app’s code in order to determine which permissions it could use to collect personal data. The third compares the results of the first two steps to identify potential personal data collection practices that are not declared in the relevant privacy policy.
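The third step, the comparison, can be illustrated with a minimal sketch. It is not the AppTrans implementation: the function name, the data category labels, and the small permission-to-category table are all hypothetical, chosen only to show the idea of subtracting what a policy declares from what the code could potentially collect. The permission strings themselves are standard Android permission names.

```python
# Hypothetical mapping from Android permissions to the category of personal
# data each one exposes. A real tool would need a far more complete table.
PERMISSION_TO_DATA = {
    "android.permission.ACCESS_FINE_LOCATION": "location",
    "android.permission.READ_CONTACTS": "contacts",
    "android.permission.CAMERA": "camera",
    "android.permission.RECORD_AUDIO": "microphone",
}


def undeclared_data_types(declared, permissions):
    """Return data categories the app could collect but its policy omits.

    declared:    data categories extracted from the privacy policy (step 1)
    permissions: permissions found by static analysis of the app (step 2)
    """
    # Translate permissions into data categories, ignoring unmapped ones.
    potential = {
        PERMISSION_TO_DATA[p] for p in permissions if p in PERMISSION_TO_DATA
    }
    # Anything potentially collected but not declared is a discrepancy.
    return sorted(potential - set(declared))


policy_declares = ["location"]
app_permissions = [
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.READ_CONTACTS",
]
print(undeclared_data_types(policy_declares, app_permissions))  # ['contacts']
```

A flagged category here indicates only a potential discrepancy: holding a permission shows that collection is possible, not that it actually occurs.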
We emphasize that the current tool was developed as a research prototype with a small staff (a couple of graduate students). As such, the goal of the tool was to establish feasibility for tools of this type, and it was not intended in its current form for broad deployment and use. Nonetheless, our results show that this type of tool is feasible and can offer regulators new insights into the activities that they regulate.
For example, our empirical results reveal several interesting phenomena. The first is significant non-compliance of applications with their associated policies (average of 59.5%). We determined that this is often the result of undeclared collection of personally identifiable information by third-party code used by the developer, rather than by code written by the developers themselves. Apps are often developed by integrating components created by others, in the form of third-party “libraries”. For example, the easiest way to create ad-supported apps is to use ad libraries, and those libraries can then potentially collect personally identifiable information that is available to the app. Another common type of library is an analytics library, which measures and documents user engagement. Our results show that such use of personally identifiable information by third-party code is not being properly documented in app privacy policies. We recommend that the OPC study the issue of third-party code use by app developers and issue recommendations.
We also found that a large fraction of privacy policies are written at a reading level above that of many smartphone users. We recommend that the OPC consider using available tools to study the reading levels of app privacy policies, perhaps in specific sectors, in order to offer best-practice guidelines on the language level at which policies should be written.
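As an illustration of what such a tool might measure, the sketch below estimates the Flesch-Kincaid grade level of a passage. This summary does not say which readability measure was used in the study, so the choice of Flesch-Kincaid is an assumption, as are the function names. The syllable counter is a crude vowel-group heuristic, so the scores are approximate.

```python
import re


def count_syllables(word):
    # Crude heuristic (an assumption, not a linguistic standard): each run
    # of consecutive vowels counts as one syllable, with a minimum of one.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Estimate the U.S. school grade level needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade level formula.
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


plain = "We use your data. We share it."
dense = (
    "The organization disseminates personally identifiable information "
    "to unaffiliated advertising intermediaries."
)
print(flesch_kincaid_grade(plain) < flesch_kincaid_grade(dense))  # True
```

Scores of this kind are best used to compare policies against one another or against a target grade level, rather than being read as exact measurements.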