Towards Change Detection in Privacy Policies with Natural Language Processing

Privacy policies notify users about the privacy practices of websites, mobile apps, and other products and services. However, users rarely read them and struggle to understand their contents. Due to the complicated nature of these documents, it gets even harder to understand and take note of any changes of interest or concern when the policies are changed or revised. With advances in machine learning and natural language processing, tools that can automatically annotate sentences of policies have been developed. These annotations can help a user identify and understand relevant parts of a privacy policy. In this paper, we present our attempt to further such annotations by also detecting the important changes that occurred across sentences. Using supervised machine learning models, word-embedding, similarity matching, and structural analysis of sentences, we present a process that takes two different versions of a privacy policy as input, matches the sentences of one version to another based on semantic similarity, and identifies relevant changes between two matched sentences. We present the results and insights of applying our approach on 79 privacy policies manually downloaded from Facebook, WhatsApp, Twitter, Google, LinkedIn and Snapchat, ranging between the period of 1999 to 2020.