Unsupervised Alignment of Privacy Policies using Hidden Markov Models

To support empirical study of online privacy policies, as well as tools for users with privacy concerns, we consider the problem of aligning sections of a thousand policy documents, based on the issues they address. We apply an unsupervised HMM; in two new (and reusable) evaluations, we find the approach more effective than clustering and topic models.

[1]  Jerry den Hartog,et al.  What Websites Know About You , 2012, DPM/SETOP.

[2]  Jerry den Hartog,et al.  A machine learning solution to assess privacy policy completeness: (short paper) , 2012, WPES '12.

[3]  Clare-Marie Karat,et al.  An empirical study of natural language parsing of privacy policy rules using the SPARCLE policy workbench , 2006, SOUPS '06.

[4]  Aleecia M. McDonald,et al.  The Cost of Reading Privacy Policies , 2009 .

[5]  Regina Barzilay,et al.  Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization , 2004, NAACL.

[6]  Francis R. Bach,et al.  Online Learning for Latent Dirichlet Allocation , 2010, NIPS.

[7]  J. Reeve,et al.  Solutions to problematic polypharmacy: learning from the expertise of patients. , 2015, The British journal of general practice : the journal of the Royal College of General Practitioners.

[8]  Noah A. Smith,et al.  Automatic Categorization of Privacy Policies: A Pilot Study , 2012 .

[9]  Nora Cuppens-Boulahia,et al.  Data Privacy Management and Autonomous Spontaneous Security , 2014, Lecture Notes in Computer Science.

[10]  Petr Sojka,et al.  Software Framework for Topic Modelling with Large Corpora , 2010 .

[11]  Tao Xie,et al.  Automated extraction of security policies from natural-language software documents , 2012, SIGSOFT FSE.

[12]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[13]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[14]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[15]  L. Baum,et al.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .