The Usable Privacy Policy Project: Combining Crowdsourcing, Machine Learning and Natural Language Processing to Semi-Automatically Answer Those Privacy Questions Users Care About

Natural language privacy policies have become a de facto standard to address expectations of “notice and choice” on the Web. However, users generally do not read these policies, and those who do read them struggle to understand their content. Initiatives aimed at addressing this problem through the development of machine-readable standards have run into obstacles, with many website operators showing reluctance to commit to anything more than what they currently do. This project builds on recent advances in natural language processing, privacy preference modeling, crowdsourcing, formal methods, and privacy interface design to develop a practical framework, based on websites’ existing natural language privacy policies, that empowers users to control their privacy more meaningfully without requiring additional cooperation from website operators. Our approach combines fundamental research with the development of scalable technologies to (1) semi-automatically extract key privacy policy features from natural language privacy policies, and (2) present these features to users in an easy-to-digest format that enables them to make more informed privacy decisions as they interact with different websites. This work will also involve the systematic collection and analysis of website privacy policies, looking for trends and deficiencies in both the wording and the content of these policies across different sectors, and using this analysis to inform public policy. This report outlines the project’s research agenda and overall approach.
