Investigating the Effectiveness of Android Privacy Policies

Yi Ping Sun, Master of Applied Science, Graduate Department of Electrical and Computer Engineering, University of Toronto, 2018

Smartphones are now an indispensable tool for people around the world. As their capabilities grow, however, so do privacy concerns. Privacy policies are the main mechanism through which users are informed about the data practices of smartphone applications. In this thesis, we investigate the effectiveness of privacy policies for Android applications through three studies: policy accuracy, policy understandability, and policy template usage. In our study, almost 60% of apps provided policies that describe data collection inaccurately. We found that the majority of Android privacy policies are likely difficult to read for 26% of the US population. Furthermore, we estimate that 25% of app developers use policy template services to generate pre-written policy text, and most such services offer poor coverage of Android-related data practices, contributing to policy inaccuracy.
