MAPS: Scaling Privacy Compliance Analysis to a Million Apps

Abstract

The app economy is largely reliant on data collection as its primary revenue model. To comply with legal requirements, app developers are often obligated to notify users of their privacy practices in privacy policies. However, prior research has suggested that many developers are not accurately disclosing their apps’ privacy practices. Evaluating discrepancies between apps’ code and privacy policies enables the identification of potential compliance issues. In this study, we introduce the Mobile App Privacy System (MAPS) for conducting an extensive privacy census of Android apps. We designed a pipeline for retrieving and analyzing large app populations based on code analysis and machine learning techniques. In its first application, we conduct a privacy evaluation for a set of 1,035,853 Android apps from the Google Play Store. We find broad evidence of potential non-compliance. Many apps do not have a privacy policy to begin with. Policies that do exist are often silent on the practices performed by apps. For example, 12.1% of apps have at least one location-related potential compliance issue. We hope that our extensive analysis will motivate app stores, government regulators, and app developers to more effectively review apps for potential compliance issues.
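The idea of comparing what an app's code does with what its policy discloses can be illustrated with a small sketch. This is not the MAPS implementation: the `AppReport` type, the practice labels (such as `location_gps`), and both helper functions are hypothetical names chosen for illustration. The sketch also assumes that a potential compliance issue is a practice detected in code that the policy does not disclose, and that an app without a policy discloses nothing.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class AppReport:
    """Hypothetical per-app result of code analysis and policy analysis."""
    app_id: str
    practices_in_code: FrozenSet[str]
    # None models an app that has no privacy policy at all (assumption).
    practices_in_policy: Optional[FrozenSet[str]]


def potential_issues(report: AppReport) -> FrozenSet[str]:
    """Practices performed in code but not disclosed in the policy."""
    disclosed = report.practices_in_policy or frozenset()
    return report.practices_in_code - disclosed


def share_with_issue(reports: Iterable[AppReport], prefix: str) -> float:
    """Fraction of apps with at least one issue whose label starts with prefix."""
    reports = list(reports)
    if not reports:
        return 0.0
    flagged = sum(
        1 for r in reports
        if any(p.startswith(prefix) for p in potential_issues(r))
    )
    return flagged / len(reports)


# Toy data, invented for illustration only.
reports = [
    AppReport("app.a", frozenset({"location_gps", "contacts"}),
              frozenset({"contacts"})),
    AppReport("app.b", frozenset({"location_wifi"}), None),
    AppReport("app.c", frozenset({"contacts"}),
              frozenset({"contacts", "location_gps"})),
]

location_share = share_with_issue(reports, "location")
```

On this toy data, two of the three apps have an undisclosed location practice. A statistic such as the 12.1% figure above would be the same kind of ratio computed over the full app population, although the real system derives both sets of practices with code analysis and machine learning classifiers rather than hand-written labels.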
