Toward Automatically Generating Privacy Policy for Android Apps

A privacy policy is a statement informing users how their information will be collected, used, and disclosed. Failing to provide a correct privacy policy may result in a fine. However, writing privacy policy is tedious and error-prone, because the author may not understand the source code well as it could have been written by others (e.g., outsourcing), or the author does not know the internal working of third-party libraries used. In this paper, we propose and develop a novel system named AutoPPG to automatically construct correct and readable descriptions to facilitate the generation of privacy policy for Android applications (i.e., apps). Given an app, AutoPPG first conducts static code analysis to characterize its behaviors related to users’ personal information, and then applies natural language processing techniques to generating correct and accessible sentences for describing these behaviors. The experimental results using real apps and crowdsourcing indicate that: 1) AutoPPG creates correct and easy-to-understand descriptions for privacy policies; 2) the privacy policies constructed by AutoPPG usually reveal more operations related to users’ personal information than existing privacy policies; and 3) most developers, who reply us, would like to use AutoPPG to facilitate them.

[1]  Paul A. Strooper,et al.  Automated Generation of Test Cases Using Model-Driven Architecture , 2007, Second International Workshop on Automation of Software Test (AST '07).

[2]  Tao Xie,et al.  Automated extraction of security policies from natural-language software documents , 2012, SIGSOFT FSE.

[3]  Yu Le,et al.  VulHunter: Toward Discovering Vulnerabilities in Android Applications , 2015, IEEE Micro.

[4]  Sanjai Rayadurgam,et al.  Coverage based test-case generation using model checkers , 2001, Proceedings. Eighth Annual IEEE International Conference and Workshop On the Engineering of Computer-Based Systems-ECBS 2001.

[5]  Mark Rowan,et al.  Encouraging privacy by design concepts with privacy policy auto-generation in eclipse (page) , 2014, ETX.

[6]  Daniel Jurafsky,et al.  Parsing to Stanford Dependencies: Trade-offs between Speed and Accuracy , 2010, LREC.

[7]  Martin Cutts Oxford Guide to Plain English , 2004 .

[8]  Ram Krishnan,et al.  Toward a Framework for Detecting Privacy Policy Violations in Android Application Code , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[9]  Zhuoqing Morley Mao,et al.  AppProfiler: a flexible method of exposing privacy-related behavior in android applications to end users , 2013, CODASPY.

[10]  T. Grance,et al.  SP 800-122. Guide to Protecting the Confidentiality of Personally Identifiable Information (PII) , 2010 .

[11]  Tao Xie,et al.  AppContext: Differentiating Malicious and Benign Mobile App Behaviors Using Context , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[12]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[13]  David A. Wagner,et al.  I've got 99 problems, but vibration ain't one: a survey of smartphone users' concerns , 2012, SPSM '12.

[14]  Xinwen Zhang,et al.  Apex: extending Android permission model and enforcement with user-defined runtime constraints , 2010, ASIACCS '10.

[15]  Noah A. Smith,et al.  Unsupervised Alignment of Privacy Policies using Hidden Markov Models , 2014, ACL.

[16]  Tao Zhang,et al.  Can We Trust the Privacy Policies of Android Apps? , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[17]  Lukasz Ziarek,et al.  Information flows as a permission mechanism , 2014, ASE.

[18]  Eric Bodden,et al.  A Machine-learning Approach for Classifying and Categorizing Android Sources and Sinks , 2014, NDSS.

[19]  Tao Zhang,et al.  AutoPPG: Towards Automatic Generation of Privacy Policy for Android Applications , 2015, SPSM@CCS.

[20]  Steven M. Bellovin,et al.  Privee: An Architecture for Automatically Analyzing Web Privacy Policies , 2014, USENIX Security Symposium.

[21]  Denilson Barbosa,et al.  Open Information Extraction with Tree Kernels , 2013, NAACL.

[22]  Norman M. Sadeh,et al.  Expectation and purpose: understanding users' mental models of mobile app privacy through crowdsourcing , 2012, UbiComp.

[23]  Angelos Stavrou,et al.  Analysis of Android Applications' Permissions , 2012, 2012 IEEE Sixth International Conference on Software Security and Reliability Companion.

[24]  Erik Derr,et al.  On Demystifying the Android Application Framework: Re-Visiting Android Permission Specification Analysis , 2016, USENIX Security Symposium.

[25]  Jerry den Hartog,et al.  A machine learning solution to assess privacy policy completeness: (short paper) , 2012, WPES '12.

[26]  Mu Zhang,et al.  Towards Automatic Generation of Security-Centric Descriptions for Android Apps , 2015, CCS.

[27]  Hao Chen,et al.  Investigating User Privacy in Android Ad Libraries , 2012 .

[28]  Alexander L. Wolf,et al.  A Case for Test-Code Generation in Model-Driven Systems , 2003, GPCE.

[29]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[30]  Martin Schäf,et al.  Joogie: Infeasible Code Detection for Java , 2012, CAV.

[31]  Lorrie Faith Cranor,et al.  A Design Space for Effective Privacy Notices , 2015, SOUPS.

[32]  Boyu Wang,et al.  String Analysis of Android Applications (N) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[33]  Zhen Huang,et al.  PScout: analyzing the Android permission specification , 2012, CCS.

[34]  Mark Harman,et al.  Strong higher order mutation-based test data generation , 2011, ESEC/FSE '11.

[35]  Daniela Yidan Miao,et al.  PrivacyInformer : an automated privacy description generator for the MIT App Inventor , 2014 .

[36]  Maria T. Pazienza,et al.  Information Extraction , 2002, Lecture Notes in Computer Science.

[37]  Laurie A. Williams,et al.  Relation extraction for inferring access control rules from natural language artifacts , 2014, ACSAC.

[38]  Steven Y. Ko,et al.  String Analysis of Android Applications , 2015 .

[39]  Annie I. Antón,et al.  Analyzing Regulatory Rules for Privacy and Security Requirements , 2008, IEEE Transactions on Software Engineering.

[40]  Noah A. Smith,et al.  A Step Towards Usable Privacy Policy: Automatic Alignment of Privacy Statements , 2014, COLING.

[41]  Evgeniy Gabrilovich,et al.  Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis , 2007, IJCAI.

[42]  Ian F. Darwin Android Cookbook , 2012 .

[43]  Christopher Krügel,et al.  EdgeMiner: Automatically Detecting Implicit Control Flow Transitions through the Android Framework , 2015, NDSS.

[44]  Jacques Klein,et al.  FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps , 2014, PLDI.