AutoPPG: Towards Automatic Generation of Privacy Policy for Android Applications

A privacy policy is a statement informing users how their information will be collected, used, and disclosed. Failing to provide a correct privacy policy may result in a fine. However, writing a privacy policy is tedious and error-prone, because the author may not fully understand the source code, which could have been written by others (e.g., through outsourcing), or may not know the internals of third-party libraries distributed without source code. In this paper, we propose and develop a novel system named AutoPPG that automatically constructs correct and readable descriptions to facilitate the generation of privacy policies for Android applications (i.e., apps). Given an app, AutoPPG first conducts various static code analyses to characterize its behaviors related to users' private information, and then applies natural language processing techniques to generate correct and accessible sentences describing these behaviors. The experimental results using real apps and crowdsourcing indicate that: (1) AutoPPG creates correct and easy-to-understand descriptions for privacy policies; and (2) the privacy policies constructed by AutoPPG usually reveal more operations related to users' private information than existing privacy policies.
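The two-stage idea described above (identify privacy-related behaviors, then render them as sentences) can be illustrated with a minimal sketch. This is not AutoPPG's implementation: the `SENSITIVE_APIS` table, the `Behavior` record, and the fixed sentence template are all hypothetical simplifications, and the input is assumed to be a list of API calls already extracted by some static analysis, each flagged by whether the caller is third-party library code.

```python
from dataclasses import dataclass

# Hypothetical mapping from sensitive Android APIs to the private
# information they expose. A real system would derive a much larger
# table from documentation and permission maps.
SENSITIVE_APIS = {
    "android.location.LocationManager.getLastKnownLocation": "location",
    "android.telephony.TelephonyManager.getDeviceId": "device ID",
    "android.accounts.AccountManager.getAccounts": "account information",
}


@dataclass(frozen=True)
class Behavior:
    """One privacy-related behavior (illustrative record, not from the paper)."""
    action: str           # e.g. "collect"
    info: str             # the private information involved
    by_third_party: bool  # True if the call originates in library code


def find_behaviors(api_calls):
    """Map (api_name, caller_is_third_party) pairs to unique behaviors.

    Stands in for the static-analysis stage: calls that are not in
    SENSITIVE_APIS are ignored, and duplicates are dropped while
    preserving first-seen order.
    """
    behaviors = []
    for api_name, is_third_party in api_calls:
        info = SENSITIVE_APIS.get(api_name)
        if info is None:
            continue
        behavior = Behavior("collect", info, is_third_party)
        if behavior not in behaviors:
            behaviors.append(behavior)
    return behaviors


def describe(behavior):
    """Render one behavior as a plain-English sentence from a fixed template."""
    if behavior.by_third_party:
        subject = "A third-party library used by this app"
    else:
        subject = "This app"
    return f"{subject} may {behavior.action} your {behavior.info}."


calls = [
    ("android.location.LocationManager.getLastKnownLocation", False),
    ("java.lang.String.length", False),
    ("android.telephony.TelephonyManager.getDeviceId", True),
    ("android.location.LocationManager.getLastKnownLocation", False),
]
policy = [describe(b) for b in find_behaviors(calls)]
```

A fixed template keeps the sketch short; the abstract indicates that the actual system relies on natural language processing techniques for sentence generation rather than a single hard-coded pattern.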
