GUILeak: Tracing Privacy Policy Claims on User Input Data for Android Applications

The Android mobile platform supports billions of devices across more than 190 countries. This popularity, coupled with user data collection by Android apps, has made privacy protection a well-known challenge in the Android ecosystem. In practice, app producers provide privacy policies that disclose what information the app collects and processes. However, it is difficult to trace such claims to the corresponding app code to verify whether the implementation is consistent with the policy. Existing approaches for privacy policy alignment focus on information accessed directly through the Android platform (e.g., location and device ID), but cannot handle user input, a major source of private information. In this paper, we propose a novel approach that automatically detects privacy leaks of user-entered data for a given Android app and determines whether such leakage may violate the app's privacy policy claims. For evaluation, we applied our approach to 120 popular apps from three privacy-relevant app categories: finance, health, and dating. The results show that our approach detected 21 strong violations and 18 weak violations among the studied apps.
