An Empirical Evaluation of GDPR Compliance Violations in Android mHealth Apps

The General Data Protection Regulation (GDPR) aims to provide improved privacy protection. If an app controls personal data from users, it must comply with GDPR. However, GDPR states general rules rather than step-by-step guidelines for developing an app that fulfills its requirements. Existing apps may therefore contain GDPR compliance violations, which would pose severe privacy threats to their users. In this paper, we use mobile health applications (mHealth apps) as a lens to examine the status quo of GDPR compliance in Android apps. We first propose an automated system, named HPDROID, that bridges the semantic gap between the general rules of GDPR and app implementations by identifying the data practices declared in an app's privacy policy and the data-relevant behaviors in its code. Then, based on HPDROID, we detect three kinds of GDPR compliance violations: incompleteness of the privacy policy, inconsistency of data collection, and insecurity of data transmission. We perform an empirical evaluation of 796 mHealth apps. The results reveal that 189 (23.7%) of them do not provide complete privacy policies. Moreover, 59 apps collect sensitive data through different measures, and 46 (77.9%) of these contain at least one inconsistent collection behavior. Even worse, only 8 of the 59 apps try to secure the transmission of collected data, and all 8 contain at least one encryption or SSL misuse. Our work exposes severe privacy issues in order to raise awareness of privacy protection among app users and developers.
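As a rough illustration of the inconsistency check described above, the following minimal sketch compares the data types a privacy policy declares with those the code is observed to collect. It is not HPDROID's implementation: the `AppReport` type, the function name, and the data-type labels are all hypothetical, and the two sets are assumed to have been extracted already by separate policy and code analyses.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class AppReport:
    """Hypothetical per-app summary; both sets are assumed inputs."""
    declared: Set[str]   # data types the privacy policy says are collected
    collected: Set[str]  # data types the app code is observed to collect


def inconsistent_collections(report: AppReport) -> Set[str]:
    """Return data types collected in code but not declared in the policy."""
    return report.collected - report.declared


# Example with made-up labels: the policy mentions only the email address,
# while the code also collects heart-rate readings.
example = AppReport(declared={"email"}, collected={"email", "heart_rate"})
undeclared = inconsistent_collections(example)
```

Under these assumptions, an app would count as having an inconsistent collection behavior whenever the returned set is non-empty.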
