AUTOREB: Automatically Understanding the Review-to-Behavior Fidelity in Android Applications

Along with the increasing popularity of mobile devices, there exist severe security and privacy concerns for mobile apps. On Google Play, user reviews provide a unique understanding of security/privacy issues of mobile apps from users' perspective, and in fact they are valuable feedbacks from users by considering users' expectations. To best assist the end users, in this paper, we automatically learn the security/privacy related behaviors inferred from analysis on user reviews, which we call review-to-behavior fidelity. We design the system AUTOREB that automatically assesses the review-to-behavior fidelity of mobile apps. AUTOREB employs the state-of-the-art machine learning techniques to infer the relations between users' reviews and four categories of security-related behaviors. Moreover, it uses a crowdsourcing approach to automatically aggregate the security issues from review-level to app-level. To our knowledge, AUTOREB is the first work that explores the user review information and utilizes the review semantics to predict the risky behaviors at both review-level and app-level. We crawled a real-world dataset of 2,614,186 users, 12,783 apps and 13,129,783 reviews from Google play, and use it to comprehensively evaluate AUTOREB. The experiment result shows that our method can predict the mobile app behaviors at user-review level with accuracy as high as 94.05%, and also it can predict the security issues at app-level by aggregating the predictions at review-level. Our research offers an insight into understanding the mobile app security concerns from users' perspective, and helps bridge the gap between the security issues and users' perception.

[1]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[2]  Christopher Krügel,et al.  FORECAST: skimming off the malware cream , 2011, ACSAC '11.

[3]  Seungyeop Han,et al.  These aren't the droids you're looking for: retrofitting android to protect data from imperious applications , 2011, CCS '11.

[4]  Salvatore J. Stolfo,et al.  Data Mining Approaches for Intrusion Detection , 1998, USENIX Security Symposium.

[5]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[6]  Xuxian Jiang,et al.  Unsafe exposure analysis of mobile in-app advertisements , 2012, WISEC '12.

[7]  Bing Liu,et al.  Sentiment Analysis and Subjectivity , 2010, Handbook of Natural Language Processing.

[8]  David A. Wagner,et al.  I've got 99 problems, but vibration ain't one: a survey of smartphone users' concerns , 2012, SPSM '12.

[9]  Benjamin Livshits,et al.  Program Boosting , 2015, POPL.

[10]  Sencun Zhu,et al.  ViewDroid: towards obfuscation-resilient mobile application repackaging detection , 2014, WiSec '14.

[11]  Peng Ning,et al.  EASEAndroid: Automatic Policy Analysis and Refinement for Security Enhanced Android via Large-Scale Semi-Supervised Learning , 2015, USENIX Security Symposium.

[12]  Steve Hanna,et al.  Android permissions demystified , 2011, CCS '11.

[13]  Guanhua Yan,et al.  Discriminant malware distance learning on structural information for automated malware classification , 2013, SIGMETRICS.

[14]  David A. Wagner,et al.  Android permissions: user attention, comprehension, and behavior , 2012, SOUPS.

[15]  Fan Long,et al.  An analysis of patch plausibility and correctness for generate-and-validate patch generation systems , 2015, ISSTA.

[16]  Benjamin Livshits,et al.  Automatic Mediation of Privacy-Sensitive Resource Access in Smartphone Applications , 2013, USENIX Security Symposium.

[17]  Tao Xie,et al.  Identifying security bug reports via text mining: An industrial case study , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[18]  Bin Liu,et al.  Investigating Effects of Control and Ads Awareness on Android Users' Privacy Behaviors and Perceptions , 2015, MobileHCI.

[19]  Tao Xie,et al.  Inferring method specifications from natural language API descriptions , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[20]  Laurie A. Williams,et al.  Relation extraction for inferring access control rules from natural language artifacts , 2014, ACSAC.

[21]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[22]  Heng Yin,et al.  DroidScope: Seamlessly Reconstructing the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis , 2012, USENIX Security Symposium.

[23]  W. Bruce Croft,et al.  Query expansion using local and global document analysis , 1996, SIGIR '96.

[24]  Xiaofeng Wang,et al.  UIPicker: User-Input Privacy Identification in Mobile Applications , 2015, USENIX Security Symposium.

[25]  Gerardo Hermosillo,et al.  Learning From Crowds , 2010, J. Mach. Learn. Res..

[26]  Yajin Zhou,et al.  RiskRanker: scalable and accurate zero-day android malware detection , 2012, MobiSys '12.

[27]  Tao Xie,et al.  WHYPER: Towards Automating Risk Assessment of Mobile Applications , 2013, USENIX Security Symposium.

[28]  Kai Chen,et al.  Towards Discovering and Understanding Unexpected Hazards in Tailoring Antivirus Software for Android , 2015, AsiaCCS.

[29]  Zoltán Gyöngyi,et al.  Social annotations in web search , 2012, CHI.

[30]  Norman M. Sadeh,et al.  Expectation and purpose: understanding users' mental models of mobile app privacy through crowdsourcing , 2012, UbiComp.

[31]  Benjamin Livshits,et al.  DynaMine: finding common error patterns by mining software revision histories , 2005, ESEC/FSE-13.

[32]  Andreas Dewald,et al.  Cujo: efficient detection and prevention of drive-by-download attacks , 2010, ACSAC '10.

[33]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[34]  Swarat Chaudhuri,et al.  A Study of Android Application Security , 2011, USENIX Security Symposium.

[35]  Hao Chen,et al.  AndroidLeaks: Automatically Detecting Potential Privacy Leaks in Android Applications on a Large Scale , 2012, TRUST.

[36]  Ed H. Chi,et al.  With a little help from my friends: examining the impact of social annotations in sensemaking tasks , 2009, CHI.

[37]  Yajin Zhou,et al.  Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets , 2012, NDSS.

[38]  Yuanyuan Zhou,et al.  /*icomment: bugs or bad comments?*/ , 2007, SOSP.

[39]  Patrick Traynor,et al.  MAST: triage for market-scale mobile malware analysis , 2013, WiSec '13.

[40]  Zhong Chen,et al.  AutoCog: Measuring the Description-to-permission Fidelity in Android Applications , 2014, CCS.

[41]  W. Bruce Croft,et al.  Quary Expansion Using Local and Global Document Analysis , 1996, SIGIR Forum.

[42]  Byung-Gon Chun,et al.  TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones , 2010, OSDI.

[43]  Lawrence K. Saul,et al.  Beyond blacklists: learning to detect malicious web sites from suspicious URLs , 2009, KDD.

[44]  Hongxia Jin,et al.  Towards Permission Request Prediction on Mobile Apps via Structure Feature Learning , 2015, SDM.

[45]  Nathan Srebro,et al.  Learning Optimally Sparse Support Vector Machines , 2013, ICML.

[46]  Guanhua Yan,et al.  Exploring Discriminatory Features for Automated Malware Classification , 2013, DIMVA.

[47]  Philip S. Yu,et al.  GPLAG: detection of software plagiarism by program dependence graph analysis , 2006, KDD '06.

[48]  Mu Zhang,et al.  Semantics-Aware Android Malware Classification Using Weighted Contextual API Dependency Graphs , 2014, CCS.

[49]  Christos Faloutsos,et al.  Why people hate your app: making sense of user feedback in a mobile app store , 2013, KDD.

[50]  James Newsome,et al.  Polygraph: automatically generating signatures for polymorphic worms , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).