A Study of Grayware on Google Play

While there have been various studies identifying and classifying Android malware, there is limited discussion of the broader class of apps that fall in a gray area. Mobile grayware is distinct from PC grayware due to differences in operating system properties. Due to mobile grayware's subjective nature, it is difficult to identify mobile grayware via program analysis alone. Instead, we hypothesize enhancing analysis with text analytics can effectively reduce human effort when triaging grayware. In this paper, we design and implement heuristics for seven main categories of grayware. We then use these heuristics to simulate grayware triage on a large set of apps from Google Play. We then present the results of our empirical study, demonstrating a clear problem of grayware. In doing so, we show how even relatively simple heuristics can quickly triage apps that take advantage of users in an undesirable way.

[1]  Xuxian Jiang,et al.  Unsafe exposure analysis of mobile in-app advertisements , 2012, WISEC '12.

[2]  Stefano Zanero,et al.  HelDroid: Dissecting and Detecting Mobile Ransomware , 2015, RAID.

[3]  Jason Nieh,et al.  A measurement study of google play , 2014, SIGMETRICS '14.

[4]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Yajin Zhou,et al.  Dissecting Android Malware: Characterization and Evolution , 2012, 2012 IEEE Symposium on Security and Privacy.

[6]  Yajin Zhou,et al.  Hey, You, Get Off of My Market: Detecting Malicious Apps in Official and Alternative Android Markets , 2012, NDSS.

[7]  Jacques Klein,et al.  FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps , 2014, PLDI.

[8]  Hao Chen,et al.  Investigating User Privacy in Android Ad Libraries , 2012 .

[9]  Patrick Traynor,et al.  MAST: triage for market-scale mobile malware analysis , 2013, WiSec '13.

[10]  Zhong Chen,et al.  AutoCog: Measuring the Description-to-permission Fidelity in Android Applications , 2014, CCS.

[11]  Martin Porter,et al.  Snowball: A language for stemming algorithms , 2001 .

[12]  Byung-Gon Chun,et al.  TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones , 2010, OSDI.

[13]  Fadi Mohsen,et al.  Android keylogging threat , 2013, 9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing.

[14]  Simin Nadjm-Tehrani,et al.  Crowdroid: behavior-based malware detection system for Android , 2011, SPSM '11.

[15]  Hao Chen,et al.  AndroidLeaks: Automatically Detecting Potential Privacy Leaks in Android Applications on a Large Scale , 2012, TRUST.

[16]  Yuan Zhang,et al.  Vetting undesirable behaviors in android apps with permission use analysis , 2013, CCS.

[17]  Tao Xie,et al.  WHYPER: Towards Automating Risk Assessment of Mobile Applications , 2013, USENIX Security Symposium.

[18]  Julia Rubin,et al.  A Bayesian Approach to Privacy Enforcement in Smartphones , 2014, USENIX Security Symposium.

[19]  Steve Hanna,et al.  A survey of mobile malware in the wild , 2011, SPSM '11.

[20]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[21]  Christopher Krügel,et al.  PiOS: Detecting Privacy Leaks in iOS Applications , 2011, NDSS.

[22]  Steve Hanna,et al.  Juxtapp: A Scalable System for Detecting Code Reuse among Android Applications , 2012, DIMVA.

[23]  Yajin Zhou,et al.  Detecting repackaged smartphone applications in third-party android marketplaces , 2012, CODASPY '12.

[24]  Yuan Zhang,et al.  Evaluating Grayware Characteristics and Risks , 2011, J. Comput. Networks Commun..

[25]  Alessandra Gorla,et al.  Checking app behavior against app descriptions , 2014, ICSE.

[26]  Yajin Zhou,et al.  RiskRanker: scalable and accurate zero-day android malware detection , 2012, MobiSys '12.

[27]  Ahmed E. Hassan,et al.  Impact of Ad Libraries on Ratings of Android Mobile Apps , 2014, IEEE Software.

[28]  Jesse D. Kornblum Identifying almost identical files using context triggered piecewise hashing , 2006, Digit. Investig..

[29]  Wouter Joosen,et al.  Parking Sensors: Analyzing and Detecting Parked Domains , 2015, NDSS.

[30]  Wenke Lee,et al.  The Core of the Matter: Analyzing Malicious Traffic in Cellular Carriers , 2013, NDSS.