Mining Android Apps for Anomalies

Abstract How do we know a program does what it claims to do? Our CHABADA prototype can cluster Android™ apps by their description topics and identify outliers in each cluster with respect to their API usage. A “weather” app that sends messages thus becomes an anomaly; likewise, a “messaging” app would typically not be expected to access the current location and would also be identified. In this paper we present a new approach for anomaly detection that improves the classification results of our original CHABADA paper [ 1 ]. Applied on a set of 22,500+ Android applications, our CHABADA prototype can now predict 74% of novel malware and as such, without requiring any known malware patterns, maintains a false positive rate close to 10%.

[1]  Hans-Peter Kriegel,et al.  Interpreting and Unifying Outlier Scores , 2011, SDM.

[2]  Steve Hanna,et al.  Android permissions demystified , 2011, CCS '11.

[3]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[4]  Malik Yousef,et al.  One-Class SVMs for Document Classification , 2002, J. Mach. Learn. Res..

[5]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[6]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[7]  Yuanyuan Zhang,et al.  App store mining and analysis: MSR for app stores , 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[8]  Alessandra Gorla,et al.  Checking app behavior against app descriptions , 2014, ICSE.

[9]  Byung-Gon Chun,et al.  TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones , 2010, OSDI.

[10]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[11]  Tao Xie,et al.  WHYPER: Towards Automating Risk Assessment of Mobile Applications , 2013, USENIX Security Symposium.

[12]  Stephen D. Bay,et al.  Mining distance-based outliers in near linear time with randomization and a simple pruning rule , 2003, KDD '03.

[13]  Tao Xie,et al.  A Grey-Box Approach for Automated GUI-Model Generation of Mobile Applications , 2013, FASE.

[14]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[15]  Trevor F. Cox,et al.  Metric multidimensional scaling , 2000 .

[16]  David M. J. Tax,et al.  Kernel Whitening for One-Class Classification , 2003, Int. J. Pattern Recognit. Artif. Intell..

[17]  Bernhard Schölkopf,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[18]  Iulian Neamtiu,et al.  Automating GUI testing for Android applications , 2011, AST '11.

[19]  Stan Matwin,et al.  Addressing the Curse of Imbalanced Training Sets: One-Sided Selection , 1997, ICML.

[20]  Peng Wang,et al.  AsDroid: detecting stealthy behaviors in Android applications by user interface and program behavior contradiction , 2014, ICSE.

[21]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[22]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[23]  Yuanyuan Zhou,et al.  /*icomment: bugs or bad comments?*/ , 2007, SOSP.

[24]  Norman M. Sadeh,et al.  Expectation and purpose: understanding users' mental models of mobile app privacy through crowdsourcing , 2012, UbiComp.

[25]  Salvatore J. Stolfo,et al.  One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses , 2003 .

[26]  Mayur Naik,et al.  Dynodroid: an input generation system for Android apps , 2013, ESEC/FSE 2013.

[27]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[28]  Tao Xie,et al.  Inferring method specifications from natural language API descriptions , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[29]  Yajin Zhou,et al.  Dissecting Android Malware: Characterization and Evolution , 2012, 2012 IEEE Symposium on Security and Privacy.

[30]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[31]  Michalis Faloutsos,et al.  ProfileDroid: multi-layer profiling of android applications , 2012, Mobicom '12.

[32]  Mira Mezini,et al.  Taming reflection: Aiding static analysis in the presence of reflection and custom class loaders , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[33]  Einar W. Høst,et al.  Debugging Method Names , 2009, ECOOP.

[34]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[35]  Alessandro Verri,et al.  Pattern Recognition with Support Vector Machines , 2002, Lecture Notes in Computer Science.

[36]  Porfirio Tramontana,et al.  Using GUI ripping for automated testing of Android applications , 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[37]  Clara Pizzuti,et al.  Fast Outlier Detection in High Dimensional Spaces , 2002, PKDD.