Characterizing Android apps' behavior for effective detection of malapps at large scale

Abstract: Android malicious applications (malapps) have surged in number and grown more sophisticated, posing a serious threat to users. Characterizing, understanding, and detecting Android malapps at large scale is therefore a major challenge. In this work, we aim to discover discriminatory and persistent features that can be extracted from Android APK files for automated malapp detection at large scale. First, we extract a very large number of features from each app and categorize them into two groups: app-specific features and platform-defined features. These feature sets are then fed into four classifiers (Logistic Regression, linear SVM, Decision Tree, and Random Forest) to detect malapps. Second, we evaluate how persistent the app-specific and platform-defined features are, in terms of classification performance, using two data sets collected in different time periods. Third, we comprehensively analyze the relevant features selected by the Logistic Regression classifier to identify the contribution of each feature set. We conduct extensive experiments on large real-world app sets consisting of 213,256 benign apps collected from six app markets, 4,363 benign apps from the Google Play market, and 18,363 malapps. The experimental results and our analysis offer insight into which discriminatory features most effectively characterize malapps, supporting the construction of an effective and efficient malapp detection system. With the selected discriminatory features, the Logistic Regression classifier yields the best result: a true positive rate of 96% with a false positive rate of 0.06%.
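For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how a true positive rate and a false positive rate are computed from a detector's predictions. It is not the authors' evaluation code: the function name `tpr_fpr`, the label encoding (1 = malapp, 0 = benign), and the toy labels are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate).

    Assumed encoding (hypothetical): 1 = malapp, 0 = benign app.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malapps flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malapps missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign apps flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign apps passed
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one malapp and one benign app")
    return tp / (tp + fn), fp / (fp + tn)


# Toy data, invented for illustration only (not from the paper's data sets).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")
```

The false positive rate matters particularly at market scale because benign apps far outnumber malapps, so even a small rate can translate into many benign apps being flagged.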
