Detecting Android malicious apps and categorizing benign apps with ensemble of classifiers

The Android platform has dominated the smart mobile device market in recent years, and the number of Android applications (apps) has surged. Unsurprisingly, Android has also become the primary target of attackers. Managing the rapidly expanding app markets has thus become an important issue. On the one hand, it requires effectively detecting malicious applications (malapps) in order to keep them out of the app market. On the other hand, it requires automatically categorizing a large number of benign apps to ease management, for example by correcting an app's category when the developer has designated it falsely. In this work, we propose a framework to effectively and efficiently manage a large app market in terms of detecting malapps and categorizing benign apps. We extract 11 types of static features from each app to characterize its behaviors, and employ an ensemble of multiple classifiers, namely Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naive Bayes (NB), Classification and Regression Tree (CART) and Random Forest (RF), to detect malapps and to categorize benign apps. An alarm is triggered if an app is identified as malicious. Otherwise, the benign app is assigned to a specific category. We evaluate the framework on a large app set consisting of 107,327 benign apps and 8,701 malapps. The experimental results show that our method achieves an accuracy of 99.39% in the detection of malapps and a best accuracy of 82.93% in the categorization of benign apps.

First work to provide a complete solution for automated categorization of apps.
Extract 2,374,340 features from each APK file.
Use an ensemble of multiple classifiers to improve the detection accuracy.
Use large data sets containing 107,327 benign apps and 8,701 malapps for testing.
Reach a detection accuracy of 99.39% and a categorization accuracy of 82.93%.
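The two-stage flow described in the abstract (raise an alarm for a malapp, otherwise assign a category) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how the classifiers' outputs are combined, so simple majority voting is used here, and the toy threshold rules, the three-element feature vector, and the names `majority_vote` and `triage` are hypothetical stand-ins for the trained SVM, KNN, NB, CART and RF models and the 11 static feature types.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A "classifier" here is any callable mapping a feature vector to a label.
# In practice these would be trained models; the rules below are toy stand-ins.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: List[Classifier], features: Sequence[float]) -> str:
    """Return the label predicted by the largest number of classifiers.

    Ties are broken in favor of the label that was voted for first.
    """
    votes = [clf(features) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def triage(features: Sequence[float],
           detectors: List[Classifier],
           categorizers: List[Classifier]) -> str:
    """Stage 1: detect malapps. Stage 2: categorize apps judged benign."""
    if majority_vote(detectors, features) == "malicious":
        return "ALARM: malicious"
    return majority_vote(categorizers, features)


# Hypothetical feature vector:
# [number of dangerous permissions, uses SMS API (0/1), number of network calls]
detectors: List[Classifier] = [
    lambda f: "malicious" if f[0] > 5 else "benign",
    lambda f: "malicious" if f[1] == 1 else "benign",
    lambda f: "malicious" if f[2] > 20 else "benign",
    lambda f: "malicious" if f[0] + f[2] > 25 else "benign",
    lambda f: "malicious" if f[1] == 1 and f[0] > 3 else "benign",
]

categorizers: List[Classifier] = [
    lambda f: "communication" if f[2] > 10 else "tools",
    lambda f: "tools" if f[0] <= 2 else "communication",
    lambda f: "games" if f[2] < 2 else "tools",
]

suspicious_app = [8, 1, 30]
ordinary_app = [1, 0, 4]

print(triage(suspicious_app, detectors, categorizers))
print(triage(ordinary_app, detectors, categorizers))
```

Using an odd number of detectors avoids ties in the binary malicious/benign vote. For the multi-class categorization stage ties remain possible, and a real system would need an explicit tie-breaking policy or a weighted vote.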
