Android based malware detection using a multifeature collaborative decision fusion approach

Abstract Smart mobile device usage has expanded rapidly all over the world. Because mobile devices are now used for a wide variety of purposes, such as personal communication, data storage and entertainment, they face security threats comparable to those to which a conventional PC is exposed. Mobile malware has been growing in scale and complexity as smartphone usage continues to rise. Android has surpassed other mobile platforms as the most popular, whilst also witnessing a dramatic increase in malware targeting the platform. In this work, we consider Android-based malware for analysis and design a scalable detection mechanism using multifeature collaborative decision fusion (MCDF). Different features of a malicious file, such as permission-based features and API-call-based features, are used to improve detection by training an ensemble of classifiers and combining their decisions with a collaborative approach based on probability theory. The performance of the proposed model is evaluated on a collection of Android-based malware comprising different malware families, and the results show that our approach gives better performance than available state-of-the-art ensemble schemes.
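The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one probability-based way to combine per-feature-view decisions. It assumes that each classifier (for example, one trained on permission features and one on API call features) outputs a posterior probability of malware, and that the views are conditionally independent given the class, which yields the classical product rule. The function name `fuse_posteriors` and the `prior` parameter are hypothetical and are not taken from the paper; this is not claimed to be the MCDF method itself.

```python
from math import prod


def fuse_posteriors(posteriors, prior=0.5):
    """Fuse per-view malware probabilities with the product rule.

    Assumption (not from the paper): views are conditionally independent
    given the class, so P(malware | all views) is proportional to
    prior * product(p_i / prior) over the views.
    """
    if not posteriors:
        raise ValueError("at least one classifier output is required")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")

    # Unnormalised support for each class across all views.
    malware = prior * prod(p / prior for p in posteriors)
    benign = (1.0 - prior) * prod((1.0 - p) / (1.0 - prior) for p in posteriors)

    total = malware + benign
    if total == 0.0:
        # Views are in total conflict (one is certain of malware, another
        # is certain of benign); fall back to the prior.
        return prior
    return malware / total


if __name__ == "__main__":
    # Hypothetical outputs of a permission-based and an API-call-based classifier.
    fused = fuse_posteriors([0.8, 0.7])
    print(f"fused malware probability: {fused:.3f}")
```

With equal priors, two views that each lean towards malware reinforce one another, so the fused probability is higher than either individual output. Other combination rules (sum, majority vote, weighted schemes) would be equally valid sketches of decision fusion.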
