FeatureAnalytics: An approach to derive relevant attributes for analyzing Android Malware

The ever-increasing volume of Android malware is a persistent concern for cybersecurity professionals. Although plenty of anti-malware solutions exist, rational and pragmatic approaches to the problem are rare and need further investigation. In this paper, we propose RSST, a novel two-set feature selection approach based on Rough Set theory and a Statistical Test, to extract relevant system calls. To address the problem of a high-dimensional attribute set, we derive a suboptimal system call space by applying the proposed feature selection method so as to maximize the separability between malware and benign samples. Comprehensive experiments on a dataset of 3500 samples, using 30 essential system calls derived by RSST, yielded an accuracy of 99.9% and an Area Under Curve (AUC) of 1.0 at a 1% False Positive Rate (FPR). In contrast, other feature selectors used in malware analysis (Information Gain, CFsSubsetEval, ChiSquare, FreqSel and Symmetric Uncertainty) yielded an accuracy of 95.5% with an 8.5% FPR. Empirical analysis further shows that RSST-derived system calls outperform other attributes such as permissions, opcodes, API, methods, call graphs, Droidbox attributes and network traces.
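To make the idea of selecting system calls by class separability concrete, the following is a minimal Python sketch of a statistical-test filter. It is not the RSST method itself: it ranks system calls by Welch's t statistic over per-app call counts, omits the rough set step entirely, and the function names (`welch_t`, `rank_system_calls`) and the toy traces are assumptions made purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t statistic for two samples of per-app call counts."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = abs(mean(a) - mean(b))
    if denom == 0:
        # Both samples are constant: separable only if the means differ.
        return 0.0 if diff == 0 else float("inf")
    return diff / denom


def rank_system_calls(malware, benign, k):
    """Return the k system calls whose counts differ most between classes.

    malware, benign: lists of dicts mapping system call name -> count,
    one dict per app (hypothetical input format).
    """
    calls = set().union(*malware, *benign)
    scores = {}
    for call in calls:
        m = [app.get(call, 0) for app in malware]
        b = [app.get(call, 0) for app in benign]
        scores[call] = welch_t(m, b)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


# Toy traces (invented): "sendto" separates the classes, "read" does not.
malware = [{"sendto": 9, "read": 5}, {"sendto": 11, "read": 4}, {"sendto": 10, "read": 6}]
benign = [{"sendto": 1, "read": 5}, {"sendto": 2, "read": 6}, {"sendto": 0, "read": 4}]

print(rank_system_calls(malware, benign, 1))  # prints ['sendto']
```

A filter of this kind scores each system call independently, so it cannot detect redundancy among the selected calls; the abstract indicates that RSST pairs the statistical test with a rough set component.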
