Android botnet detection using machine learning models based on a comprehensive static analysis approach

Abstract: Android is among the most popular and widely deployed smartphone operating systems. Millions of applications are distributed for it through both official and unofficial stores. Botnet applications are a class of malware that can be distributed through these stores and downloaded by unsuspecting users onto their smartphones. This work investigates Android botnets using static analysis to extract features from the source code of applications after reverse engineering them. The features are then used to develop machine learning models that detect such malicious applications. Additionally, the study proposes a new set of features related to accessing resources on the target device. The features are extracted from 1928 Android botnet applications (ISCX dataset) and 2224 benign Android applications (downloaded and scanned by special tools developed as part of this work). The extracted features are categorized into six groups, in addition to a seventh group that contains all the extracted features. Each group is used to train and test four popular ML classifiers (Random Forest, Multi-Layer Perceptron neural network, Decision Tree, and Naive Bayes). Comparison of the results and feature importance analysis indicate that the URL feature set plays the key role in Android botnet detection and that the Random Forest classifier obtains the best results based on all sets of features.
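As a rough illustration of the static feature extraction step described in the abstract, the following minimal sketch scans decompiled application text for URLs and permission strings and turns them into a feature dictionary. The regular expressions, the feature names (`url_count`, `distinct_url_count`, `perm_*`), and the sample input are assumptions made for illustration only; they are not the feature definitions or tools used in the study.

```python
import re

# Hypothetical patterns, chosen for illustration; the study's actual
# feature definitions are not given in the abstract.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+")
PERMISSION_RE = re.compile(r"android\.permission\.([A-Z_]+)")


def extract_features(source_text: str) -> dict:
    """Build a simple feature dictionary from decompiled source/manifest text."""
    urls = URL_RE.findall(source_text)
    permissions = set(PERMISSION_RE.findall(source_text))

    features = {
        "url_count": len(urls),                # total URL occurrences
        "distinct_url_count": len(set(urls)),  # unique URLs
    }
    # One binary feature per permission that appears in the text.
    for name in sorted(permissions):
        features[f"perm_{name}"] = 1
    return features


# Made-up sample standing in for reverse-engineered application text.
sample = '''
<uses-permission android:name="android.permission.SEND_SMS"/>
<uses-permission android:name="android.permission.INTERNET"/>
String c2 = "http://example.com/gate.php";
String backup = "http://example.com/gate.php";
'''

features = extract_features(sample)
print(features)
```

Dictionaries of this kind, one per application, could then be converted to fixed-length vectors and grouped by feature family before being passed to classifiers such as those named in the abstract.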
