Toward Developing a Systematic Approach to Generate Benchmark Android Malware Datasets and Classification

Malware detection is one of the most important factors in the security of smartphones. Academic researchers have extensively studied Android malware detection problems. Machine learning methods proposed in previous work typically reported high detection performance and fast prediction times on fixed and defective datasets. Therefore, based on these shortcomings most datasets are not suitable for real-world deployment. The main goal of this paper is to propose a systematic approach to generate Android malware datasets using real smartphones instead of emulators and develop a new dataset, namely CI-CAndMal2017, which covers all the shortcomings and limitations of previous datasets. Also, we offer 80 traffic features to select the best feature sets for detecting and classifying the malicious families just by traffic analysis. The proposed method showed an average precision of 85% and recall of 88% for three classifiers, namely Random Forest(RF), K-Nearest Neighbor (KNN), and Decision Tree (DT).

[1]  Ali A. Ghorbani,et al.  Android Botnets: What URLs are Telling Us , 2015, NSS.

[2]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[3]  Ali A. Ghorbani,et al.  Towards a Network-Based Framework for Android Malware Detection and Characterization , 2017, 2017 15th Annual Conference on Privacy, Security and Trust (PST).

[4]  Sankardas Roy,et al.  Deep Ground Truth Analysis of Current Android Malware , 2017, DIMVA.

[5]  Anshul Arora,et al.  Minimizing Network Traffic Features for Android Mobile Malware Detection , 2017, ICDCN.

[6]  Aziz Mohaisen,et al.  Detecting and Classifying Android Malware Using Static Analysis along with Creator Information , 2015, Int. J. Distributed Sens. Networks.

[7]  Konrad Rieck,et al.  DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket , 2014, NDSS.

[8]  Ali A. Ghorbani,et al.  Characterization of Encrypted and VPN Traffic using Time-related Features , 2016, ICISSP.

[9]  Aziz Mohaisen,et al.  Detecting and classifying method based on similarity matching of Android malware behavior with profile , 2016, SpringerPlus.

[10]  Ali A. Ghorbani,et al.  DroidKin: Lightweight Detection of Android Apps Similarity , 2014, SecureComm.

[11]  Yajin Zhou,et al.  Malton: Towards On-Device Non-Invasive Mobile Malware Analysis for ART , 2017, USENIX Security Symposium.

[12]  Bo Yang,et al.  TrafficAV: An effective and explainable detection of mobile malware behavior using network traffic , 2016, 2016 IEEE/ACM 24th International Symposium on Quality of Service (IWQoS).

[13]  Yajin Zhou,et al.  Dissecting Android Malware: Characterization and Evolution , 2012, 2012 IEEE Symposium on Security and Privacy.

[14]  Ali A. Ghorbani,et al.  Characterization of Tor Traffic using Time based Features , 2017, ICISSP.

[15]  Huy Kang Kim,et al.  Function-Oriented Mobile Malware Analysis as First Aid , 2016, Mob. Inf. Syst..

[16]  Valérie Viet Triem Tong,et al.  Kharon dataset: Android malware under a microscope , 2016 .

[17]  Aziz Mohaisen,et al.  Andro-Dumpsys: Anti-malware system based on the similarity of malware creator and malware centric information , 2016, Comput. Secur..

[18]  Lidong Zhai,et al.  Research of android malware detection based on network traffic monitoring , 2014, 2014 9th IEEE Conference on Industrial Electronics and Applications.

[19]  Anshul Arora,et al.  Malware Detection Using Network Traffic Analysis in Android Based Mobile Devices , 2014, 2014 Eighth International Conference on Next Generation Mobile Apps, Services and Technologies.