Finding Android Malware Trace from Highly Imbalanced Network Traffic

As the number of Android users grows every year, malicious applications for mobile devices keep emerging. Many researchers have begun to explore how to detect malicious apps from the perspective of network traffic. We design and implement a control and management system for Android traffic collection that supports APK downloading, static malware detection, network traffic collection, and resource management. It collects network traffic efficiently and makes the dataset easy to manage. Furthermore, we frame machine-learning-based malware detection using network traffic as an imbalanced learning problem. We then apply four imbalanced-learning algorithms to Android malware detection on the highly imbalanced network traffic dataset. The experimental results show that the combination of SMOTE and SVM performs best among all the combinations tested.
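To make the oversampling step concrete, the following is a minimal sketch of the core idea behind SMOTE: synthetic minority samples are created by interpolating between a minority sample and one of its nearest minority-class neighbours. It is an illustration only, not the authors' implementation. The function name `smote`, the two-dimensional feature values, and the class sizes are all hypothetical, and the SVM classifier that would be trained on the rebalanced data is omitted.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base sample within the minority class
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 malicious flows against 40 benign ones.
malicious = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
n_benign = 40

new_points = smote(malicious, n_benign - len(malicious), k=3)
balanced_malicious = malicious + new_points
print(len(balanced_malicious))
```

In practice one would more likely use an existing implementation, such as the `SMOTE` class in imbalanced-learn together with `SVC` from scikit-learn, and apply oversampling to the training split only so that synthetic samples do not leak into the evaluation data.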
