A Performance Evaluation on Distance Measures in KNN for Mobile Malware Detection

Most machine-learning-based work on mobile malware detection for the Android operating system (OS) uses the classifiers' default settings and focuses on selecting either the optimal features or the optimal classifier. Although this approach is understandable and has been shown to provide valuable results, a classifier's hyper-parameters must be configured properly for it to achieve its best performance. This paper therefore investigates the performance of one of the simplest machine learning classifiers, K Nearest Neighbor (KNN), across its hyper-parameters, with emphasis on the distance measure. The authors perform an extensive comparison of various well-known distance measures on the Drebin data set. The results show that a proper choice of distance measure can significantly enhance classification accuracy. Specifically, the Euclidean distance, the measure most commonly used with KNN, is not the optimal option; other distance measures, such as Hamming and CityBlock, can boost the classifier's performance in the context of mobile malware detection. For instance, CityBlock can improve the false positive rate of KNN by up to 33% compared with the Euclidean distance.
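To illustrate why the distance measure matters, the following is a minimal sketch of a KNN classifier with a pluggable distance function. It is not the authors' implementation: the feature vectors, labels, and the helper name `knn_predict` are hypothetical, and the toy data is not drawn from the Drebin data set. It only shows that the same query can receive a different label when the distance measure changes.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cityblock(a, b):
    # Sum of absolute coordinate differences (Manhattan distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # Fraction of coordinates in which the two vectors differ.
    return sum(x != y for x, y in zip(a, b)) / len(a)

def knn_predict(train, query, k, distance):
    # train is a list of (feature_vector, label) pairs (hypothetical format).
    # Keep the k closest samples and return the majority label among them.
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration only.
train = [
    ([0, 0, 0, 2], "malware"),
    ([1, 1, 1, 0], "benign"),
]
query = [0, 0, 0, 0]

predictions = {
    fn.__name__: knn_predict(train, query, k=1, distance=fn)
    for fn in (euclidean, cityblock, hamming)
}
print(predictions)
```

In this toy case the Euclidean distance places the query closer to the "benign" sample (about 1.73 versus 2), whereas CityBlock (2 versus 3) and Hamming (0.25 versus 0.75) place it closer to the "malware" sample, so the predicted label depends on the measure chosen.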
