DroidClassifier: Efficient Adaptive Mining of Application-Layer Header for Classifying Android Malware

A recent report has shown that there are more than 5,000 malicious applications created for Android devices each day. This creates a need for researchers to develop effective and efficient malware classification and detection approaches. To address this need, we introduce DroidClassifier: a systematic framework for classifying network traffic generated by mobile malware. Our approach utilizes network traffic analysis to construct multiple models in an automated fashion using a supervised method over a set of labeled malware network traffic (the training dataset). Each model is built by extracting common identifiers from multiple HTTP header fields. Adaptive thresholds are designed to capture the disparate characteristics of different malware families. Clustering is then used to improve the classification efficiency. Finally, we aggregate the multiple models to construct a holistic model to conduct cluster-level malware classification. We then perform a comprehensive evaluation of DroidClassifier by using 706 malware samples as the training set and 657 malware samples and 5,215 benign apps as the testing set. Collectively, these malicious and benign apps generate 17,949 network flows. The results show that DroidClassifier successfully identifies over 90% of different families of malware with more than 90% accuracy with accessible computational cost. Thus, DroidClassifier can facilitate network management in a large network, and enable unobtrusive detection of mobile malware. By focusing on analyzing network behaviors, we expect DroidClassifier to work with reasonable accuracy for other mobile platforms such as iOS and Windows Mobile as well.

[1]  Ali A. Ghorbani,et al.  Automated malware classification based on network behavior , 2013, 2013 International Conference on Computing, Networking and Communications (ICNC).

[2]  Konrad Rieck,et al.  DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket , 2014, NDSS.

[3]  Anshul Arora,et al.  Malware Detection Using Network Traffic Analysis in Android Based Mobile Devices , 2014, 2014 Eighth International Conference on Next Generation Mobile Apps, Services and Technologies.

[4]  Vitor Monte Afonso,et al.  Identifying Android malware using dynamically obtained features , 2014, Journal of Computer Virology and Hacking Techniques.

[5]  Yajin Zhou,et al.  Dissecting Android Malware: Characterization and Evolution , 2012, 2012 IEEE Symposium on Security and Privacy.

[6]  Leyla Bilge,et al.  Automatically Generating Models for Botnet Detection , 2009, ESORICS.

[7]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[8]  Nizar Kheir,et al.  Analyzing HTTP User Agent Anomalies for Malware Detection , 2012, DPM/SETOP.

[9]  Yuan Zhang,et al.  Vetting undesirable behaviors in android apps with permission use analysis , 2013, CCS.

[10]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[11]  P. Jaccard,et al.  Etude comparative de la distribution florale dans une portion des Alpes et des Jura , 1901 .

[12]  Yuval Elovici,et al.  “Andromaly”: a behavioral malware detection framework for android devices , 2012, Journal of Intelligent Information Systems.

[13]  Narseo Vallina-Rodriguez,et al.  Haystack: In Situ Mobile Traffic Analysis in User Space , 2015, ArXiv.

[14]  Yong Liao,et al.  SAMPLES: Self Adaptive Mining of Persistent LExical Snippets for Classifying Mobile Application Traffic , 2015, MobiCom.

[15]  Yanjun Qi,et al.  Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers , 2016, NDSS.

[16]  Lior Rokach,et al.  Clustering Methods , 2005, The Data Mining and Knowledge Discovery Handbook.

[17]  Balachander Krishnamurthy,et al.  Best paper -- Follow the money: understanding economics of online aggregation and advertising , 2013, Internet Measurement Conference.

[18]  Narseo Vallina-Rodriguez,et al.  Breaking for commercials: characterizing mobile advertising , 2012, Internet Measurement Conference.

[19]  Mansour Ahmadi,et al.  Clustering android malware families by http traffic , 2015, 2015 10th International Conference on Malicious and Unwanted Software (MALWARE).

[20]  Seungyeop Han,et al.  These aren't the droids you're looking for: retrofitting android to protect data from imperious applications , 2011, CCS '11.

[21]  Qiang Xu,et al.  Automatic generation of mobile app signatures from traffic observations , 2015, 2015 IEEE Conference on Computer Communications (INFOCOM).

[22]  Minas Gjoka,et al.  AntMonitor: A System for Monitoring from Mobile Devices , 2015, C2BD@SIGCOMM.

[23]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[24]  Juan Caballero,et al.  FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors , 2013, RAID.

[25]  Mukul R. Prasad,et al.  Automated testing with targeted event sequence generation , 2013, ISSTA.

[26]  Sung-Ju Lee,et al.  Systematic Mining of Associated Server Herds for Malware Campaign Discovery , 2015, 2015 IEEE 35th International Conference on Distributed Computing Systems.

[27]  Byung-Gon Chun,et al.  TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones , 2010, OSDI.

[28]  Walid Dabbous,et al.  Meddle: middleboxes for increased transparency and control of mobile traffic , 2012, CoNEXT Student '12.

[29]  Ali Feizollah,et al.  Evaluation of machine learning classifiers for mobile malware detection , 2014, Soft Computing.

[30]  Nick Feamster,et al.  Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces , 2010, NSDI.

[31]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[32]  Bo Yang,et al.  A First Look at Android Malware Traffic in First Few Minutes , 2015, 2015 IEEE Trustcom/BigDataSE/ISPA.