SATTVA: SpArsiTy inspired classificaTion of malware VAriants

There is an alarming increase in the amount of malware that is generated today. However, several studies have shown that most of these new malware are just variants of existing ones. Fast detection of these variants plays an effective role in thwarting new attacks. In this paper, we propose a novel approach to detect malware variants using a sparse representation framework. Exploiting the fact that most malware variants have small differences in their structure, we model a new/unknown malware sample as a sparse linear combination of other malware in the training set. The class with the least residual error is assigned to the unknown malware. Experiments on two standard malware datasets, Malheur dataset and Malimg dataset, show that our method outperforms current state of the art approaches and achieves a classification accuracy of 98.55\% and 92.83\% respectively. Further, by using a confidence measure to reject outliers, we obtain 100\% accuracy on both datasets, at the expense of throwing away a small percentage of outliers. Finally, we evaluate our technique on two large scale malware datasets: Offensive Computing dataset (2,124 classes, 42,480 malware) and Anubis dataset (209 classes, 36,784 samples). On both datasets our method obtained an average classification accuracy of 77\%, thus making it applicable to real world malware classification.

[1]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[2]  Jack W. Stokes,et al.  Large-scale malware classification using random projections and neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[3]  Christopher Krügel,et al.  A Static, Packer-Agnostic Filter to Detect Similar Malware Samples , 2012, DIMVA.

[4]  Vlado Keselj,et al.  N-gram-based detection of new malicious code , 2004, Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004..

[5]  Peng Li,et al.  On Challenges in Evaluating Malware Clustering , 2010, RAID.

[6]  B. S. Manjunath,et al.  Malware images: visualization and automatic classification , 2011, VizSec '11.

[7]  Alexander Ilin,et al.  Methodology for Behavioral-based Malware Analysis and Detection Using Random Projections and K-Nearest Neighbors Classifiers , 2011, 2011 Seventh International Conference on Computational Intelligence and Security.

[8]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[9]  Gary Carpenter 동적 사용자를 위한 Scalable 인증 그룹 키 교환 프로토콜 , 2005 .

[10]  Carsten Willems,et al.  Automatic analysis of malware behavior using machine learning , 2011, J. Comput. Secur..

[11]  Wenke Lee,et al.  McBoost: Boosting Scalability in Malware Collection and Analysis Using Statistical Classification of Executables , 2008, 2008 Annual Computer Security Applications Conference (ACSAC).

[12]  Vinod Yegneswaran,et al.  A comparative assessment of malware classification using binary texture analysis and dynamic analysis , 2011, AISec '11.

[13]  Marcus A. Maloof,et al.  Learning to Detect and Classify Malicious Executables in the Wild , 2006, J. Mach. Learn. Res..

[14]  Lakshmanan Nataraj,et al.  SARVAM : Search And RetrieVAl of Malware , 2013 .

[15]  David Brumley,et al.  BitShred: feature hashing malware for scalable triage and semantic analysis , 2011, CCS '11.

[16]  B. S. Manjunath,et al.  SigMal: a static signal processing based malware triage , 2013, ACSAC.

[17]  Andrew Walenstein,et al.  VILO: a rapid learning nearest-neighbor classifier for malware triage , 2013, Journal of Computer Virology and Hacking Techniques.

[18]  Kang G. Shin,et al.  Large-scale malware indexing using function-call graphs , 2009, CCS.

[19]  Joel A. Tropp,et al.  Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit , 2007, IEEE Transactions on Information Theory.

[20]  Volkan Cevher,et al.  Compressive Sensing for Background Subtraction , 2008, ECCV.

[21]  D. Donoho,et al.  Counting faces of randomly-projected polytopes when the projection radically lowers dimension , 2006, math/0607364.

[22]  Georg Wicherski,et al.  peHash: A Novel Approach to Fast Malware Clustering , 2009, LEET.

[23]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Christopher Krügel,et al.  Scalable, Behavior-Based Malware Clustering , 2009, NDSS.

[25]  Haibin Ling,et al.  Robust Visual Tracking and Vehicle Classification via Sparse Representation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Jesse D. Kornblum Identifying almost identical files using context triggered piecewise hashing , 2006, Digit. Investig..

[27]  Andrew Walenstein,et al.  Malware phylogeny generation using permutations of code , 2005, Journal in Computer Virology.

[28]  Christopher Krügel,et al.  Polymorphic Worm Detection Using Structural Information of Executables , 2005, RAID.

[29]  Rama Chellappa,et al.  Secure and Robust Iris Recognition Using Random Projections and Sparse Representations , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.