MalPaCA: Malware Packet Sequence Clustering and Analysis

Malware family characterization is a challenging problem because ground-truth labels are not known. Anti-virus solutions assign labels to malware samples based on static analysis. However, these labels are known to be inconsistent, so the evaluation of analysis methods ends up depending on unreliable ground truth. The analysis methods themselves are often black boxes, which makes it impossible to verify the family labels they assign. To support malware analysts, we propose MalPaCA, a white-box method that clusters malware's attacking capabilities as reflected in its network traffic. We use sequential features to model temporal behavior. We also propose an intuitive, visualization-based cluster evaluation method to address interpretability issues. The results show that clustering attacking capabilities provides a more detailed profile of a family's behavior. The identified clusters capture various attacking capabilities, such as port scans and reuse of C&C servers. We discover a number of discrepancies between behavioral clusters and traditional malware family designations. In these cases, behavior within a family was so varied that many supposedly related malware samples had more in common with samples from other families than with those in their own family. We also show that sequential features are better suited for modeling temporal behavior than statistical aggregates.
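
The contrast between sequential features and statistical aggregates can be illustrated with a minimal sketch. This is not the authors' implementation: the packet-size values are made up, the helper names are hypothetical, and dynamic time warping is only an assumed choice of sequence distance, since the abstract does not name one. The two connections below contain exactly the same packet sizes in a different order, so order-insensitive aggregates cannot tell them apart, while a sequence distance can.

```python
from statistics import mean, pstdev


def aggregate_features(seq):
    """Order-insensitive summary of a sequence: mean, population std, min, max."""
    return (mean(seq), pstdev(seq), min(seq), max(seq))


def dtw_distance(a, b):
    """Textbook dynamic time warping with an absolute-difference step cost.

    Assumed here purely for illustration; any order-aware distance would do.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical packet-size sequences in bytes (illustrative values only).
burst_then_idle = [1500, 1500, 1500, 60, 60, 60]
alternating = [1500, 60, 1500, 60, 1500, 60]

# Same multiset of values, so the aggregates coincide.
same_aggregates = (
    aggregate_features(burst_then_idle) == aggregate_features(alternating)
)

# The order differs, so the sequence distance is non-zero.
sequence_gap = dtw_distance(burst_then_idle, alternating)

print("aggregates identical:", same_aggregates)
print("DTW distance:", sequence_gap)
```

Under these assumptions, a clustering step driven by the aggregate vectors would treat both connections as the same behavior, whereas one driven by pairwise sequence distances would keep them apart.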
