Throttling Malware Families in 2D

Malicious software is categorized into families based on its static and dynamic characteristics, its infection methods, and the nature of the threat it poses. Visually exploring malware instances and families in a low-dimensional space gives a first overview of the dependencies and relationships among these instances, helps detect their groups, and isolates outliers. Visual exploration of different feature sets is also useful for assessing how well each set yields a valid abstract representation of the samples, one that can later be fed to classification and clustering algorithms to achieve high accuracy. In this paper, we investigate t-SNE (t-distributed Stochastic Neighbor Embedding), a widely used dimensionality reduction technique, to reduce the malware representation from a high-dimensional space of thousands of features to a low-dimensional space. We experiment with different feature sets and depict malware clusters in 2-D. Surprisingly, t-SNE not only produces informative 2-D drawings but also dramatically increases the generalization power of SVM classifiers: our results show that cross-validation accuracy is much higher with the 2-D embedded representation of the samples than with the original high-dimensional representation.
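The pipeline above begins by mapping each sample's high-dimensional feature vector to a point in 2-D with t-SNE. The block below is a minimal, dependency-free sketch of exact t-SNE in plain Python, meant only to show what the embedding step computes: per-sample Gaussian affinities calibrated to a target perplexity, Student-t similarities in the 2-D map, and gradient descent on their KL divergence with momentum and early exaggeration. It is not the authors' implementation. The names `tsne_2d`, `_conditional_row` and `_squared_distances`, the default hyperparameters and the step-size rule are assumptions made for this illustration. The sketch also omits the tree-based (Barnes-Hut) acceleration and adaptive gains that practical t-SNE libraries use, so it is only suitable for small inputs (tens to a few hundred samples). The SVM cross-validation step is not reproduced.

```python
import math
import random


def _squared_distances(points):
    """Dense matrix of squared Euclidean distances between feature vectors."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


def _conditional_row(dists, i, perplexity, tol=1e-5, max_steps=100):
    """Return P(j|i): a Gaussian kernel around point i whose precision `beta`
    is binary-searched until the row's entropy equals log(perplexity)."""
    target = math.log(perplexity)
    d_min = min(d for j, d in enumerate(dists) if j != i)
    beta, lo, hi = 1.0, 0.0, math.inf
    row = []
    for _ in range(max_steps):
        # Subtracting d_min keeps the nearest neighbour's weight at exactly 1,
        # so the normaliser can never underflow to zero.
        weights = [0.0 if j == i else math.exp(-(d - d_min) * beta)
                   for j, d in enumerate(dists)]
        total = sum(weights)
        row = [w / total for w in weights]
        entropy = -sum(p * math.log(p) for p in row if p > 0.0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # kernel too wide: raise the precision
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (beta + hi) / 2.0
        else:                 # kernel too narrow: lower the precision
            hi = beta
            beta = (beta + lo) / 2.0
    return row


def tsne_2d(points, perplexity=5.0, n_iter=300, exaggeration=4.0,
            exaggeration_iters=100, seed=0):
    """Exact t-SNE (O(n^2) work per iteration) into two dimensions.
    Hyperparameter defaults are assumptions of this sketch."""
    n = len(points)
    if not 0.0 < perplexity < n - 1:
        raise ValueError("perplexity must lie in (0, n - 1)")
    dists = _squared_distances(points)
    cond = [_conditional_row(dists[i], i, perplexity) for i in range(n)]
    # Symmetrised joint affinities p_ij = (p_{j|i} + p_{i|j}) / (2n); they sum to 1.
    P = [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    # Small random Gaussian initialisation of the 2-D map.
    Y = [[rng.gauss(0.0, 1e-2), rng.gauss(0.0, 1e-2)] for _ in range(n)]
    vel = [[0.0, 0.0] for _ in range(n)]
    # Each row of P sums to about 1/n, so the step size is scaled with n;
    # the constant 8 is a conservative choice for this sketch, not a tuned value.
    lr = n / (8.0 * exaggeration)

    for it in range(n_iter):
        exag = exaggeration if it < exaggeration_iters else 1.0
        momentum = 0.5 if it < 20 else 0.8
        # Student-t (one degree of freedom) kernel between map points.
        num = [[0.0 if i == j else
                1.0 / (1.0 + (Y[i][0] - Y[j][0]) ** 2 + (Y[i][1] - Y[j][1]) ** 2)
                for j in range(n)] for i in range(n)]
        Z = sum(map(sum, num))  # normaliser so that q_ij = num_ij / Z sums to 1
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                if i != j:
                    # dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + |y_i - y_j|^2)
                    c = 4.0 * (exag * P[i][j] - num[i][j] / Z) * num[i][j]
                    gx += c * (Y[i][0] - Y[j][0])
                    gy += c * (Y[i][1] - Y[j][1])
            vel[i][0] = momentum * vel[i][0] - lr * gx
            vel[i][1] = momentum * vel[i][1] - lr * gy
        for i in range(n):  # apply updates only after every gradient is known
            Y[i][0] += vel[i][0]
            Y[i][1] += vel[i][1]
    return Y
```

Calling `tsne_2d(feature_vectors)` on a list of equal-length numeric feature vectors returns one `[x, y]` pair per sample, in input order, so the coordinates can be plotted coloured by family label or passed on to a classifier such as an SVM.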
