New Era of Deep-Learning-Based Malware Intrusion Detection: The Malware Detection and Prediction Based on Deep Learning

With the development of artificial intelligence algorithms such as deep learning models, and their successful application in many different fields, similar trials of deep learning technology have been made in the cyber security area. On some cyber security problems, deep learning methods have shown better performance than conventional rule-based approaches, in academic security research as well as in industry practice. For malware detection and classification tasks in particular, they save considerable time and improve the accuracy of the overall malware detection pipeline. In this paper, we construct special deep neural networks, i.e., MalDeepNet (TB-Malnet and IB-Malnet), for malware dynamic behavior classification tasks. We then build a family clustering algorithm based on deep learning and carry out related tests. In addition, we design a novel malware prediction model that could detect future malware through a Mal Generative Adversarial Network (Mal-GAN) implementation. All of these algorithms show considerable value on the related datasets.
