Autoencoder-based feature learning for cyber security applications

This paper presents a novel feature learning model for cyber security tasks. We propose to use autoencoders (AEs), as a generative model, to learn latent representations of different feature sets. We show that the AE can automatically learn a reasonable notion of semantic similarity among input features. Specifically, the AE accepts a feature vector obtained from cyber security phenomena and extracts a code vector that captures the semantic similarity between feature vectors. This similarity is embedded in an abstract latent representation. Because the AE is trained in an unsupervised fashion, much of this success depends on the appropriate original feature sets used in this paper. The AE can also provide more discriminative features than other feature engineering approaches. Furthermore, the scheme reduces the dimensionality of the features, thereby significantly reducing memory requirements. We selected two different cyber security tasks: network-based anomaly intrusion detection and malware classification. We analysed the proposed scheme with various classifiers using publicly available datasets for both tasks. Several appropriate evaluation metrics show improvement compared to prior results.
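The encode-then-compress idea described above can be illustrated with a minimal sketch. It is not the authors' architecture: the single hidden layer, the sigmoid encoder, the linear decoder, the learning rate, and the synthetic data standing in for security features are all assumptions chosen for brevity. It shows only how an AE trained without labels maps a feature vector to a shorter code vector.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, code_dim, epochs=300, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder with full-batch gradient descent.

    Loss: mean over samples of 0.5 * ||x_hat - x||^2 (no labels are used).
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, code_dim))  # encoder weights
    b1 = np.zeros(code_dim)
    W2 = rng.normal(0.0, 0.1, size=(code_dim, d))  # decoder weights
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)       # code vectors, shape (n, code_dim)
        X_hat = H @ W2 + b2            # linear reconstruction, shape (n, d)
        err = X_hat - X
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Backpropagate the reconstruction error.
        g_out = err / n
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        gZ = (g_out @ W2.T) * H * (1.0 - H)  # sigmoid derivative
        gW1 = X.T @ gZ
        gb1 = gZ.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

def encode(X, params):
    """Map feature vectors to their lower-dimensional code vectors."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)

# Synthetic stand-in for a security feature matrix: 200 samples, 20 features
# generated from 3 hidden factors (hypothetical data, not from the paper).
rng = np.random.default_rng(42)
factors = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = sigmoid(factors @ mixing)

params, losses = train_autoencoder(X, code_dim=4)
codes = encode(X, params)  # 20-dimensional inputs compressed to 4 dimensions
```

In a pipeline like the one the abstract describes, `codes` rather than `X` would be passed to a downstream classifier; storing 4 values per sample instead of 20 is the source of the memory saving in this toy setting.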
