Learning Neural Representations for Network Anomaly Detection

This paper proposes latent representation models for improving network anomaly detection. Well-known anomaly detection algorithms often struggle with the characteristics of network data, such as high dimensionality and sparsity, and with the lack of anomaly data for training, model selection, and hyperparameter tuning. Our approach introduces new regularizers to a classical autoencoder (AE) and to a variational AE (VAE). These regularizers force normal data into a very tight region centered at the origin, within the nonsaturating range of the bottleneck unit activations. When trained on normal data, these AEs push normal points toward the origin, whereas anomalies, which differ from normal data, are mapped far away from the normal region. These models differ substantially from common regularized AEs, such as the sparse AE and the contractive AE, which tend to make their latent representations less sensitive to changes in the input data. The bottleneck feature space is then used as a new data representation, and several one-class learning algorithms are applied to it to evaluate the proposed models. The experiments show that our models help these classifiers perform efficiently and consistently on high-dimensional and sparse network datasets, even with relatively few training points. More importantly, the models can minimize the effect of model selection on these classifiers, since the classifiers' performance is insensitive to a wide range of hyperparameter settings.
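The abstract does not state the exact form of the regularizers. As a minimal sketch, assuming (i) the AE regularizer is a penalty λ·‖z‖² on the bottleneck codes z added to the reconstruction error, and (ii) the VAE variant replaces the standard normal prior with a narrow prior N(0, α²I), the code below shows how such terms pull the codes of normal data toward the origin, and how distance from the origin could then act as an anomaly score. The function names, λ (`lam`), α (`alpha`), and the quantile threshold are hypothetical. The distance-to-origin rule is a simple stand-in for illustration only, not one of the one-class classifiers the paper evaluates. Pure Python keeps the sketch dependency-free.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_norm(v: Vector) -> float:
    """Sum of squared components of a vector."""
    return sum(x * x for x in v)


def shrink_ae_loss(xs: Sequence[Vector], x_hats: Sequence[Vector],
                   zs: Sequence[Vector], lam: float) -> float:
    """Mean squared reconstruction error plus lam * mean ||z||^2.

    ASSUMED regularizer form: the second term pulls the bottleneck codes
    of the normal training data toward the origin.
    """
    n = len(xs)
    recon = sum(squared_norm([a - b for a, b in zip(x, xh)])
                for x, xh in zip(xs, x_hats)) / n
    shrink = sum(squared_norm(z) for z in zs) / n
    return recon + lam * shrink


def kl_to_narrow_prior(mu: Vector, log_var: Vector, alpha: float) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, alpha^2 I)), summed over dims.

    ASSUMED prior: a small alpha concentrates the prior, and hence the
    codes of normal data, in a tight region around the origin.
    """
    total = 0.0
    for m, lv in zip(mu, log_var):
        var = math.exp(lv)
        total += (math.log(alpha) - 0.5 * lv
                  + (var + m * m) / (2.0 * alpha * alpha) - 0.5)
    return total


def origin_distance_scores(zs: Sequence[Vector]) -> List[float]:
    """Hypothetical anomaly score: Euclidean distance of each code from the origin."""
    return [math.sqrt(squared_norm(z)) for z in zs]


def fit_threshold(train_scores: Sequence[float], quantile: float = 0.95) -> float:
    """Empirical quantile of the scores of normal training codes."""
    s = sorted(train_scores)
    idx = min(len(s) - 1, int(quantile * len(s)))
    return s[idx]


if __name__ == "__main__":
    # Hypothetical bottleneck codes: normal points near the origin, one anomaly far away.
    train_zs = [[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1]]
    test_zs = [[0.05, -0.05], [2.0, -1.5]]
    thr = fit_threshold(origin_distance_scores(train_zs))
    flags = [s > thr for s in origin_distance_scores(test_zs)]
    print(flags)  # [False, True]
```

In practice, the codes would come from the trained encoder, and the thresholding rule would be replaced by whichever one-class learner is applied to the bottleneck representation. Keeping the normal codes in a small ball around the origin also keeps saturating activations such as tanh in their near-linear range, which matches the abstract's emphasis on the nonsaturating area.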
