Shallow neural network with kernel approximation for prediction problems in highly demanding data networks

Abstract: Intrusion detection and network traffic classification are two of the main research applications of machine learning to highly demanding data networks, e.g. IoT/sensor networks. These applications pose new prediction challenges and impose strict requirements on the models used for prediction. The models must be fast, accurate, flexible and capable of handling large datasets. They must be fast in training but above all in prediction, since inevitable changes in the environment require periodic retraining, and real-time prediction is mandatory. They must be accurate because of the consequences of prediction errors. They must also be flexible enough to detect complex behaviors, a capability usually found in non-linear models. Finally, training and prediction datasets are usually large because of traffic volumes. These requirements pull in conflicting directions, between fast and simple shallow linear models and slower but richer non-linear and deep learning models. The ideal solution would therefore combine both worlds. In this paper, we present such a solution: a shallow neural network with linear activations plus a feature transformation based on kernel approximation algorithms, which gives the whole model the necessary richness and non-linear behavior. We have studied several kernel approximation algorithms (Nyström, Random Fourier Features and the Fastfood transformation) and have applied them to three datasets related to intrusion detection and network traffic classification. This work presents the first application of a shallow linear model plus a kernel approximation to prediction problems with highly demanding network requirements. We show that the prediction performance obtained by these algorithms is in the same range as that of the best non-linear classifiers, with a significant reduction in computation time, making them appropriate for new highly demanding networks.
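The general idea of pairing a kernel approximation with a linear model can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses a synthetic two-class dataset in place of the network traffic datasets, and lets a logistic regression stand in for the shallow linear network. The `gamma`, `n_components` and `C` values are arbitrary illustrative choices, and the Fastfood transformation is omitted because scikit-learn does not ship it.

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem, RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, non-linearly separable data (an assumption; not one of the paper's datasets).
X, y = make_moons(n_samples=2000, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def build_model(feature_map=None):
    """Linear classifier, optionally preceded by a kernel approximation step."""
    steps = [StandardScaler()]
    if feature_map is not None:
        steps.append(feature_map)
    # Logistic regression is used here as a stand-in for a shallow linear model.
    steps.append(LogisticRegression(C=10.0, max_iter=1000))
    return make_pipeline(*steps)


# Hypothetical settings: an RBF kernel with gamma=1.0 and 200 approximate features.
feature_maps = {
    "linear": None,
    "nystroem": Nystroem(kernel="rbf", gamma=1.0, n_components=200, random_state=0),
    "rff": RBFSampler(gamma=1.0, n_components=200, random_state=0),
}

scores = {}
for name, feature_map in feature_maps.items():
    model = build_model(feature_map).fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, accuracy in scores.items():
    print(f"{name:10s} test accuracy: {accuracy:.3f}")
```

In this toy setting the two kernel-approximated pipelines can separate classes that the purely linear pipeline cannot, while prediction still costs only one feature transformation followed by a linear map. Whether the same trade-off holds on real traffic data depends on the dataset and on the chosen kernel parameters.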
