Multi-Scale LSTM Model for BGP Anomaly Classification

As a policy-based routing protocol, the Border Gateway Protocol (BGP) exchanges routing reachability information so that the network can provide sufficient end-to-end Quality of Service (QoS). The steady growth of anomalous BGP traffic degrades connectivity and the reachability of routing information among different Autonomous Systems (ASs), which calls for accurate alerting models that keep Internet routing services stable. Previous work classifies anomalies without considering that BGP traffic exhibits patterns on multiple time scales, which may lead to inaccurate classification. In this paper, we propose a novel Multi-Scale Long Short-Term Memory (MSLSTM) model to capture anomalous behavior in BGP traffic. In our model, a Discrete Wavelet Transform (DWT) is used to obtain temporal information on multiple scales, and a hierarchical two-layer LSTM architecture is devised: the first layer learns attention weights over the different time scales to generate an integrated historical representation, and the second layer captures the temporal dependency in the learned representation. To evaluate the feasibility of the model in different alerting scenarios, we conduct comprehensive experiments on several BGP data sets collected from real-world applications. The results demonstrate that our model achieves promising performance compared with state-of-the-art approaches.
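The multi-scale idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Haar wavelet, the dot-product attention score, the per-scale mean used as a stand-in for the first-layer LSTM output, and every function name (`haar_dwt_level`, `multiscale_views`, `attend_over_scales`) are assumptions made for illustration only. The sketch decomposes a window of traffic features into progressively coarser views and fuses per-scale summaries with softmax attention weights.

```python
import numpy as np


def haar_dwt_level(x):
    """One level of the Haar DWT along the time axis; x has shape (T, F), T even."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass: coarser-scale trend
    detail = (even - odd) / np.sqrt(2.0)  # high-pass: fine-scale fluctuation
    return approx, detail


def multiscale_views(x, levels):
    """Return [x, A1, A2, ...], where Ak is the level-k Haar approximation of x."""
    views = [x]
    current = x
    for _ in range(levels):
        if current.shape[0] % 2:  # odd length: pad by repeating the last step
            current = np.vstack([current, current[-1:]])
        current, _ = haar_dwt_level(current)
        views.append(current)
    return views


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def attend_over_scales(summaries, query):
    """Weight per-scale summaries (S, D) by dot-product attention against query (D,)."""
    weights = softmax(summaries @ query)
    return weights, weights @ summaries


# Hypothetical input: 16 time steps of 4 traffic features (random stand-in data).
rng = np.random.default_rng(0)
window = rng.normal(size=(16, 4))

views = multiscale_views(window, levels=3)
# Assumption: a per-scale mean replaces the first-layer LSTM encoding of each scale.
summaries = np.stack([v.mean(axis=0) for v in views])
weights, fused = attend_over_scales(summaries, query=np.ones(4))
```

In a full model, the query vector would be learned and the fused vector for each window would be fed, in time order, to a second recurrent layer; both parts are omitted here.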
