Detecting and Coloring Anomalies in Real Cellular Network Using Principal Component Analysis

Anomaly detection in a communication network is a powerful tool for predicting faults, detecting network sabotage attempts, and learning user profiles for marketing purposes and quality-of-service improvements. In this article, we convert the unsupervised data mining learning problem into a supervised classification problem. We propose three methods for creating an associative anomaly within a given database of commercial traffic data and demonstrate how the Principal Component Analysis (PCA) algorithm can detect anomalous network behavior and distinguish, at the IP network layer, between a regular data stream and a data stream that deviates from routine. Although PCA has been used for anomaly detection in the past, there are very few examples where such tasks were performed on real traffic data that was collected and shared by a commercial company.
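To make the idea concrete, the following is a minimal sketch of one common way PCA is used for anomaly detection: fit a low-dimensional "normal" subspace on routine traffic, then flag samples whose reconstruction residual is unusually large. It is not the article's implementation. The synthetic data, the six generic features, the choice of two components, the 99th-percentile threshold, and the helper names `fit_pca` and `residual_score` are all assumptions made for illustration. Injecting an artificial anomaly into otherwise routine data is also what supplies labels, loosely mirroring the conversion to a supervised problem described above.

```python
import numpy as np

def fit_pca(X, k):
    """Return the feature mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T  # basis has shape (n_features, k)

def residual_score(X, mean, basis):
    """Squared distance of each row of X from the PCA subspace."""
    Xc = X - mean
    reconstruction = Xc @ basis @ basis.T
    return np.sum((Xc - reconstruction) ** 2, axis=1)

rng = np.random.default_rng(0)

# Synthetic stand-in for routine traffic (assumption): 500 samples of
# 6 features driven by 2 latent factors plus small noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

mean, basis = fit_pca(normal, k=2)
normal_scores = residual_score(normal, mean, basis)

# Illustrative threshold (assumption): 99th percentile of routine residuals.
threshold = np.quantile(normal_scores, 0.99)

# Injected anomaly (assumption): a large offset on one feature.
anomalous = normal[:20].copy()
anomalous[:, 3] += 5.0
anomaly_scores = residual_score(anomalous, mean, basis)

normal_flags = normal_scores > threshold
anomaly_flags = anomaly_scores > threshold
print("flagged routine fraction:", normal_flags.mean())
print("flagged injected fraction:", anomaly_flags.mean())
```

In practice the number of retained components and the threshold would be tuned on the real traffic data, and the residual score is only one of several statistics that can be computed from a PCA model.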
