Distributed PCA-based anomaly detection in telephone networks through legitimate-user profiling

In this paper we present a distributed mechanism based on Principal Component Analysis (PCA) to profile the behavior of legitimate users in telephone networks. The idea is to use probes distributed over the network to obtain a compact snapshot of the users each probe serves. A collector node combines these snapshots into a description of legitimate-user behavior and then distributes the resulting profile to the probes, which perform anomaly detection. Experimental results on several weeks of phone data collected by a telecom operator show that our profiling mechanism is stable over time and allows an operator to decentralize the anomaly detection stage directly to its probes. Furthermore, compared with a centralized PCA approach, our technique prevents the creation of polluted profiles: anomalies that are widespread yet localized within one (or a few) probes do not enter the description of legitimate-user behavior.
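
The probe/collector flow described above can be illustrated with a minimal sketch. The abstract does not specify how snapshots are built or combined, so everything below is an assumption made for illustration: each probe is assumed to send its local mean and top-k principal directions, the collector is assumed to average the probes' projection matrices (so each probe gets an equal vote) and take the coordinate-wise median of their means, and probes are assumed to score users by squared residual from the shared subspace. The function names, the synthetic three-feature data, and the choice of k are all hypothetical.

```python
import numpy as np

def local_snapshot(X, k):
    """Probe side (assumed): summarize users (rows of X) by the local mean
    and the top-k principal directions of the local covariance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return mean, top

def combine_snapshots(snapshots, k):
    """Collector side (assumed): give every probe an equal vote by averaging
    projection matrices, then keep the top-k directions of the average."""
    mean = np.median([m for m, _ in snapshots], axis=0)
    avg_projector = np.mean([V @ V.T for _, V in snapshots], axis=0)
    eigvals, eigvecs = np.linalg.eigh(avg_projector)
    components = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return mean, components

def residual_scores(X, mean, components):
    """Probe side (assumed): squared distance of each user from the
    profile subspace; large values suggest anomalous behavior."""
    centered = np.asarray(X, dtype=float) - mean
    residual = centered - centered @ components @ components.T
    return np.sum(residual ** 2, axis=1)

# Synthetic demo: three probes whose users lie near one shared direction.
rng = np.random.default_rng(0)
direction = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
probes = [
    rng.normal(0, 3, size=(200, 1)) * direction + rng.normal(0, 0.1, size=(200, 3))
    for _ in range(3)
]
mean, components = combine_snapshots([local_snapshot(X, k=1) for X in probes], k=1)

# Score ordinary users and one hypothetical user far off the profile subspace.
normal_scores = residual_scores(probes[0], mean, components)
outlier_score = residual_scores(np.array([[0.0, 0.0, 5.0]]), mean, components)[0]
```

Under these assumptions, a probe would flag users whose residual score exceeds a threshold chosen by the operator. Because each probe contributes one equally weighted snapshot, a single probe with many anomalous users has a bounded influence on the combined profile, which is one possible way to obtain the robustness the abstract describes.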
