A PCA Analysis of Daily Unwanted Traffic

This paper investigates the macroscopic behavior of unwanted traffic (e.g., viruses, worms, and backscatter from (D)DoS attacks or misconfiguration) passing through the Internet. The data set consists of unwanted packets measured at a /18 darknet in Japan from Oct. 2006 to Apr. 2009, a period that includes the Conficker outbreak. We quantify traffic behavior by the entropy of ten packet features (e.g., the 5-tuple), and then apply PCA (principal component analysis) to the resulting ten-dimensional entropy time series matrix to obtain a suitable representation of unwanted traffic. PCA is a well-known and well-studied method for separating normal from anomalous behavior in Internet backbone traffic; however, few studies have applied it to darknet traffic. We first demonstrate that the entropy time series of the ten packet features are highly variable. Next, we show that the top four principal components are sufficient to describe the original traffic behavior. In particular, the first component can be interpreted as the type of unwanted traffic (i.e., worm/virus or scanning), and the second as the difference in communication patterns (e.g., one-to-many or many-to-one). These two components account for 63.8% of the total variance of the original data set. In contrast, most of the data mapped onto the higher components show little variability, so outliers in those components indicate the presence of specific anomalies. Furthermore, we show that a scatter plot of the first and second principal component scores provides a clearer view of macroscopic unwanted traffic behavior.
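The pipeline described above (per-day entropy of each packet feature, then PCA on the days-by-features matrix) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the entropy normalization, the exact ten features, or the PCA variant, so the sketch assumes Shannon entropy in bits, standardized columns, and PCA via SVD. The helper names `shannon_entropy` and `pca` are hypothetical, and the input is synthetic random data, not the darknet measurements.

```python
import numpy as np
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def pca(X):
    """PCA via SVD on a column-standardized matrix X (rows = days, cols = features).

    Returns the component scores, the component loadings (one per row),
    and the fraction of total variance explained by each component.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # assumption: z-score each feature
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U * s  # projection of each day onto the components
    return scores, Vt, explained


# Synthetic stand-in for the measurements (assumption: NOT the paper's data).
rng = np.random.default_rng(0)
n_days, n_features, n_packets = 200, 10, 500

entropy_matrix = np.empty((n_days, n_features))
for day in range(n_days):
    for feat in range(n_features):
        # Each day/feature gets a random number of distinct values, so the
        # entropy varies from day to day.
        n_distinct = rng.integers(2, 256)
        packets = rng.integers(0, n_distinct, size=n_packets)
        entropy_matrix[day, feat] = shannon_entropy(packets.tolist())

scores, loadings, explained = pca(entropy_matrix)

# Cumulative variance explained by the leading components, and the
# first two score columns that a scatter plot would use.
cumulative = np.cumsum(explained)
pc1, pc2 = scores[:, 0], scores[:, 1]
```

With real data, `cumulative[1]` would be the quantity the abstract reports for the first two components, and plotting `pc1` against `pc2` would give the scatter plot it mentions. On the synthetic input here the values carry no meaning.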
