Principal Components of Port-Address Matrices in Port-Scan Analysis

Many studies aim to use port-scan traffic data for fast and accurate detection of rapidly spreading worms. This paper proposes two new methods for reducing the traffic data to a simplified form consisting of significant components of lower dimensionality. (1) Dimension reduction via Term Frequency-Inverse Document Frequency (TF-IDF) values, a technique from information retrieval, selects significant ports and addresses according to their "importance" for classification. (2) Dimension reduction via Principal Component Analysis (PCA), widely used in exploratory data analysis, enables estimation of how uniformly the sensors are distributed over the reduced coordinate system. PCA yields a scatter plot of the sensors, which helps to detect abnormal behavior in both the source address space and the destination port space. In addition to these proposals, we report on experiments using distributed observation data from the Internet Scan Data Acquisition System (ISDAS), provided by the Japan Computer Emergency Response Team (JPCERT).
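As a rough illustration of the two reductions, the sketch below applies a textbook TF-IDF weighting and an SVD-based PCA to a small sensor-by-port count matrix. It is a minimal sketch under stated assumptions, not the paper's implementation: treating each sensor as a "document" and each destination port as a "term", the particular TF-IDF formula, the helper names `tfidf` and `pca_scores`, and the toy counts are all illustrative choices that do not come from the paper or from ISDAS data.

```python
import numpy as np


def tfidf(counts):
    """Textbook TF-IDF for a (n_sensors, n_ports) packet-count matrix.

    Assumption: sensors play the role of documents and ports the role of terms.
    """
    counts = np.asarray(counts, dtype=float)
    n_sensors = counts.shape[0]
    row_totals = counts.sum(axis=1, keepdims=True)
    # Term frequency: share of each sensor's packets that hit a given port.
    tf = np.divide(counts, row_totals,
                   out=np.zeros_like(counts), where=row_totals > 0)
    # Document frequency: number of sensors that observed the port at all.
    df = (counts > 0).sum(axis=0)
    idf = np.log(n_sensors / np.maximum(df, 1))
    return tf * idf


def pca_scores(matrix, n_components=2):
    """Project rows onto the leading principal components via SVD."""
    matrix = np.asarray(matrix, dtype=float)
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Hypothetical toy data: 4 sensors (rows) x 4 destination ports (columns).
# Sensor 3 alone sees heavy traffic on the last port.
counts = np.array([
    [120, 30, 0, 0],
    [110, 25, 0, 0],
    [130, 35, 0, 0],
    [100, 20, 0, 900],
])

weights = tfidf(counts)
# Rank ports by their largest TF-IDF weight across sensors.
top_ports = np.argsort(weights.max(axis=0))[::-1]

# 2-D coordinates of each sensor, suitable for a scatter plot.
scores = pca_scores(counts)
outlier_sensor = int(np.abs(scores[:, 0]).argmax())
```

In this toy case, ports observed by every sensor receive zero weight because their inverse document frequency is log(1) = 0, so the port seen by a single sensor ranks first. In the PCA projection, that same sensor lies far from the other three along the first component, which is the kind of separation a scatter plot of sensors would make visible.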
