Profiling and Clustering Internet Hosts

Identifying groups of Internet hosts with similar behavior is useful for many Internet security applications, such as DDoS defense, worm and virus detection, and botnet detection. Two major difficulties stand in the way of modeling host behavior correctly and efficiently: the huge number of hosts, and the dynamics of each individual host. In this paper, we formulate the Internet host profiling problem: we use header data from public packet traces to select relevant features of frequently seen hosts for profile creation, and apply hierarchical clustering techniques to the profiles to build a dendrogram containing all the hosts. The well-known agglomerative algorithm is used to discover and combine similarly behaved hosts into clusters, and domain knowledge is used to analyze and evaluate the clustering results. We show the results of applying the proposed clustering approach to a data set from the NLANR PMA Internet traffic archive with more than 60,000 active hosts. On this data set, our approach successfully identifies clusters with significant and interpretable features. We next use the created host profiles to detect anomalous behavior during the Slammer worm spread. The experimental results show that our profiling and clustering approach can successfully detect the Slammer outbreak and identify the majority of infected hosts.
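The agglomerative step can be illustrated with a minimal sketch. It is not the paper's implementation: the two-feature profiles, the host names, the Euclidean distance, and the average-linkage rule are all assumptions chosen for illustration, since the abstract does not specify the features or the linkage criterion used.

```python
from itertools import combinations
from math import dist


def agglomerate(profiles, k):
    """Repeatedly merge the two closest clusters until k clusters remain.

    profiles: dict mapping a host name to a numeric feature tuple
              (hypothetical features, not those of the paper).
    Returns (clusters, merges), where merges is the merge history,
    i.e. a flat encoding of the dendrogram.
    """
    clusters = [[name] for name in sorted(profiles)]
    merges = []

    def linkage(a, b):
        # Average linkage (an assumption): mean pairwise Euclidean distance.
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical profiles: (fraction of bytes sent, normalized count of distinct peers).
profiles = {
    "web-server-1": (0.90, 0.10),
    "web-server-2": (0.85, 0.15),
    "client-1": (0.10, 0.80),
    "client-2": (0.15, 0.90),
    "scanner": (0.50, 5.00),
}

clusters, merges = agglomerate(profiles, k=3)
for members in clusters:
    print(sorted(members))
```

With these made-up values, the two servers merge first, then the two clients, and the host with an unusually large number of peers stays in its own cluster. An isolated host of that kind is the sort of outlier that anomaly detection on profiles would look for. This naive loop recomputes all pairwise linkages on every pass, so it is only suitable for small examples and not for tens of thousands of hosts.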
