An Efficient Clustering Scheme to Exploit Hierarchical Data in Network Traffic Analysis

There is significant interest in the data mining and network management communities in improving existing techniques for clustering multivariate network traffic flow records so that underlying traffic patterns can be inferred quickly. In this paper, we investigate the use of clustering techniques to efficiently identify interesting traffic patterns in network traffic data. We develop a framework that handles mixed-type attributes, including numerical, categorical, and hierarchical attributes, within a one-pass hierarchical clustering algorithm. We demonstrate the improved accuracy and efficiency of our approach in comparison with previous work on clustering network traffic.
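To make the notion of mixed-type attributes concrete, the following is a minimal sketch of how a distance between two flow records might combine a hierarchical attribute (an IPv4 address, compared by shared prefix length), a categorical attribute (protocol), and a numerical attribute (byte count). It is an illustration only and not the measure defined in the paper: the field names `src_ip`, `protocol`, and `bytes`, the equal weighting of the three parts, and the `byte_range` normalisation constant are all assumptions made for this example.

```python
import ipaddress


def common_prefix_len(ip_a: str, ip_b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    # XOR leaves a 1 at every differing bit; the highest set bit
    # marks where the two addresses first diverge.
    return 32 - (a ^ b).bit_length()


def hierarchical_distance(ip_a: str, ip_b: str) -> float:
    """Distance in [0, 1]: 0 for identical addresses, 1 for no shared prefix."""
    return (32 - common_prefix_len(ip_a, ip_b)) / 32


def categorical_distance(x: str, y: str) -> float:
    """Simple mismatch distance: 0 if equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


def numerical_distance(x: float, y: float, value_range: float) -> float:
    """Absolute difference scaled by an assumed range, capped at 1."""
    return min(1.0, abs(x - y) / value_range)


def flow_distance(f: dict, g: dict, byte_range: float = 1_000_000.0) -> float:
    """Unweighted mean of the three per-attribute distances (an assumption)."""
    parts = [
        hierarchical_distance(f["src_ip"], g["src_ip"]),
        categorical_distance(f["protocol"], g["protocol"]),
        numerical_distance(f["bytes"], g["bytes"], byte_range),
    ]
    return sum(parts) / len(parts)


if __name__ == "__main__":
    flow_a = {"src_ip": "10.0.0.0", "protocol": "tcp", "bytes": 1200}
    flow_b = {"src_ip": "10.0.1.0", "protocol": "tcp", "bytes": 1200}
    print(flow_distance(flow_a, flow_b))
```

Under these assumptions, two flows from nearby subnets are closer than two flows from unrelated address blocks, which is the kind of structure a flat categorical comparison of addresses would miss.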
