NetCluster: A clustering-based framework to analyze internet passive measurements data

Internet data collected via passive measurement are analyzed to obtain localization information on nodes by clustering (i.e., grouping together) nodes that exhibit similar network path properties. Since traditional clustering algorithms fail to correctly identify clusters of homogeneous nodes, we propose NetCluster, a novel framework suited to analyzing Internet measurement datasets. We show that the proposed framework correctly analyzes synthetically generated traces. Finally, we apply it to real traces collected at the access link of the Politecnico di Torino campus LAN and discuss the network characteristics as seen from that vantage point.
