Discovering users with similar internet access performance through cluster analysis

A new methodology to analyze Internet access behavior using frequency histograms.

A two-level clustering approach to analyze real network measurements with noisy data.

A new distance measure to identify user histogram outliers.

Data mining support for distributed Internet monitoring applications.

Users typically subscribe to an Internet access service on the basis of a specific download speed, but the service actually delivered may differ. Several projects are actively collecting Internet access performance measurements at the end-user location on a large scale. However, less attention has been devoted to analyzing such data and to informing users about the service they actually receive. This paper presents MiND, a cluster-based methodology to analyze the characteristics of periodic Internet measurements collected at the end-user location. MiND makes it possible to discover (i) groups of users with similar Internet access behavior and (ii) the (few) users receiving a somewhat anomalous service. Each user's measurements over time are modeled as histograms, which are then analyzed through a new two-level clustering strategy. MiND has been evaluated on real data collected by Neubot, an open-source tool, voluntarily installed by users, that periodically collects Internet measurements. Experimental results show that the majority of users can be grouped into homogeneous and cohesive clusters according to the Internet access service they receive in practice, while the few users receiving anomalous service are correctly identified as outliers. Both users and ISPs can benefit from this information: users can continuously monitor the service offered by their ISP, whereas ISPs can quickly identify anomalous behavior in the services they offer and act accordingly.
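The abstract does not specify how MiND bins the measurements, which algorithms its two clustering levels use, or how its new outlier distance is defined. The Python sketch below only illustrates the general idea under explicit assumptions: each user's download-speed samples become a normalized frequency histogram over hypothetical bin edges; k-means runs at both levels; squared Euclidean distance stands in for the paper's distance measure; and a level-1 cluster is split again only when its farthest member lies beyond a hypothetical threshold `max_dist`, with final groups smaller than a hypothetical `min_size` reported as outliers. All function names are hypothetical, and this is not the authors' method.

```python
import math
from typing import List, Optional, Sequence, Tuple

Hist = List[float]


def to_histogram(samples: Sequence[float], edges: Sequence[float]) -> Hist:
    """Normalized frequency histogram of one user's speed samples.

    Bins are [edges[i], edges[i+1]); the last bin also includes edges[-1].
    Samples outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= s < edges[i + 1] or (is_last and s == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed stand-in for MiND's own measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Hist], k: int, iters: int = 100) -> Tuple[List[int], List[Hist]]:
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < min(k, len(points)):
        # Seed with the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def two_level_clustering(hists: List[Hist], k1: int, k2: int, max_dist: float,
                         min_size: int = 2) -> Tuple[List[Optional[Tuple[int, int]]], List[int]]:
    """Hypothetical two-level scheme (not the algorithm from the paper).

    Level 1: k-means with k1 clusters over all user histograms.
    Level 2: a level-1 cluster whose farthest member is more than max_dist
    (Euclidean) from its centroid is split again with k-means (k2 clusters).
    Users in any final group smaller than min_size are reported as outliers.
    """
    labels1, cents1 = kmeans(hists, k1)
    labels: List[Optional[Tuple[int, int]]] = [None] * len(hists)
    outliers: List[int] = []
    for c in sorted(set(labels1)):
        idx = [i for i, lab in enumerate(labels1) if lab == c]
        spread = max(math.sqrt(sq_dist(hists[i], cents1[c])) for i in idx)
        if spread > max_dist:
            labels2, _ = kmeans([hists[i] for i in idx], k2)
        else:
            labels2 = [0] * len(idx)  # cohesive enough: keep as one group
        for sub in sorted(set(labels2)):
            members = [i for i, lab in zip(idx, labels2) if lab == sub]
            if len(members) < min_size:
                outliers.extend(members)
            else:
                for i in members:
                    labels[i] = (c, sub)
    return labels, sorted(outliers)


if __name__ == "__main__":
    edges = [0, 10, 20, 50]  # hypothetical download-speed bins in Mbit/s
    samples = [[5, 6, 7, 8], [4, 9, 3], [2, 3, 12],      # mostly slow users
               [30, 40, 45], [35, 25, 49], [30, 15, 40],  # mostly fast users
               [12, 14, 18, 15]]                          # an unusual profile
    hists = [to_histogram(s, edges) for s in samples]
    labels, outliers = two_level_clustering(hists, k1=2, k2=2, max_dist=0.6)
    print(labels)    # (level-1 cluster, level-2 sub-cluster) per user, None for outliers
    print(outliers)  # indices of users flagged as outliers
```

On this toy data, the three mostly slow and the three mostly fast users form two groups, while the user whose samples all fall in the middle bin ends up alone in a level-2 sub-cluster and is reported as an outlier. Splitting only non-cohesive level-1 clusters avoids forcing a cohesive group into artificial sub-clusters whose small pieces would otherwise be mistaken for outliers.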
