Clustering hosts in P2P and global computing platforms

Identifying clusters of nearby hosts among Internet clients provides useful information for a number of Internet and P2P applications, including web applications, request routing in peer-to-peer overlay networks, and distributed computing applications. In this paper, we present and formulate the Internet host clustering problem. Leveraging previous work on Internet host distance measurement, we propose two hierarchical clustering techniques to solve this problem. The first is a marker-based hierarchical partitioning approach. The second is based on the well-known K-means clustering algorithm. We evaluate both approaches in simulation, using a representative Internet topology of over 1,000 hosts generated with the GT-ITM generator. Our simulation results demonstrate that both clustering approaches effectively identify clusters with arbitrary diameters. We conclude that, by building on previous work on Internet host distance estimation, it is possible to cluster Internet hosts in a way that benefits applications with differing requirements.
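To make the K-means-based idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each host has already been mapped to a coordinate tuple by some distance-estimation scheme, so that Euclidean distance approximates network distance, and it recursively bisects a group with 2-means until every cluster's diameter is within a requested bound. The function names, the farthest-pair seeding, and the sample coordinates are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def farthest_pair(points):
    """Return the two points with the largest pairwise distance (needs >= 2 points)."""
    return max(combinations(points, 2), key=lambda pq: math.dist(*pq))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def two_means(points, a, b, max_iters=50):
    """Lloyd's algorithm with k=2, seeded with the two given centers."""
    centers = [a, b]
    groups = ([], [])
    for _ in range(max_iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[nearer].append(p)
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:  # assignments are stable, so stop early
            break
        centers = updated
    return groups


def cluster_by_diameter(points, max_diameter):
    """Split points recursively until each cluster's diameter <= max_diameter."""
    if len(points) < 2:
        return [list(points)]
    a, b = farthest_pair(points)
    if math.dist(a, b) <= max_diameter:
        return [list(points)]
    left, right = two_means(points, a, b)
    if not left or not right:  # defensive guard against a degenerate split
        return [list(points)]
    return cluster_by_diameter(left, max_diameter) + cluster_by_diameter(right, max_diameter)


# Hypothetical host coordinates (assumed output of a distance-estimation service).
hosts = [(0, 0), (1, 0), (0, 1), (100, 100), (101, 100), (100, 101)]
clusters = cluster_by_diameter(hosts, max_diameter=5.0)
print(len(clusters), "clusters")
```

The diameter bound is the only tuning parameter here: a smaller bound yields more, tighter clusters, which is one way to read the abstract's claim about serving applications with different requirements. The farthest-pair search is quadratic in the number of hosts, so a sketch like this is only practical for small inputs.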
