Client clustering for traffic and location estimation

Resource management mechanisms for large-scale, globally distributed network services need to assign groups of clients to servers according to the clients' network location and the load they are expected to generate. Current proposals address network location and traffic modeling separately. We develop a novel clustering technique that addresses both network proximity and traffic modeling. Our approach combines techniques from network-aware clustering, location inference, and spatial analysis. We conduct a large, measurement-based study to identify and evaluate Web traffic clusters. Our study links millions of Web transactions, collected from two worldwide sporting-event Web sites, and millions of network delay measurements to thousands of Internet address clusters. Because our techniques are equally applicable to other traffic types, they are useful in a variety of wide-area distributed computing optimizations and in Internet modeling and simulation scenarios.
