Constructing internet coordinate system based on delay measurement

In this paper, we consider how to represent the locations of Internet hosts in a Cartesian coordinate system so that network distances between arbitrary hosts can be estimated. We envision an infrastructure that consists of beacon nodes and provides the service of estimating the network distance between pairs of hosts without direct delay measurement. We show that principal component analysis (PCA) can effectively extract topological information from delay measurements between beacon hosts. Based on PCA, we devise a transformation method that projects the raw distance space into a new coordinate system of (much) smaller dimension. The transformation retains as much topological information as possible while still enabling end hosts to determine their own coordinates in that system. We term the resulting coordinate system the Internet Coordinate System (ICS). Compared to existing work (e.g., IDMaps and GNP), ICS incurs smaller computation overhead in calculating host coordinates and smaller measurement overhead (the measurements end hosts must make of their distances to beacon hosts). Finally, we show via experiments with both real-life and synthetic data sets that ICS makes robust and accurate estimates of network distances, incurs little computational overhead, and performs consistently regardless of the network topology and the number of beacon nodes (as long as that number exceeds a certain threshold).
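The PCA-based projection described above can be illustrated with a minimal sketch. It is not the paper's exact procedure: the function names (`fit_ics`, `host_coords`, `estimate_delay`), the use of an uncentered SVD of the beacon delay matrix, and the least-squares scaling factor `alpha` are all assumptions made for illustration. The sketch takes an m-by-m matrix of delays between beacons, keeps the top `k` principal directions, and maps any host's vector of delays to the m beacons into a `k`-dimensional coordinate.

```python
import numpy as np

def fit_ics(D, k):
    """Derive a k-dimensional basis from beacon-to-beacon delays.

    D is an (m, m) matrix; column i holds beacon i's delays to all beacons.
    Returns (basis, alpha). Both the uncentered SVD and the scaling step
    are illustrative assumptions, not the paper's stated algorithm.
    """
    D = np.asarray(D, dtype=float)
    U, _, _ = np.linalg.svd(D)
    basis = U[:, :k]                      # (m, k) top-k principal directions
    C = basis.T @ D                       # (k, m) projected beacon vectors
    # Pairwise Euclidean distances between projected beacons.
    diff = C[:, :, None] - C[:, None, :]
    L = np.sqrt((diff ** 2).sum(axis=0))  # (m, m)
    # Least-squares scale so that alpha * L approximates measured delays D.
    alpha = (D * L).sum() / (L ** 2).sum()
    return basis, alpha

def host_coords(basis, alpha, delays):
    """Map a host's length-m vector of delays to the beacons to coordinates."""
    return alpha * (basis.T @ np.asarray(delays, dtype=float))

def estimate_delay(ca, cb):
    """Estimated network distance: Euclidean distance between coordinates."""
    return float(np.linalg.norm(ca - cb))

if __name__ == "__main__":
    # Synthetic beacons on a plane; "delay" is plain Euclidean distance here.
    rng = np.random.default_rng(0)
    pts = rng.uniform(0, 100, size=(8, 2))
    D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    basis, alpha = fit_ics(D, k=3)
    a = host_coords(basis, alpha, D[:, 0])
    b = host_coords(basis, alpha, D[:, 1])
    print("measured:", D[0, 1], "estimated:", estimate_delay(a, b))
```

In this sketch an end host needs only its m measured delays to the beacons plus the shared `basis` and `alpha`; computing its coordinates is a single matrix-vector product, which is in keeping with the low computation overhead the abstract claims.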
