Sampling biases in IP topology measurements

Considerable attention has been focused on the properties of graphs derived from Internet measurements. Router-level topologies collected via traceroute-like methods have led some to conclude that the router graph of the Internet is well modeled as a power-law random graph, that is, a graph in which the node degree distribution has a power-law tail. We argue that the evidence to date for this conclusion is at best insufficient. We show that when graphs are sampled using traceroute-like methods, the resulting degree distribution can differ sharply from that of the underlying graph. For example, given a sparse Erdős-Rényi random graph, the subgraph formed by a collection of shortest paths from a small set of random sources to a larger set of random destinations can exhibit a degree distribution remarkably like a power law. We explore why this effect arises, and show that in such a setting edges are sampled in a highly biased manner. This insight allows us to formulate tests for determining when sampling bias is present. When we apply these tests to a number of well-known datasets, we find strong evidence for sampling bias.
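
The sampling experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the graph size, the mean degree, and the numbers of sources and destinations are all hypothetical choices, and breadth-first search with arbitrary tie-breaking stands in for whatever single route a traceroute-like probe would actually observe.

```python
import random
from collections import Counter, deque


def erdos_renyi(n, mean_degree, rng):
    """Sparse G(n, p) graph as a list of neighbor sets, with p = mean_degree / (n - 1)."""
    p = mean_degree / (n - 1)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def bfs_parents(adj, source):
    """Breadth-first search from source; returns one shortest-path parent per reached node."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def sample_traceroute(adj, sources, destinations):
    """Union of the edges on one shortest path from each source to each reachable destination."""
    edges = set()
    for s in sources:
        parent = bfs_parents(adj, s)
        for d in destinations:
            if d not in parent:
                continue  # destination lies in a different component
            v = d
            while parent[v] is not None:
                u = parent[v]
                edges.add((min(u, v), max(u, v)))
                v = u
    return edges


def degree_counts(edges):
    """Degree of every node that appears in the sampled edge set."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    # Illustrative parameters only (assumptions, not values from the paper).
    rng = random.Random(1)
    n = 1000
    adj = erdos_renyi(n, mean_degree=8.0, rng=rng)
    nodes = list(range(n))
    sources = rng.sample(nodes, 3)
    destinations = rng.sample(nodes, 300)

    sampled = degree_counts(sample_traceroute(adj, sources, destinations))
    true_hist = Counter(len(neighbors) for neighbors in adj)
    sampled_hist = Counter(sampled.values())
    print("degree  true_count  sampled_count")
    for k in sorted(set(true_hist) | set(sampled_hist)):
        print(f"{k:6d}  {true_hist.get(k, 0):10d}  {sampled_hist.get(k, 0):13d}")
```

Comparing the two histograms, ideally on log-log axes and over several random seeds, is one way to see how far the sampled degree distribution drifts from that of the underlying graph. Whether it resembles a power law for a given choice of parameters has to be checked empirically.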
