Bounding the bias of tree-like sampling in IP topologies

It is widely believed that the degree distribution of the Internet's AS-graph obeys a power law. However, it was recently argued that since Internet data is collected in a tree-like fashion, it yields only a sample of the degree distribution, and this sample may be biased. This argument was backed by simulation data and mathematical analysis, which demonstrated that under certain conditions a tree sampling procedure can produce an artificial power law in the degree distribution. Thus, although the observed degree distribution of the AS-graph follows a power law, this phenomenon may be an artifact of the sampling process. In this work we provide some evidence to the contrary. We show, by analysis and simulation, that when the degree distribution of the underlying graph obeys a power law with an exponent $\gamma>2$, a tree-like sampling process produces a negligible bias in the sampled degree distribution. Furthermore, recent data collected by the DIMES project, which is not based on single-source sampling, indicates that the Internet indeed obeys a power-law degree distribution with an exponent $\gamma>2$. Combining this empirical data with our analysis and with our simulation of traceroute experiments that use the DIMES-measured AS-graph as the underlying graph, we conclude that the bias in the degree distribution calculated from BGP data is negligible.
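The kind of experiment described above can be illustrated with a minimal sketch. It is not the authors' simulation: it assumes a configuration-model random graph as the underlying topology and a single breadth-first-search tree as an idealized stand-in for single-source traceroute sampling. The function names (`power_law_degrees`, `configuration_graph`, `bfs_tree_degrees`) and the parameter values ($n = 2000$, $\gamma = 2.5$, a degree cutoff of $\sqrt{n}$) are hypothetical choices made for illustration only.

```python
import random
from collections import Counter, deque


def power_law_degrees(n, gamma, k_min=1, k_max=None, rng=random):
    """Draw n degrees with P(k) proportional to k**(-gamma) on [k_min, k_max]."""
    if k_max is None:
        k_max = int(n ** 0.5)  # assumed cutoff, keeps the graph close to simple
    ks = list(range(k_min, k_max + 1))
    weights = [k ** (-gamma) for k in ks]
    degrees = rng.choices(ks, weights=weights, k=n)
    if sum(degrees) % 2:  # stub matching needs an even number of stubs
        degrees[0] += 1
    return degrees


def configuration_graph(degrees, rng=random):
    """Pair stubs uniformly at random; self-loops and repeated edges are dropped."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    adj = {v: set() for v in range(len(degrees))}
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bfs_tree_degrees(adj, source):
    """Return the BFS tree (as a parent map) and each node's degree inside it."""
    parent = {source: None}
    observed = Counter()
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:  # first visit: (u, v) becomes a tree edge
                parent[v] = u
                observed[u] += 1
                observed[v] += 1
                queue.append(v)
    return parent, observed


if __name__ == "__main__":
    rng = random.Random(0)
    degrees = power_law_degrees(2000, gamma=2.5, rng=rng)
    adj = configuration_graph(degrees, rng=rng)
    source = max(adj, key=lambda v: len(adj[v]))  # start from a well-connected node
    parent, observed = bfs_tree_degrees(adj, source)

    true_hist = Counter(len(adj[v]) for v in parent)
    seen_hist = Counter(observed[v] for v in parent)
    for k in sorted(true_hist)[:10]:
        print(k, true_hist[k], seen_hist.get(k, 0))
```

Comparing the two histograms on a log-log scale for several values of $\gamma$ gives a rough sense of how far the tree-sampled distribution departs from the true one. A single BFS tree misses every non-tree edge, so it is a pessimistic model of measurements that merge many paths.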
