A statistical approach to the traceroute-like exploration of networks: theory and simulations

Mapping the Internet generally consists of sampling the network from a limited set of sources by using "traceroute"-like probes. This methodology, akin to merging different spanning trees to a set of destinations, has been argued to introduce uncontrolled sampling biases that might yield a sampled graph whose statistical properties sharply differ from those of the original network. Here we explore these biases and provide a statistical analysis of their origin. We derive a mean-field analytical approximation for the probability of edge and vertex detection that makes explicit the role of the number of sources and targets, and allows us to relate the global topological properties of the underlying network to the statistical accuracy of the sampled graph. In particular, we find that the probability of detecting an edge or a vertex depends on its betweenness centrality. This allows us to show that shortest-path-routed sampling provides a better characterization of underlying graphs with scale-free topology. We complement the analytical discussion with a thorough numerical investigation of simulated mapping strategies in different network models. We show that sampled graphs provide a fair qualitative characterization of the statistical properties of the original networks over a wide range of strategies and exploration parameters. The numerical study also allows the identification of intervals of the exploration parameters that optimize the fraction of nodes and edges discovered in the sampled graph. This finding might hint at steps toward more efficient mapping strategies.
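The link between detection probability and betweenness can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' code. It uses an Erdős–Rényi graph in place of the paper's network models, routes every probe along a single BFS shortest path (ties broken by discovery order), and draws N_S sources and N_T targets uniformly and independently, so each ordered pair (l, m) is probed with probability ρ_S ρ_T, where ρ_S = N_S/N and ρ_T = N_T/N. If b_e counts the pairs whose route crosses edge e and those pairs are treated as independent (the mean-field step), the edge is missed with probability (1 - ρ_S ρ_T)^b_e, so π_e ≈ 1 - exp(-ρ_S ρ_T b_e). This is our reconstruction; the paper's exact expression and normalization may differ. All function names are hypothetical.

```python
import math
import random
from collections import deque


def erdos_renyi(n, p, rng):
    """Adjacency lists of a G(n, p) random graph (assumed stand-in for the paper's models)."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def bfs_parents(adj, source):
    """BFS shortest-path tree rooted at `source`; ties are broken by discovery
    order, so every (source, target) pair gets exactly one route (an assumption)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def edge(u, v):
    """Canonical key for an undirected edge."""
    return (u, v) if u < v else (v, u)


def path_edges(parent, target):
    """Yield the edges on the tree path from `target` back to the BFS root."""
    v = target
    while parent[v] is not None:
        yield edge(v, parent[v])
        v = parent[v]


def traceroute_sample(adj, sources, targets):
    """Sampled graph: union of the routes from every source to every reachable target."""
    nodes, edges = set(), set()
    for s in sources:
        parent = bfs_parents(adj, s)
        for t in targets:
            if t in parent:  # skip targets outside the source's component
                nodes.add(t)
                for e in path_edges(parent, t):
                    edges.add(e)
                    nodes.update(e)
    return nodes, edges


def path_counts(adj):
    """b_e: number of ordered pairs (l, m) whose single route uses edge e.
    This is a single-route load, not the exact edge betweenness, which would
    share credit among all degenerate shortest paths."""
    counts = {}
    for l in range(len(adj)):
        parent = bfs_parents(adj, l)
        for m in parent:
            for e in path_edges(parent, m):
                counts[e] = counts.get(e, 0) + 1
    return counts


def detection_probability(b, rho_s, rho_t):
    """Mean-field estimate 1 - exp(-rho_s * rho_t * b)."""
    return 1.0 - math.exp(-rho_s * rho_t * b)


def compare(n=200, p=0.03, n_sources=5, n_targets=40, trials=200, seed=1):
    """Per edge with b_e > 0: (b_e, empirical detection rate, mean-field estimate)."""
    rng = random.Random(seed)
    adj = erdos_renyi(n, p, rng)
    b = path_counts(adj)
    hits = dict.fromkeys(b, 0)
    for _ in range(trials):
        sources = rng.sample(range(n), n_sources)
        targets = rng.sample(range(n), n_targets)
        _, edges = traceroute_sample(adj, sources, targets)
        for e in edges:  # every routed edge has b_e > 0, so the key exists
            hits[e] += 1
    rho_s, rho_t = n_sources / n, n_targets / n
    return [(b[e], hits[e] / trials, detection_probability(b[e], rho_s, rho_t)) for e in b]


if __name__ == "__main__":
    rows = sorted(compare())  # sort edges by b_e
    third = len(rows) // 3
    for label, chunk in [("low b", rows[:third]), ("high b", rows[-third:])]:
        emp = sum(r[1] for r in chunk) / len(chunk)
        mf = sum(r[2] for r in chunk) / len(chunk)
        print(f"{label}: empirical {emp:.3f}, mean-field {mf:.3f}")
```

Comparing the lowest and highest thirds of edges by b_e shows whether the empirical detection rate follows the mean-field curve. Edges with b_e = 0 are never routed and are omitted from the comparison. Some deviation may appear because pairs sharing a source or a target are not truly independent.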
