Population size estimation and Internet link structure

Traceroute sampling is a common approach for exploring the autonomous system (AS) graph of the Internet. It provides samples of links between autonomous systems, but these links are not drawn uniformly at random from all possible links. Rather, which links are observed depends on the routing rules that each AS uses, and these rules are idiosyncratic and emergent. Here, we are interested in using the data from traceroute sampling to estimate the degree distribution of the network, a quantity of broad interest in network modeling. We link these ideas to the methodology of multiple-recapture estimation of the size of a closed population using log-linear models. We apply our approach to produce new estimates of the degree distribution of the AS graph, and to provide further evidence that the degree distribution does indeed have heavy tails.
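
To make the multiple-recapture idea concrete, the following is a minimal sketch of its simplest special case: two lists and the independence log-linear model, which reduces to the classical two-sample capture-recapture estimate. It is an illustration only, not the models fitted in the paper. The function name, the counts, and the reading of the two "lists" as two hypothetical traceroute sources are assumptions made for this example.

```python
def estimate_population_size(n11, n10, n01):
    """Two-list capture-recapture estimate of a closed population's size.

    n11: units observed on both lists
    n10: units observed on the first list only
    n01: units observed on the second list only

    Under the independence log-linear model for the incomplete 2x2 table,
    the unobserved cell is estimated as n10 * n01 / n11.
    """
    if n11 <= 0:
        raise ValueError("need at least one unit observed on both lists")
    n00_hat = n10 * n01 / n11  # estimated number of units missed by both lists
    return n11 + n10 + n01 + n00_hat


# Hypothetical counts (assumed for illustration, not data from the paper):
# 30 units seen by both sources, 70 by the first only, 50 by the second only.
n_hat = estimate_population_size(n11=30, n10=70, n01=50)
print(f"observed: {30 + 70 + 50}, estimated total: {n_hat:.1f}")
```

With more than two lists, the same logic extends to an incomplete 2^k contingency table, where interaction terms in the log-linear model can capture dependence between lists.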
