Graph model selection using maximum likelihood

In recent years there has been a proliferation of theoretical graph models (e.g., preferential attachment and small-world models), motivated by real-world graphs such as the Internet topology. To address the natural question of which model best fits a particular data set, we propose a model selection criterion for graph models. Since each model is a probability distribution over graphs, we suggest using maximum likelihood (ML) both to compare graph models and to select their parameters. Interestingly, computing likelihoods for graph models is a difficult algorithmic task. We nevertheless design and implement Markov chain Monte Carlo (MCMC) algorithms for computing the maximum likelihood for four popular models: a power-law random graph model, a preferential attachment model, a small-world model, and a uniform random graph model. We hope that this novel use of ML will make comparisons between graph models more objective.
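To make the criterion concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It scores a graph under three simple models. Two are standard uniform random graph models with closed-form likelihoods: G(n, p), where each of the N = n(n-1)/2 vertex pairs is an edge independently with probability p (ML estimate p_hat = m/N for m edges), and G(n, m), the uniform distribution over graphs with exactly m edges. The third is a hypothetical toy preferential-attachment model that only produces trees: the first two vertices are joined, and each later vertex attaches to one earlier vertex chosen with probability proportional to its current degree. The model details, the function names (`count_edges`, `gnp_max_loglik`, `gnm_loglik`, `pa_tree_loglik`) and the example graphs are all assumptions for illustration. The toy model shows why likelihoods can be hard to compute: the observed graph does not record arrival order, so its likelihood averages over all n! orders.

```python
import itertools
import math
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def count_edges(n: int, edges: Iterable[Edge]) -> int:
    """Number of distinct undirected edges on vertices 0..n-1 (self-loops ignored)."""
    seen = set()
    for u, v in edges:
        if not (0 <= u < n and 0 <= v < n):
            raise ValueError(f"edge ({u}, {v}) out of range for n={n}")
        if u != v:
            seen.add((min(u, v), max(u, v)))
    return len(seen)


def gnp_max_loglik(n: int, m: int) -> Tuple[float, float]:
    """ML estimate p_hat = m / N and the maximized log-likelihood under G(n, p).

    log L(p) = m log p + (N - m) log(1 - p), maximized at p_hat = m / N.
    """
    N = n * (n - 1) // 2
    p_hat = m / N
    loglik = 0.0
    if m > 0:
        loglik += m * math.log(p_hat)
    if m < N:
        loglik += (N - m) * math.log(1 - p_hat)
    return p_hat, loglik


def gnm_loglik(n: int, m: int) -> float:
    """Log-likelihood under G(n, m): uniform over all graphs with exactly m edges."""
    N = n * (n - 1) // 2
    return -math.log(math.comb(N, m))


def pa_tree_loglik(n: int, edges: List[Edge]) -> float:
    """Exact log-likelihood of a labeled graph under the hypothetical toy PA tree model.

    Labels carry no arrival information, so we average over all n! arrival
    orders (n >= 2 assumed). Returns -inf if no order can produce the graph.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for order in itertools.permutations(range(n)):  # n! terms: feasible only for tiny n
        if order[1] not in adj[order[0]]:  # the first two arrivals must be joined
            continue
        deg = {order[0]: 1, order[1]: 1}  # degrees within the graph grown so far
        prob = 1.0
        for t in range(2, n):
            v = order[t]
            earlier = [u for u in adj[v] if u in deg]
            if len(earlier) != 1:  # the model adds exactly one edge per new vertex
                prob = 0.0
                break
            parent = earlier[0]
            prob *= deg[parent] / (2 * (t - 1))  # t vertices so far, t - 1 edges
            deg[parent] += 1
            deg[v] = 1
        total += prob
    if total == 0.0:
        return -math.inf
    return math.log(total / math.factorial(n))


if __name__ == "__main__":
    # Hypothetical 4-vertex examples: a star centred on vertex 0 and a path.
    graphs = {"star": [(0, 1), (0, 2), (0, 3)], "path": [(0, 1), (1, 2), (2, 3)]}
    for name, edges in graphs.items():
        m = count_edges(4, edges)
        _, ll_gnp = gnp_max_loglik(4, m)
        print(f"{name}: G(n,p) {ll_gnp:.3f}  G(n,m) {gnm_loglik(4, m):.3f}  "
              f"PA {pa_tree_loglik(4, edges):.3f}")
```

On these two trees, the uniform models depend only on n and m, so they give both graphs the same likelihood (1/64 under G(n, p_hat), 1/20 under G(n, m)). The toy preferential-attachment model gives the star 1/8 and the path 1/24. Among these three models, maximum likelihood would therefore select the toy preferential-attachment model for the star and G(n, m) for the path. The n! loop is only practical for a handful of vertices. For realistic graphs this sum would have to be estimated, for example by sampling.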