Statistical Model Selection Methods Applied to Biological Networks

Many biological networks have been labelled scale-free because their degree distribution can be approximately described by a power-law distribution. While the degree distribution does not summarize all aspects of a network, it has often been suggested that its functional form contains important clues as to the underlying evolutionary processes that have shaped the network. Generally, the functional form of the degree distribution has been determined in an ad hoc fashion. Here we apply formal statistical model selection methods to determine which functional form best describes the degree distributions of protein interaction and metabolic networks. We interpret the degree distribution as belonging to a class of probability models and determine which of these models provides the best description of the empirical data using maximum likelihood inference, composite likelihood methods, the Akaike information criterion and goodness-of-fit tests. The whole data set is used to determine the parameter values that best explain the data under a given model (e.g. scale-free or random graph). As we will show, present protein interaction and metabolic network data from different organisms suggest that simple scale-free models do not provide an adequate description of real network data.
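As a rough illustration of the kind of comparison described above, the following sketch fits two candidate degree distributions by maximum likelihood and ranks them by the Akaike information criterion. It is a minimal example under stated assumptions, not the authors' procedure: the two candidates (a discrete power law on k >= 1 for the scale-free case and a Poisson distribution for the random graph case), the grid search for the exponent, the function names and the toy degree sequence are all hypothetical choices made here for illustration.

```python
import math


def zeta(gamma, n_terms=1000):
    """Approximate the Riemann zeta function for gamma > 1.

    Uses a partial sum plus an Euler-Maclaurin style tail correction.
    """
    head = sum(k ** -gamma for k in range(1, n_terms))
    tail = n_terms ** (1.0 - gamma) / (gamma - 1.0) + 0.5 * n_terms ** -gamma
    return head + tail


def loglik_power_law(degrees, gamma):
    """Log-likelihood under P(k) = k^(-gamma) / zeta(gamma), k = 1, 2, ..."""
    sum_log_k = sum(math.log(k) for k in degrees)
    return -gamma * sum_log_k - len(degrees) * math.log(zeta(gamma))


def loglik_poisson(degrees, lam):
    """Log-likelihood under P(k) = exp(-lam) * lam^k / k!, k = 0, 1, ..."""
    return sum(-lam + k * math.log(lam) - math.lgamma(k + 1) for k in degrees)


def fit_power_law(degrees, lo=1.05, hi=6.0, steps=500):
    """Crude grid-search maximum likelihood estimate of the exponent.

    Assumption: the exponent lies in [lo, hi]; a real analysis would use
    a proper numerical optimizer.
    """
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return max(grid, key=lambda g: loglik_power_law(degrees, g))


def aic(loglik, n_params):
    """Akaike information criterion: 2 * (number of parameters) - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik


def compare_models(degrees):
    """Fit both candidate models to the full degree sequence and report AIC."""
    gamma_hat = fit_power_law(degrees)
    lam_hat = sum(degrees) / len(degrees)  # closed-form Poisson MLE
    return {
        "power_law": {"param": gamma_hat,
                      "aic": aic(loglik_power_law(degrees, gamma_hat), 1)},
        "poisson": {"param": lam_hat,
                    "aic": aic(loglik_poisson(degrees, lam_hat), 1)},
    }


# Hypothetical heavy-tailed degree sequence (not real network data).
toy_degrees = [1] * 60 + [2] * 15 + [3] * 7 + [4] * 4 + [5] * 3 + [8] * 2 + [12, 20, 35]
result = compare_models(toy_degrees)
for name, fit in sorted(result.items(), key=lambda item: item[1]["aic"]):
    print(name, round(fit["param"], 3), round(fit["aic"], 1))
```

The model with the lowest AIC is preferred among the candidates, but a low AIC alone does not show that the model fits well in absolute terms, which is why a goodness-of-fit test is a natural companion to this ranking. Extending the dictionary in `compare_models` with further candidate distributions would follow the same pattern.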
