Multi-model inference of network properties from incomplete data

Summary: It has previously been shown that subnets differ from the global networks from which they are sampled for all but a very limited number of theoretical network models. These differences are qualitative as well as quantitative, and the properties of subnets may be very different from the corresponding properties of the true, unobserved network. Here we propose a novel approach which allows us to infer aspects of the true network from incomplete network data in a multi-model inference framework. We develop the basic theoretical framework, including procedures for assessing confidence intervals of our estimates, and evaluate the performance of this approach in simulation studies and against subnets drawn from the presently available protein interaction network (PIN) data in Saccharomyces cerevisiae. We then illustrate the potential power of this new approach by estimating the number of interactions that will be detectable with present experimental approaches in four eukaryotic species, including humans. Encouragingly, where independent datasets are available, we obtain consistent estimates from different partial protein interaction networks. We conclude with a discussion of the scope of this approach and areas for further research.