Efficient sampling strategies for large-scale complex networks

In empirical research on large-scale complex networks, sampling is a necessary means of collecting data. The methods in common use are the Y2H-derived partial sampling strategy and the random sampling strategy. Recent studies suggest that subnets sampled by these methods may not accurately preserve the structural properties of the original network, which makes improving the accuracy of data collection a significant problem. We present an effective sampling strategy for complex networks, the hub strategy, which targets highly connected nodes. We demonstrate that, compared with current sampling methods, the hub strategy preserves multiple structural properties of networks more accurately and is also more economical. Furthermore, we find that as the sampling rate decreases, hierarchical modularity is quantitatively distorted by hub sampling more readily than the other structural properties.
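
The contrast between hub sampling and random sampling can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the subnet is assembled from the chosen nodes, so the sketch assumes the subnet is the subgraph induced by the sampled nodes. The function names (`hub_sample`, `random_sample`, `mean_degree`), the `rate` parameter, and the toy edge list are all hypothetical.

```python
import random

def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def induced_subgraph(adj, nodes):
    """Keep only the chosen nodes and the edges running between them (assumption)."""
    keep = set(nodes)
    return {u: {v for v in adj[u] if v in keep} for u in keep}

def sample_size(adj, rate):
    """Number of nodes to sample for a given sampling rate, at least one."""
    return max(1, int(rate * len(adj)))

def hub_sample(adj, rate):
    """Hub strategy sketch: take the most highly connected nodes first."""
    # Sort by descending degree; ties are broken by node id for reproducibility.
    ranked = sorted(adj, key=lambda u: (-len(adj[u]), u))
    return induced_subgraph(adj, ranked[:sample_size(adj, rate)])

def random_sample(adj, rate, seed=0):
    """Baseline: choose the same number of nodes uniformly at random."""
    rng = random.Random(seed)
    return induced_subgraph(adj, rng.sample(sorted(adj), sample_size(adj, rate)))

def mean_degree(adj):
    """One simple structural property to compare between network and subnet."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

# Hypothetical toy network: node 0 is the hub.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (4, 5)]
adj = build_adj(edges)

hub_subnet = hub_sample(adj, rate=0.5)
rand_subnet = random_sample(adj, rate=0.5, seed=1)
print("original mean degree:", mean_degree(adj))
print("hub subnet nodes:", sorted(hub_subnet), "mean degree:", mean_degree(hub_subnet))
print("random subnet nodes:", sorted(rand_subnet), "mean degree:", mean_degree(rand_subnet))
```

On a real network, the same comparison would be repeated over several sampling rates and over the structural properties of interest (for example the degree distribution, clustering, and hierarchical modularity) rather than mean degree alone.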
