Structure and inference in annotated networks

For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network, geographic location of nodes in the Internet, or cellular function of nodes in a gene regulatory network. Here we demonstrate how this “metadata” can be used to improve our analysis and understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. The learned correlations are also of interest in their own right, allowing us to make predictions about the community membership of nodes whose network connections are unknown. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological, and technological domains.
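To make the central idea concrete, the following is a minimal Python sketch of how a metadata-dependent prior can be learned and then used. It assumes a model in which each node's prior probability of belonging to community s depends only on its metadata value x, written gamma[x][s]. It also assumes that the community marginals q_i(s) are already available from a separate network-inference step, which is not shown. The function names `learn_metadata_prior` and `combine`, and all numbers, are hypothetical. The sketch illustrates the general principle and does not reproduce the method described above.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def learn_metadata_prior(
    marginals: Sequence[Sequence[float]],
    metadata: Sequence[Hashable],
    k: int,
) -> Dict[Hashable, List[float]]:
    """Estimate gamma[x][s], the prior probability that a node with metadata
    value x belongs to community s, as the average of the community
    marginals q_i(s) over all nodes i whose metadata value equals x."""
    sums: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * k)
    counts: Dict[Hashable, int] = defaultdict(int)
    for q, x in zip(marginals, metadata):
        for s in range(k):
            sums[x][s] += q[s]
        counts[x] += 1
    return {x: [v / counts[x] for v in sums[x]] for x in sums}


def combine(prior: Sequence[float], network_evidence: Sequence[float]) -> List[float]:
    """Bayes' rule: posterior(s) is proportional to prior(s) * evidence(s)."""
    weights = [p * e for p, e in zip(prior, network_evidence)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical community marginals for four nodes, assumed to come from a
# separate network-inference step.
q = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

# Metadata that track the communities vs. metadata unrelated to them.
informative = learn_metadata_prior(q, ["a", "a", "b", "b"], k=2)
uninformative = learn_metadata_prior(q, ["a", "b", "a", "b"], k=2)

# A node with metadata "a" but no known edges has flat network evidence,
# so its predicted membership is simply the learned prior for "a".
no_edges = [1.0, 1.0]
print(combine(informative["a"], no_edges))    # roughly [0.85, 0.15]
print(combine(uninformative["a"], no_edges))  # roughly [0.5, 0.5]
```

When the metadata track the communities, the learned prior is concentrated, and a node without network data inherits it. When the metadata are unrelated to the communities, the prior comes out uniform. In that case the posterior is driven entirely by the network evidence, which is one way to read the claim that uninformative metadata end up being ignored.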
