Reconstructing networks with unknown and heterogeneous errors

The vast majority of network datasets contain errors and omissions, although this is rarely incorporated in traditional network analysis. Recently, an increasing effort has been made to fill this methodological gap by developing network reconstruction approaches based on Bayesian inference. These approaches, however, rely on the assumption of uniform error rates and on direct estimates of the existence of each edge via repeated measurements, which are currently unavailable for the majority of network data. Here we develop a Bayesian reconstruction approach that lifts these limitations by allowing not only for heterogeneous errors, but also for single edge measurements without direct error estimates. Our approach works by coupling the inference with structured generative network models, which enable the correlations between edges to be used as reliable uncertainty estimates. Although our approach is general, we focus on the stochastic block model as the basic generative process; it admits efficient nonparametric inference and yields a principled method to infer hierarchical community structure from noisy data. We demonstrate the efficacy of our approach with a variety of empirical and artificial networks.
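The abstract does not spell out the model, so the following is only a minimal illustrative sketch of why a structured generative model helps, not the authors' method. It assumes the group labels `b` and the edge-probability matrix `omega` of a stochastic block model are already known (in the approach described above they would be inferred jointly with the network), and that each node pair is measured once, with a false-positive and a false-negative rate that may differ from pair to pair. Bayes' rule then combines the model's prior with the single measurement. The function names `sbm_edge_prior` and `edge_posterior` are hypothetical.

```python
import numpy as np

def sbm_edge_prior(b, omega):
    """Prior P(A_ij = 1) for every node pair under a stochastic block model.

    b     : integer group label for each node
    omega : B x B matrix of group-to-group edge probabilities
            (ASSUMPTION: known in advance; normally it would be inferred)
    """
    b = np.asarray(b)
    return np.asarray(omega)[b[:, None], b[None, :]]

def edge_posterior(observed, prior, fp_rate, fn_rate):
    """Posterior P(A_ij = 1 | one measurement) via Bayes' rule.

    observed : boolean matrix, True where an edge was recorded
    prior    : P(A_ij = 1) supplied by the generative model
    fp_rate  : P(recorded | no true edge); scalar or per-pair matrix
    fn_rate  : P(not recorded | true edge); scalar or per-pair matrix
    """
    observed = np.asarray(observed, dtype=bool)
    like_edge = np.where(observed, 1.0 - fn_rate, fn_rate)   # P(x | A=1)
    like_none = np.where(observed, fp_rate, 1.0 - fp_rate)   # P(x | A=0)
    num = like_edge * prior
    return num / (num + like_none * (1.0 - prior))

# Toy example: two groups of three nodes, dense inside each group.
b = [0, 0, 0, 1, 1, 1]
omega = [[0.8, 0.05],
         [0.05, 0.8]]
prior = sbm_edge_prior(b, omega)

# One noisy measurement per pair: edges recorded for pairs (0, 1) and (2, 3).
observed = np.zeros((6, 6), dtype=bool)
for i, j in [(0, 1), (2, 3)]:
    observed[i, j] = observed[j, i] = True

# Uniform error rates for every pair.
post = edge_posterior(observed, prior, fp_rate=0.1, fn_rate=0.2)
print(post[0, 1])  # recorded, same group
print(post[2, 3])  # recorded, different groups
print(post[0, 2])  # not recorded, same group

# Heterogeneous errors: pair (2, 3) has a much lower false-positive rate.
fp = np.full((6, 6), 0.1)
fp[2, 3] = fp[3, 2] = 0.01
post_het = edge_posterior(observed, prior, fp_rate=fp, fn_rate=0.2)
print(post_het[2, 3])
```

With the uniform rates, a recorded edge inside a group gets a posterior of about 0.97, while the same kind of record between groups gets only about 0.30, and an unrecorded within-group pair still retains about 0.47. The block structure, not the measurement alone, determines how much each observation is trusted. Lowering the false-positive rate for the single pair (2, 3) raises its posterior to about 0.81, which is the kind of pair-specific error the heterogeneous setting allows.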
