Robust Bayesian inference of network structure from unreliable data

Most empirical studies of complex networks do not return direct, error-free measurements of network structure. Instead, they typically rely on indirect measurements that are often error-prone and unreliable. A fundamental problem in empirical network science is how to make the best possible estimates of network structure given such unreliable data. In this paper we describe a fully Bayesian method for reconstructing networks from observational data in any format, even when the data contain substantial measurement error and when the nature and magnitude of that error are unknown. The method is introduced through pedagogical case studies using real-world example networks, and is specifically tailored to allow straightforward, computationally efficient implementation with a minimum of technical input. Computer code implementing the method is publicly available.
