Bayesian model-based clustering for multiple network data

There is increasing appetite for analysing multiple network data, driven by a fast-growing body of applications that demand such methods. These include the study of connectomes in neuroscience and the study of human mobility with respect to intelligent displays in computer science. Recent technological advancements have allowed the collection of this type of data, and both applications entail the analysis of a heterogeneous population of networks. In this paper we focus on the problem of clustering the elements of a network population, where each cluster is characterised by a network representative. We take advantage of the Bayesian machinery to simultaneously infer the cluster membership, the representatives and the community structure of the representatives. Extensive simulation studies show that our model performs well in both clustering multiple network data and inferring the model parameters.

[...] we [...] of false positive [...] 0.2, different to its true [...] while its 95% credible interval covers a wide range of values, indicating that our MCMC chain struggles to make inferences here. These results suggest that our model performs well in most cases, even for high noise levels, but we must be cautious when making inferences for network populations with great variability in their structure. [...] initialise the nodes' block membership of the representatives using R [...] the presence of two underlying blocks. [...] here that a simpler network model, namely the Erdős-Rényi model, could also be applied to describe the representative networks. [...] MCMC [...] 500,000 iterations [...] a burn-in [...]
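To make the setting concrete, the following is a minimal simulation sketch of a clustered network population. It assumes a measurement-error style formulation in which each cluster has a representative network drawn from a two-block stochastic block model, and every observed network is a noisy copy of its cluster's representative. The function names, the block probabilities, the per-cluster false positive rates and, in particular, the false negative rates are illustrative assumptions and are not taken from the paper; the sketch only generates data and does not implement the paper's MCMC inference.

```python
import numpy as np


def sample_sbm(block_of, theta, rng):
    """Sample a symmetric, loop-free adjacency matrix from a stochastic block model."""
    n = len(block_of)
    # Edge probability for every ordered node pair, looked up by block labels.
    probs = theta[np.ix_(block_of, block_of)]
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)


def noisy_copy(rep, fp, fn, rng):
    """Return a noisy copy of `rep`.

    A non-edge becomes an edge with probability `fp` (false positive);
    an edge is dropped with probability `fn` (false negative, an assumed parameter).
    """
    n = rep.shape[0]
    u = rng.random((n, n))
    observed = np.where(rep == 1, u >= fn, u < fp)
    upper = np.triu(observed, k=1)
    return (upper | upper.T).astype(int)


def simulate_population(n_nodes=20, n_per_cluster=(15, 15),
                        fp=(0.05, 0.2), fn=(0.05, 0.2), seed=0):
    """Simulate clusters of networks, each scattered around its own representative."""
    rng = np.random.default_rng(seed)
    # Two equally sized blocks (assumes n_nodes is even).
    block_of = np.repeat([0, 1], n_nodes // 2)
    # Assumed block connection probabilities: dense within, sparse between.
    theta = np.array([[0.8, 0.1],
                      [0.1, 0.8]])
    reps, nets, labels = [], [], []
    for c, size in enumerate(n_per_cluster):
        rep = sample_sbm(block_of, theta, rng)
        reps.append(rep)
        for _ in range(size):
            nets.append(noisy_copy(rep, fp[c], fn[c], rng))
            labels.append(c)
    return reps, np.array(nets), np.array(labels)


if __name__ == "__main__":
    reps, nets, labels = simulate_population()
    # Hamming distance of each observed network to each representative.
    for c, rep in enumerate(reps):
        dist = np.abs(nets - rep).sum(axis=(1, 2)) // 2
        print(f"mean distance to representative {c}:",
              [float(dist[labels == k].mean()) for k in range(len(reps))])
```

Running the script prints, for each representative, the mean Hamming distance from the networks of each cluster. Under these assumed settings one would expect networks to sit closer to their own representative than to the other, and the gap to narrow as the noise rates grow, which loosely mirrors the abstract's remark about caution for populations with great variability.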
