Model‐based clustering for social networks

Summary.  Network models are widely used to represent relations between interacting units or actors. Network data often exhibit transitivity (two actors that have ties to a third actor are more likely to be tied than actors that do not), homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean ‘social space’, and the actors’ locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two‐stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters present, using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering; such networks are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package in the R statistical language, latentnet, is available to analyse data by using the model.
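As a rough illustration of how networks can be simulated under a latent position cluster model, the sketch below draws cluster labels, then latent positions, then ties. It fills in details that the summary leaves open, so these should be read as assumptions rather than the authors' specification: Gaussian mixture components with spherical spread, a logistic link in which the log-odds of a tie between actors i and j is beta0 - |z_i - z_j| (z_i being the latent position of actor i and beta0 an intercept standing in for any covariate effects), an undirected network, and ties that are conditionally independent given the positions. The helpers `simulate_lpcm` and `tie_density` are hypothetical names and are not part of latentnet.

```python
import math
import random


def simulate_lpcm(n, weights, means, sds, beta0, seed=0):
    """Simulate an undirected network from a latent position cluster model (sketch).

    Assumed, not taken from the summary: spherical Gaussian clusters, a logistic
    link with log-odds beta0 - Euclidean distance, and no actor covariates.
    """
    rng = random.Random(seed)
    d = len(means[0])
    # 1. Draw each actor's cluster label according to the mixing weights.
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    # 2. Draw each actor's latent position around its cluster mean.
    positions = [[rng.gauss(means[g][k], sds[g]) for k in range(d)] for g in labels]
    # 3. Each dyad is tied independently given the positions, with
    #    log-odds = beta0 - distance, so nearby actors are more likely to be tied.
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(positions[i], positions[j])
            p = 1.0 / (1.0 + math.exp(-(beta0 - dist)))
            if rng.random() < p:
                adjacency[i][j] = adjacency[j][i] = 1
    return labels, positions, adjacency


def tie_density(labels, adjacency, same_cluster):
    """Proportion of tied pairs among within-cluster (or between-cluster) pairs."""
    ties = pairs = 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if (labels[i] == labels[j]) == same_cluster:
                pairs += 1
                ties += adjacency[i][j]
    return ties / pairs if pairs else float("nan")


if __name__ == "__main__":
    labels, positions, adjacency = simulate_lpcm(
        n=60,
        weights=[0.5, 0.3, 0.2],
        means=[[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]],
        sds=[0.7, 0.7, 0.7],
        beta0=1.0,
        seed=42,
    )
    print("within-cluster density:", round(tie_density(labels, adjacency, True), 3))
    print("between-cluster density:", round(tie_density(labels, adjacency, False), 3))
```

With well-separated cluster means, most ties fall within clusters, and because actors close to a common third actor tend to be close to each other, the simulated networks also tend to show transitivity. Estimation, whether by the two-stage maximum likelihood method or by MCMC, works in the opposite direction (from observed ties back to positions and clusters) and is not attempted in this sketch.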
