A generalized hypothesis test for community structure and homophily in networks

Networks continue to be of great interest to statisticians, with an emphasis on community detection. Less work, however, has addressed this question: given some network, does it exhibit meaningful community structure? We propose to answer this question in a principled manner by framing it as a statistical hypothesis in terms of a formal and general homophily metric. Homophily is a well-studied network property where intra-community edges are more likely than between-community edges. We use the homophily metric to identify and distinguish between three concepts: nominal, collateral, and intrinsic homophily. We propose a simple and interpretable test statistic leveraging this homophily parameter and formulate both asymptotic and bootstrap-based rejection thresholds. We prove its asymptotic properties and demonstrate that it outperforms benchmark methods on both simulated and real-world data. Furthermore, the proposed method yields rich, provocative insights on four classic data sets; namely, that many well-studied networks do not actually have intrinsic homophily.
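
To make the idea of testing for homophily concrete, here is a minimal Python sketch. It is not the test statistic or the rejection threshold proposed in the paper, which the abstract does not spell out. It assumes a simple undirected graph, treats the community labels as given, measures homophily as the within-community edge density minus the between-community edge density, and calibrates that gap with a parametric bootstrap under an Erdős–Rényi null at the observed overall density. The function names `density_gap` and `bootstrap_p_value` are hypothetical.

```python
import random
from itertools import combinations


def density_gap(edges, labels):
    """Within-community edge density minus between-community edge density.

    Assumes a simple undirected graph (no self-loops), with `labels` mapping
    every node to a community, at least one same-community pair, and at
    least one cross-community pair.
    """
    edge_set = {frozenset(e) for e in edges}
    within_pairs = between_pairs = within_edges = between_edges = 0
    for u, v in combinations(sorted(labels), 2):
        present = frozenset((u, v)) in edge_set
        if labels[u] == labels[v]:
            within_pairs += 1
            within_edges += present
        else:
            between_pairs += 1
            between_edges += present
    return within_edges / within_pairs - between_edges / between_pairs


def bootstrap_p_value(edges, labels, n_boot=500, seed=0):
    """Bootstrap p-value for the density gap under an Erdos-Renyi null.

    The null graph keeps the node set and labels fixed and draws each
    possible edge independently with the observed overall density.
    """
    rng = random.Random(seed)
    pairs = list(combinations(sorted(labels), 2))
    p_hat = len({frozenset(e) for e in edges}) / len(pairs)
    observed = density_gap(edges, labels)
    exceed = 0
    for _ in range(n_boot):
        simulated = [pair for pair in pairs if rng.random() < p_hat]
        if density_gap(simulated, labels) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)
```

A small p-value here only says that the observed gap is larger than a homogeneous random graph of the same density would typically produce. An Erdős–Rényi null ignores degree heterogeneity, so this sketch cannot separate the nominal, collateral, and intrinsic homophily that the paper distinguishes.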
