Informative core identification in complex networks

In network analysis, the core structure of modeling interest is usually hidden in a larger network in which most structures are not informative. The noise and bias introduced by the non-informative component can obscure the salient structure and limit the effectiveness of many network modeling procedures. This paper introduces a novel core-periphery model for the non-informative periphery structure of networks without imposing a specific form on the informative core structure. Based on the model, we propose spectral algorithms for core identification as a data preprocessing step for general downstream network analysis tasks. The algorithms enjoy a strong theoretical guarantee of accuracy and are scalable to large networks. We evaluate the proposed method through extensive simulation studies, which demonstrate various advantages over many traditional core-periphery methods. The method is applied to extract the informative core structure from a citation network, and it gives more informative results in the downstream hierarchical community detection.

arXiv:2101.06388v1 [stat.ML] 16 Jan 2021
