From which world is your graph

Discovering statistical structure from links is a fundamental problem in the analysis of social networks. Choosing a misspecified model or, equivalently, an incorrect inference algorithm will result in an invalid analysis, or may even falsely uncover patterns that are in fact artifacts of the model. This work focuses on unifying two of the most widely used link-formation models: the stochastic blockmodel (SBM) and the small-world (or latent space) model (SWM). Integrating techniques from kernel learning, spectral graph theory, and nonlinear dimensionality reduction, we develop the first statistically sound polynomial-time algorithm to discover latent patterns in sparse graphs for both models. When the network comes from an SBM, the algorithm outputs a block structure. When it comes from an SWM, the algorithm outputs estimates of each node's latent position.
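
To make the two link-formation models concrete, the following is a minimal sketch of how a graph could be sampled from each. It illustrates the generative models only and is not the algorithm developed in this work. The function names, the two-parameter SBM (`p_in`, `p_out`), the one-dimensional latent positions, and the particular distance-decaying link probability are all assumptions chosen for illustration.

```python
import numpy as np


def _sample_symmetric(probs, rng):
    """Draw an undirected simple graph: edge {i, j} appears with probability probs[i, j]."""
    n = probs.shape[0]
    # Sample only the strict upper triangle, then mirror it, so the result
    # is symmetric and has no self-loops.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)


def sample_sbm(block_of, p_in, p_out, rng):
    """Stochastic blockmodel: link probability depends only on the two nodes' blocks.

    Assumption: a simple two-parameter variant with probability p_in inside
    a block and p_out across blocks.
    """
    block_of = np.asarray(block_of)
    same_block = block_of[:, None] == block_of[None, :]
    probs = np.where(same_block, p_in, p_out)
    return _sample_symmetric(probs, rng)


def sample_latent_space(positions, scale, rng):
    """Latent space model: link probability decays with latent distance.

    Assumption: one-dimensional positions and the hypothetical kernel
    1 / (1 + (d / scale)^2); any decreasing function of distance would do.
    """
    positions = np.asarray(positions, dtype=float)
    dist = np.abs(positions[:, None] - positions[None, :])
    probs = 1.0 / (1.0 + (dist / scale) ** 2)
    return _sample_symmetric(probs, rng)


rng = np.random.default_rng(0)
sbm_graph = sample_sbm([0, 0, 0, 1, 1, 1], p_in=0.8, p_out=0.1, rng=rng)
swm_graph = sample_latent_space(rng.random(6), scale=0.2, rng=rng)
```

In the first sampler the hidden variable per node is a discrete block label, and in the second it is a continuous position. These are the two kinds of latent pattern the abstract says the algorithm recovers.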
