Decoupled Smoothing on Graphs

Graph smoothing methods are an extremely popular family of approaches for semi-supervised learning. The choice of graph used to represent relationships in these learning problems is often a more important decision than the particular algorithm or loss function used, yet this choice is less well studied in the literature. In this work, we demonstrate that for social networks, the basic friendship graph itself may often not be the appropriate graph for predicting node attributes using graph smoothing. More specifically, standard graph smoothing is designed to harness the social phenomenon of homophily, whereby individuals are similar to “the company they keep.” We present a decoupled approach to graph smoothing that separates the notions of “identity” and “preference,” which lets it exploit the alternative social phenomenon of monophily, whereby individuals are similar to “the company they're kept in,” as observed in recent empirical work. Our model results in a rigorous extension of the Gaussian Markov Random Field (GMRF) models that underlie graph smoothing, interpretable as smoothing on an appropriate auxiliary graph of weighted or unweighted two-hop relationships.
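The contrast between smoothing over friends and smoothing over two-hop relationships can be illustrated with a small sketch. This is not the GMRF-based estimator of the paper; it is plain labeled-neighbor averaging, assuming an unweighted adjacency matrix `A` and an auxiliary two-hop graph formed as `A @ A` with the diagonal removed. The function names `neighbor_average` and `two_hop_graph` and the toy 4-cycle are hypothetical, chosen only for illustration.

```python
import numpy as np

def neighbor_average(W, y, labeled):
    """Weighted mean of the labeled neighbors of each node under weights W.

    Nodes with no labeled neighbor get NaN.
    """
    W = np.asarray(W, dtype=float)
    mask = np.asarray(labeled, dtype=float)
    # Zero out unlabeled entries so they contribute nothing to the sums.
    y_obs = np.where(labeled, np.asarray(y, dtype=float), 0.0)
    num = W @ y_obs          # weighted sum of observed labels
    den = W @ mask           # total weight of labeled neighbors
    out = np.full(len(y_obs), np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return out

def two_hop_graph(A):
    """Auxiliary graph whose weights count length-2 paths between distinct nodes."""
    A = np.asarray(A, dtype=float)
    W2 = A @ A
    np.fill_diagonal(W2, 0.0)  # drop paths that return to the starting node
    return W2

# Toy 4-cycle 0-1-2-3-0 in which friends differ but friends-of-friends agree.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
labeled = np.array([False, True, True, True])
y = np.array([0.0, 0.0, 1.0, 0.0])  # entry 0 is a placeholder; node 0 is unlabeled

one_hop_pred = neighbor_average(A, y, labeled)[0]
two_hop_pred = neighbor_average(two_hop_graph(A), y, labeled)[0]
print(one_hop_pred, two_hop_pred)
```

In this toy graph the one-hop estimate for node 0 follows its friends (nodes 1 and 3), while the two-hop estimate follows node 2, the only node sharing friends with it. Under monophily rather than homophily, the second estimate would be the more informative one.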
