Understanding the Success of Graph-based Semi-Supervised Learning using Partially Labelled Stochastic Block Model

With the proliferation of learning scenarios with an abundance of instances, but limited amount of high-quality labels, semi-supervised learning algorithms came to prominence. Graph-based SemiSupervised Learning (G-SSL) algorithms, of which Label Propagation (LP) is a prominent example, are particularly well-suited for these problems. The premise of LP is the existence of homophily in the graph, but beyond that nothing is known about the efficacy of LP. In particular, there is no characterisation that connects the structural constraints, volume and quality of the labels to the accuracy of LP. In this work, we draw upon the notion of recovery from the literature on community detection, and provide guarantees on accuracy for partially-labelled graphs generated from the Partially-Labelled Stochastic Block Model (PLSBM). Extensive experiments performed on synthetic data verify the theoretical findings.

[1]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[2]  M. McPherson,et al.  Birds of a Feather: Homophily in Social Networks , 2001 .

[3]  Emmanuel Abbe,et al.  Community detection and stochastic block models: recent developments , 2017, Found. Trends Commun. Inf. Theory.

[4]  Alexandre Proutière,et al.  Accurate Community Detection in the Stochastic Block Model via Spectral Algorithms , 2014, ArXiv.

[5]  Nicolas Le Roux,et al.  11 Label Propagation and Quadratic Criterion , 2022 .

[6]  Emmanuel Abbe,et al.  Community Detection in General Stochastic Block models: Fundamental Limits and Efficient Algorithms for Recovery , 2015, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science.

[7]  Deepayan Chakrabarti,et al.  Joint Inference of Multiple Label Types in Large Networks , 2014, ICML.

[8]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[9]  Andrew Tomkins,et al.  Graph Agreement Models for Semi-Supervised Learning , 2019, NeurIPS.

[10]  Aria Nosratinia,et al.  Community Detection With Side Information: Exact Recovery Under the Stochastic Block Model , 2018, IEEE Journal of Selected Topics in Signal Processing.

[11]  Jean Honorio,et al.  Information-theoretic Limits for Community Detection in Network Models , 2018, NeurIPS.

[12]  Yuto Yamaguchi,et al.  When Does Label Propagation Fail? A View from a Network Generative Model , 2017, IJCAI.