Active learning in the geometric block model

The geometric block model is a recently proposed generative model for random graphs that captures the inherent geometric properties of many community detection problems and characterizes practical community structures more accurately than the popular stochastic block model. Galhotra et al. recently proposed a motif-counting algorithm for unsupervised community detection in the geometric block model that is provably near-optimal, and they characterized the regimes of the model parameters in which that algorithm achieves exact recovery. In this work, we initiate the study of active learning in the geometric block model. That is, we study the problem of exactly recovering the community structure of random graphs drawn from the geometric block model under arbitrary model parameters, when the labels of a limited number of chosen nodes may be queried. We propose two active learning algorithms that combine motif-counting with two different label query policies. Our main contribution is to show that sampling the labels of a vanishingly small fraction of nodes (sub-linear in the total number of nodes) is sufficient to achieve exact recovery in the regimes where the state-of-the-art unsupervised method fails. We validate the performance of our algorithms via numerical simulations on both real and synthetic datasets.
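As a rough illustration of the setting, the sketch below samples a two-community graph and computes the triangle (common-neighbor) count for each edge, which is the kind of motif statistic the abstract refers to. It is not the authors' algorithm. It assumes the common one-dimensional formulation of the geometric block model: nodes placed uniformly on a circle of circumference 1, same-community pairs connected within distance `r_s`, and cross-community pairs connected within distance `r_d`. The names `sample_gbm` and `common_neighbor_counts` are hypothetical, and the decision thresholds and label query policies of the paper are not reproduced here.

```python
import random


def sample_gbm(n, r_s, r_d, seed=0):
    """Sample a two-community graph under the assumed 1-D geometric block model.

    Returns (labels, adj), where adj[u] is the set of neighbors of node u.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]        # two balanced communities (assumption)
    pos = [rng.random() for _ in range(n)]    # uniform positions on the unit circle
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            d = abs(pos[u] - pos[v])
            d = min(d, 1.0 - d)               # circular (wrap-around) distance
            radius = r_s if labels[u] == labels[v] else r_d
            if d <= radius:
                adj[u].add(v)
                adj[v].add(u)
    return labels, adj


def common_neighbor_counts(adj):
    """Map each edge (u, v) with u < v to the number of triangles through it."""
    return {
        (u, v): len(adj[u] & adj[v])
        for u in range(len(adj))
        for v in adj[u]
        if u < v
    }


if __name__ == "__main__":
    labels, adj = sample_gbm(n=200, r_s=0.10, r_d=0.02)
    counts = common_neighbor_counts(adj)
    print(len(counts), "edges sampled")
```

In a motif-counting approach, per-edge counts like these would be compared against thresholds that depend on the model parameters to decide whether an edge joins two nodes of the same community. An active learning variant would additionally query the labels of a small set of chosen nodes. Both steps are left out of this sketch.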
