The interplay between communities and homophily in semi-supervised classification using graph neural networks

Graph Neural Networks (GNNs) are effective in many applications, yet the effect of common graph structures on their learning process remains poorly understood. To fill this gap, we study how community structure and homophily affect the performance of GNNs in semi-supervised node classification. We systematically manipulate the structure of eight datasets and measure GNN performance on the original graphs, as well as the change in performance when community structure, homophily, or both are present or absent. Our results show that both homophily and communities have a major impact on the classification accuracy of GNNs, and they shed light on how the two interact. In particular, by analyzing community structure and its correlation with node labels, we can make informed predictions about whether GNNs are suitable for classification on a given graph. Using an information-theoretic metric for community-label correlation, we devise a guideline for model selection based on graph structure. Our work thus clarifies the abilities of GNNs and the impact of common network phenomena on their performance, and it improves model selection for node classification in semi-supervised settings.
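
The abstract does not name the information-theoretic metric or the homophily measure it uses. As a minimal sketch only, the code below assumes two common choices: the uncertainty coefficient U(L|C) = (H(L) - H(L|C)) / H(L) for community-label correlation, and the fraction of edges joining same-label nodes for homophily. The function names, the input format (parallel lists of labels and community ids, plus an edge list), and the convention for the degenerate single-label case are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (in nats) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def uncertainty_coefficient(labels, communities):
    """Assumed metric: U(L|C) = (H(L) - H(L|C)) / H(L).

    Returns a value in [0, 1]: 1 means the community of a node fully
    determines its label, 0 means communities carry no label information.
    """
    if len(labels) != len(communities):
        raise ValueError("labels and communities must have the same length")
    h_l = entropy(labels)
    if h_l == 0:
        # Only one label occurs; nothing is left to explain (a convention).
        return 1.0
    n = len(labels)
    # Group the labels of the nodes that fall into each community.
    by_community = {}
    for label, community in zip(labels, communities):
        by_community.setdefault(community, []).append(label)
    # Conditional entropy H(L|C): size-weighted entropy within communities.
    h_l_given_c = sum(
        (len(group) / n) * entropy(group) for group in by_community.values()
    )
    return (h_l - h_l_given_c) / h_l


def edge_homophily(edges, labels):
    """Assumed homophily measure: fraction of edges joining same-label nodes."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    aligned = ["a", "a", "b", "b"]      # communities coincide with labels
    unaligned = ["a", "b", "a", "b"]    # communities are independent of labels
    edges = [(0, 1), (2, 3), (1, 2)]    # hypothetical toy graph
    print(uncertainty_coefficient(labels, aligned))
    print(uncertainty_coefficient(labels, unaligned))
    print(edge_homophily(edges, labels))
```

Under these assumptions, a coefficient near 1 would indicate that communities are strongly aligned with labels, and a value near 0 that they carry little label information. The community ids could come from any detection method; the choice of method is left open here.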
