CARL-G: Clustering-Accelerated Representation Learning on Graphs

Self-supervised learning on graphs has made large strides, achieving strong performance on a variety of downstream tasks. However, many state-of-the-art methods suffer from impediments that prevent them from realizing their full potential. For instance, contrastive methods typically require negative sampling, which is often computationally costly. While non-contrastive methods avoid this expensive step, most existing methods rely on either overly complex architectures or dataset-specific augmentations. In this paper, we ask: can we borrow from the classical unsupervised machine learning literature to overcome these obstacles? Our key insight is that the goal of distance-based clustering closely resembles that of contrastive learning: both attempt to pull representations of similar items together and push those of dissimilar items apart. Guided by this insight, we propose CARL-G, a novel clustering-based framework for graph representation learning that uses a loss inspired by Cluster Validation Indices (CVIs), i.e., internal measures of cluster quality that require no ground truth. CARL-G is adaptable to different clustering methods and CVIs, and we show that, with the right choice of clustering method and CVI, CARL-G outperforms node classification baselines on 4/5 datasets with up to a 79× training speedup compared to the best-performing baseline. CARL-G also performs on par with or better than baselines in node clustering and similarity search tasks, training up to 1,500× faster than the best-performing baseline. Finally, we provide theoretical foundations for the use of CVI-inspired losses in graph representation learning.
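
To make the idea of a CVI-inspired loss concrete, the following is a minimal sketch, not the implementation used in CARL-G. It assumes k-means as the clustering method and the simplified (centroid-based) silhouette as the CVI; the abstract does not specify either choice. The function names `kmeans` and `simplified_silhouette_loss` are hypothetical, and plain NumPy arrays stand in for embeddings that would in practice come from a graph encoder trained with automatic differentiation.

```python
import numpy as np

def kmeans(z, k, iters=20, seed=0):
    """Plain Lloyd's k-means on the rows of z; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points (fancy indexing copies).
    centroids = z[rng.choice(len(z), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = z[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they match the final centroids.
    dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def simplified_silhouette_loss(z, centroids, labels):
    """Negative mean simplified silhouette (lower loss = better clustering).

    a_i: distance from point i to its own centroid.
    b_i: distance from point i to the nearest other centroid.
    s_i = (b_i - a_i) / max(a_i, b_i).
    """
    dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
    idx = np.arange(len(z))
    a = dists[idx, labels]
    masked = dists.copy()
    masked[idx, labels] = np.inf  # exclude each point's own cluster
    b = masked.min(axis=1)
    s = (b - a) / np.maximum(np.maximum(a, b), 1e-12)
    return -s.mean()

# Toy "embeddings": two groups, first tightly and then loosely clustered.
rng = np.random.default_rng(42)
centers = np.repeat(np.array([[0.0, 0.0], [10.0, 10.0]]), 50, axis=0)
z_tight = centers + 0.1 * rng.standard_normal(centers.shape)
z_loose = centers + 5.0 * rng.standard_normal(centers.shape)

loss_tight = simplified_silhouette_loss(z_tight, *kmeans(z_tight, k=2))
loss_loose = simplified_silhouette_loss(z_loose, *kmeans(z_loose, k=2))
print(loss_tight, loss_loose)
```

In this sketch the tightly clustered embeddings receive the lower loss, which mirrors the contrastive objective described above: points are rewarded for being close to their own cluster and far from the nearest other cluster, without any explicit negative sampling.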
