Distributed Training of Graph Convolutional Networks using Subgraph Approximation

Modern machine learning techniques are being successfully adapted to data modeled as graphs. However, many real-world graphs are very large and do not fit in memory, often making the problem of training machine learning models on them intractable. Distributed training has been successfully employed to alleviate memory problems and speed up training in machine learning domains in which the input data is assumed to be independent and identically distributed (i.i.d.). However, distributing the training on non-i.i.d. data, such as the graphs used as training inputs to Graph Convolutional Networks (GCNs), causes accuracy problems because information is lost at the graph partition boundaries. In this paper, we propose a training strategy that mitigates the information lost across multiple partitions of a graph through a subgraph approximation scheme. Our approach augments each subgraph with a small amount of edge and vertex information approximated from all other subgraphs. This subgraph approximation helps the distributed training system converge to single-machine accuracy while keeping the memory footprint low and minimizing synchronization overhead between machines.
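
The sketch below illustrates the general idea of augmenting a partition with approximated information from the other partitions. It is a minimal, assumption-laden illustration, not the paper's algorithm: the round-robin partitioner, the use of a single mean-feature "summary" vertex per remote partition, and the rewiring of cut edges to those summary vertices are all placeholder choices made for the example.

```python
# Illustrative sketch of subgraph approximation for distributed GCN training.
# Assumptions (not from the paper): round-robin partitioning, one summary
# vertex per remote partition whose features are the mean of that partition's
# vertex features, and cut edges rerouted to the corresponding summary vertex.
import numpy as np

def partition_vertices(num_vertices, num_parts):
    """Assign vertices to partitions round-robin (placeholder partitioner)."""
    return np.arange(num_vertices) % num_parts

def build_augmented_subgraph(adj, feats, part_id, parts):
    """Return one partition's adjacency and feature matrices, augmented with
    an approximated summary vertex for every other partition."""
    local = np.where(parts == part_id)[0]
    num_parts = int(parts.max()) + 1
    remote_ids = [p for p in range(num_parts) if p != part_id]

    # Approximate each remote partition by the mean of its vertex features.
    summaries = [feats[parts == p].mean(axis=0) for p in remote_ids]
    aug_feats = np.vstack([feats[local]] + summaries) if summaries else feats[local]

    n_local, n_aug = len(local), len(local) + len(summaries)
    aug_adj = np.zeros((n_aug, n_aug))
    aug_adj[:n_local, :n_local] = adj[np.ix_(local, local)]

    # Reconnect cut edges: a local vertex that had a neighbor in remote
    # partition p is instead linked to p's summary vertex.
    for i, v in enumerate(local):
        for u in np.nonzero(adj[v])[0]:
            if parts[u] != part_id:
                j = n_local + remote_ids.index(int(parts[u]))
                aug_adj[i, j] = aug_adj[j, i] = 1.0
    return aug_adj, aug_feats
```

Each worker would then run standard GCN layers over its augmented adjacency and feature matrices, so boundary vertices still aggregate an approximation of their remote neighborhoods without the worker holding the full graph in memory.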
