Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA

Large-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems is challenging due to a large memory footprint, sparse computational patterns, and irregular memory accesses with poor locality. Intel's Programmable Integrated Unified Memory Architecture (PIUMA) is designed to address these challenges for graph analytics. In this paper, a detailed characterization of GCNs is presented using the Open Graph Benchmark (OGB) datasets to assess the viability of PIUMA as a solution to GCN scalability. First, the extent to which sparse matrix-dense matrix multiplication (SpMM) drives GCN performance on CPU and GPU is explored, yielding a methodology for predicting GCN behavior as a function of dataset characteristics. Second, an SpMM kernel optimized for PIUMA is described and its sensitivity to system parameters, including memory bandwidth, latency, and thread count, is investigated. SpMM scalability on PIUMA is demonstrated, while the scalability limitations of a Xeon-optimized SpMM implementation are discussed. Finally, GCN performance on PIUMA is compared against a Xeon CPU system and an Ampere GPU system, showing strong results on PIUMA for large-scale datasets.
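To make the kernel under study concrete, the following is a minimal sketch of SpMM as it arises in GCN inference: Y = A·X, where A is the sparse (typically normalized) adjacency matrix in CSR format and X is the dense node-feature matrix. The function and variable names are illustrative, not drawn from the paper's implementation; the inner gather over feature rows of X illustrates the irregular, poor-locality access pattern the abstract refers to.

```python
def spmm_csr(indptr, indices, data, X, num_rows):
    """Multiply a CSR sparse matrix (indptr, indices, data) by a dense
    matrix X, given as a list of feature rows. Returns Y = A @ X."""
    num_cols = len(X[0])
    Y = [[0.0] * num_cols for _ in range(num_rows)]
    for i in range(num_rows):
        # Each nonzero A[i, j] gathers the entire feature row X[j].
        # The column indices j follow the graph structure, so these
        # reads are irregular: this is what stresses memory latency
        # and locality on conventional CPU/GPU systems.
        for k in range(indptr[i], indptr[i + 1]):
            j, a = indices[k], data[k]
            for c in range(num_cols):
                Y[i][c] += a * X[j][c]
    return Y

# Toy example: 3-node path graph (unweighted adjacency), 2-dim features.
indptr  = [0, 1, 3, 4]
indices = [1, 0, 2, 1]
data    = [1.0, 1.0, 1.0, 1.0]
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(spmm_csr(indptr, indices, data, X, 3))
# Row i of the result aggregates the features of node i's neighbors.
```

In a full GCN layer this aggregation step is followed by a dense feature transform (a GEMM); since the adjacency matrix of a large graph is far sparser than the feature matrix is wide, the SpMM step tends to dominate runtime, which is why the paper uses it as the lens for scalability analysis.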
