Efficient and effective training of language and graph neural network models

Can we combine heterogeneous graph structure with text to learn high-quality semantic and behavioural representations? Graph neural networks (GNNs) encode numerical node attributes and graph structure to achieve impressive performance in a variety of supervised learning tasks. Current GNN approaches are challenged by textual features, which typically need to be encoded into a numerical vector before being provided to the GNN, a step that may incur some information loss. In this paper, we put forth an efficient and effective framework termed language model GNN (LM-GNN) to jointly train large-scale language models and graph neural networks. The effectiveness of our framework is achieved by stage-wise fine-tuning of the BERT model, first with heterogeneous graph information and then with a GNN model. Several system and design optimizations are proposed to enable scalable and efficient training. LM-GNN accommodates node and edge classification as well as link prediction tasks. We evaluate the LM-GNN framework on different datasets and showcase the effectiveness of the proposed approach. LM-GNN provides competitive results in an Amazon query-purchase-product application.
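
The conventional pipeline the abstract contrasts against, in which text is first encoded to a fixed vector and only then passed to a graph model, can be illustrated with a minimal sketch. This is not the LM-GNN implementation: the hashed bag-of-words encoder is a toy stand-in for a language model such as BERT, the mean aggregation is a stand-in for a single GNN layer, and all names, node identifiers, and texts below are hypothetical.

```python
import hashlib

DIM = 8  # embedding width; an arbitrary choice for this sketch


def encode_text(text, dim=DIM):
    """Toy stand-in for a language model: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec


def mean_aggregate(features, neighbors):
    """One message-passing step: average each node's vector with its neighbors'."""
    out = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors.get(node, [])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


# Hypothetical query-product graph; node names and texts are made up.
texts = {
    "query:1": "wireless noise cancelling headphones",
    "product:a": "bluetooth headphones with noise cancelling",
    "product:b": "stainless steel water bottle",
}
neighbors = {
    "query:1": ["product:a"],
    "product:a": ["query:1"],
    "product:b": [],
}

# Step 1: text -> fixed numerical vectors (the encoder is frozen here).
features = {node: encode_text(text) for node, text in texts.items()}
# Step 2: graph structure mixes the vectors of connected nodes.
embeddings = mean_aggregate(features, neighbors)
```

Because the encoder in this sketch is fixed before the graph step, nothing learned from the graph can flow back into the text representation. That separation is the limitation that joint, stage-wise training of the language model and the GNN is meant to address.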
