BertGCN: Transductive Text Classification by Combining GNN and BERT

In this work, we propose BertGCN, a model that combines large-scale pretraining and transductive learning for text classification. BertGCN constructs a heterogeneous graph over the dataset and represents documents as nodes using BERT representations. By jointly training the BERT and GCN modules within BertGCN, the proposed model is able to leverage the advantages of both worlds: large-scale pretraining, which takes advantage of massive amounts of raw data, and transductive learning, which jointly learns representations for both training data and unlabeled test data by propagating label influence through graph convolution. Experiments show that BertGCN achieves state-of-the-art (SOTA) performance on a wide range of text classification datasets.
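The joint prediction described above can be sketched in a few lines. This is a minimal, hedged illustration, not the authors' implementation: document node features `X` stand in for frozen BERT [CLS] embeddings, the graph is propagated through two symmetrically normalized GCN layers, and the GCN and BERT-head predictions are interpolated with an assumed mixing weight `lam` (the function names, weight matrices, and the specific interpolation coefficient are all illustrative assumptions).

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def bertgcn_forward(A, X, W1, W2, W_bert, lam=0.7):
    """Hedged sketch of a BertGCN-style prediction.

    A: (n, n) document/word graph adjacency; X: (n, d) node features
    (standing in for BERT embeddings of document nodes); W1, W2: GCN
    layer weights; W_bert: linear classification head over X; lam:
    assumed interpolation weight between the two predictions.
    """
    A_norm = normalize_adj(A)
    H = np.maximum(A_norm @ X @ W1, 0.0)   # GCN layer 1 + ReLU
    logits_gcn = A_norm @ H @ W2           # GCN layer 2
    logits_bert = X @ W_bert               # BERT classification head
    # Interpolate the two probability distributions per node.
    return lam * softmax(logits_gcn) + (1 - lam) * softmax(logits_bert)
```

In the full model the BERT encoder is trained jointly with the GCN, so gradients flow into both modules; the sketch above only shows the forward interpolation on fixed features.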
