BGNN-XML: Bilateral Graph Neural Networks for Extreme Multi-Label Text Classification

Extreme multi-label text classification (XMTC) aims to tag a text instance with the most relevant subset of labels from an extremely large label set. XMTC has attracted much recent attention due to the massive label sets produced by modern applications such as news annotation and product recommendation. The main challenges of XMTC are data scalability and sparsity, which lead to two issues: (i) the intractability of scaling to the extreme label setting, and (ii) a long-tailed label distribution, in which a large fraction of labels have few positive training instances. To overcome these problems, we propose BGNN-XML, a scalable graph neural network framework tailored to XMTC. Specifically, we exploit label correlations by mining their co-occurrence patterns and build a label graph from the resulting correlation matrix. We then perform attributed graph clustering via graph convolution with a low-pass graph filter, jointly modeling label dependencies and label features to induce semantic label clusters. We further propose a bilateral-branch graph isomorphism network that decouples representation learning from classifier learning to better model tail labels. Experimental results on multiple benchmark datasets demonstrate that BGNN-XML significantly outperforms state-of-the-art baselines while maintaining comparable prediction efficiency and model size.
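The abstract does not spell out how the correlation matrix is computed, so the following is a minimal sketch of one standard construction, assuming a binary label indicator matrix: pairwise co-occurrence counts are normalized into conditional probabilities, symmetrized, and pruned. The function name `build_label_graph` and the threshold `tau` are illustrative, not from the paper.

```python
import numpy as np

def build_label_graph(Y, tau=0.1):
    """Build a label correlation graph from co-occurrence counts.

    Y   : (n_samples, n_labels) binary label indicator matrix.
    tau : threshold below which weak correlations are pruned (hypothetical).
    """
    C = Y.T @ Y                                      # C[i, j] = #instances tagged with both i and j
    counts = np.maximum(np.diag(C).astype(float), 1.0)
    P = C / counts[:, None]                          # P[i, j] ~= P(label j | label i)
    A = np.maximum(P, P.T)                           # keep an edge if either direction is strong
    np.fill_diagonal(A, 0.0)                         # drop self-loops; the filter re-adds them
    A[A < tau] = 0.0                                 # prune weak edges to keep the graph sparse
    return A
```

For the clustering step, one well-known low-pass filter is G = I − L_s/2, where L_s is the symmetrically normalized graph Laplacian; since the eigenvalues of L_s lie in [0, 2], repeated application of G attenuates high-frequency components and smooths label features over the graph. The sketch below is again an assumption-labeled illustration rather than the paper's exact procedure: it applies the filter k times and clusters the filtered label features with k-means.

```python
import numpy as np
from sklearn.cluster import KMeans

def low_pass_filter(A, X, k=2):
    """Smooth label features X over the graph A with the filter (I - L_s/2)^k.

    A : (L, L) label adjacency matrix, X : (L, d) label feature matrix.
    """
    n = A.shape[0]
    A_hat = A + np.eye(n)                            # add self-loops (degree >= 1 guaranteed)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    L_s = np.eye(n) - d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    G = np.eye(n) - 0.5 * L_s                        # low-pass filter kernel
    X_bar = X
    for _ in range(k):                               # k-th order filtering
        X_bar = G @ X_bar
    return X_bar

def cluster_labels(A, X, n_clusters, k=2):
    """Partition labels into semantic clusters from the filtered features."""
    X_bar = low_pass_filter(A, X, k)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_bar)
```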

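For the bilateral-branch graph isomorphism network, the abstract only states that it decouples representation learning from classifier learning. The sketch below therefore combines two well-established ingredients as a hypothesis about the design: a standard GIN update, H' = MLP((1 + ε)H + AH), and a BBN-style cumulative-learning weight α that mixes a conventional branch (uniform sampling, dominated by head labels) with a re-balancing branch (reversed sampling, emphasizing tail labels). The class names, the branch wiring, and the way node embeddings feed the final XMTC scorer are all assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class GINLayer(nn.Module):
    """One GIN layer: H' = MLP((1 + eps) * H + A @ H), with A the adjacency matrix."""
    def __init__(self, dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, A, H):
        return self.mlp((1.0 + self.eps) * H + A @ H)

class BilateralBranchGIN(nn.Module):
    """Two GIN branches mixed by a cumulative-learning weight alpha (hypothetical wiring).

    The conventional branch sees uniformly sampled inputs (good representations,
    dominated by head labels); the re-balancing branch sees reversed-sampled
    inputs (more tail labels).
    """
    def __init__(self, dim, n_labels):
        super().__init__()
        self.conv_branch = GINLayer(dim)
        self.rebal_branch = GINLayer(dim)
        self.classifier = nn.Linear(dim, n_labels)

    def forward(self, A, H_uniform, H_reversed, alpha):
        Z = alpha * self.conv_branch(A, H_uniform) \
            + (1.0 - alpha) * self.rebal_branch(A, H_reversed)
        return self.classifier(Z)  # per-node logits; the final XMTC scoring is unspecified here
```

A parabolic schedule such as alpha = 1 - (t / T) ** 2, annealing from 1 to 0 over T epochs as in BBN, would shift training focus from representation learning early on to tail-aware classifier learning later.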