ECLARE: Extreme Classification with Label Graph Correlations

Deep extreme classification (XC) seeks to train deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. The core utility of XC comes from predicting labels that are rarely seen during training. Such rare labels hold the key to personalized recommendations that can delight and surprise a user. However, the large number of rare labels and the small amount of training data per rare label pose significant statistical and computational challenges. State-of-the-art deep XC methods attempt to remedy this by incorporating textual descriptions of labels, but do not adequately address the problem. This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label text but also label correlations to offer accurate real-time predictions within a few milliseconds. Core contributions of ECLARE include a frugal architecture and scalable techniques for training deep models together with label correlation graphs at the scale of millions of labels. In particular, ECLARE offers predictions that are 2–14% more accurate on publicly available benchmark datasets as well as on proprietary datasets for a related-products recommendation task sourced from the Bing search engine. Code for ECLARE is available at https://github.com/Extreme-classification/ECLARE
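
The abstract's central idea, propagating information over a label correlation graph so that rare labels can borrow statistical strength from the labels they co-occur with, can be illustrated with a minimal sketch. The snippet below is not ECLARE's actual method: the co-occurrence matrix `C`, the `restart` mixing weight, and both function names are hypothetical stand-ins for the paper's graph construction and graph-augmented classifier components.

```python
import numpy as np

def normalize_cooccurrence(C, restart=0.8):
    """Row-normalize a label co-occurrence matrix and mix in self-loops,
    a one-step random-walk-with-restart style smoothing (hypothetical)."""
    C = C.astype(np.float64)
    row_sums = C.sum(axis=1, keepdims=True)
    # Avoid division by zero for labels that never co-occur with others.
    P = np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)
    return restart * np.eye(C.shape[0]) + (1.0 - restart) * P

def propagate_label_embeddings(E, G):
    """Refine label embeddings E (L x d) by averaging over correlated labels."""
    return G @ E

# Toy example: 4 labels; label 3 is rare (a single co-occurrence with
# label 0), so its refined embedding is smoothed toward label 0's.
C = np.array([[0, 3, 1, 1],
              [3, 0, 2, 0],
              [1, 2, 0, 0],
              [1, 0, 0, 0]])
E = np.random.default_rng(0).normal(size=(4, 8))  # initial label-text embeddings
G = normalize_cooccurrence(C)
E_refined = propagate_label_embeddings(E, G)
```

The sketch only shows why a normalized co-occurrence graph helps: a rare label's representation is pulled toward the labels it co-occurs with, so it no longer depends solely on its own scarce training data. ECLARE itself combines such a graph with learned label-text embeddings and per-label classifiers at the scale of millions of labels.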
