Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

Extreme multi-label text classification (XMC) seeks to find the relevant labels for a given text input from an extremely large label collection. Many real-world applications, such as recommendation systems, document tagging, and semantic search, can be formulated as XMC problems. Recently, transformer-based XMC methods such as X-Transformer and LightXML have shown significant improvements over other XMC methods. Although these methods leverage pre-trained transformer models for text representation, fine-tuning transformer models on a large label space still takes a long time, even with powerful GPUs. In this paper, we propose XR-Transformer, a novel recursive approach that accelerates this procedure by recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective function. Empirical results show that XR-Transformer requires significantly less training time than other transformer-based XMC models while yielding better state-of-the-art results. In particular, on the public Amazon-3M dataset with 3 million labels, XR-Transformer is not only 20x faster than X-Transformer but also improves Precision@1 from 51% to 54%. Our code is publicly available at https://github.com/amzn/pecos.
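The abstract does not spell out how the "multi-resolution objectives" are built. One common construction in tree-based XMC (for example, the hierarchical label trees used in PECOS) groups the L labels into K_t clusters at each resolution t and trains on coarsened targets Y^(t) = binarize(Y C^(t)), where Y is the n x L instance-to-label matrix and C^(t) is an L x K_t label-to-cluster indicator matrix. The sketch below assumes that construction; the function names `cluster_indicator` and `multi_resolution_targets` and the hand-picked cluster assignments are hypothetical, since in practice the assignments would come from clustering label representations.

```python
import numpy as np

def cluster_indicator(assign):
    """L x K 0/1 matrix with C[l, k] = 1 iff label l belongs to cluster k."""
    assign = np.asarray(assign)
    C = np.zeros((assign.size, assign.max() + 1), dtype=np.int64)
    C[np.arange(assign.size), assign] = 1
    return C

def multi_resolution_targets(Y, assignments):
    """Coarse-to-fine targets: binarize(Y @ C_t) for each level, then Y itself.

    Y: n x L 0/1 instance-to-label matrix.
    assignments: label -> cluster arrays, ordered from coarsest to finest
    (hypothetical input; assumed to be given by some label clustering).
    """
    Y = np.asarray(Y, dtype=np.int64)
    targets = []
    for assign in assignments:
        C = cluster_indicator(assign)
        # An instance is positive for a cluster if any of its labels falls in it.
        targets.append((Y @ C > 0).astype(np.int64))
    targets.append(Y)  # finest resolution: the original XMC labels
    return targets

# Toy example: 8 labels, 2 instances, hand-picked nested clusters
# (hypothetical; real systems would cluster label representations).
Y = np.array([[1, 0, 0, 0, 0, 0, 0, 1],
              [0, 0, 1, 1, 0, 0, 0, 0]])
labels = np.arange(8)
assignments = [labels // 4, labels // 2]  # 2 clusters, then 4 clusters
for t, Yt in enumerate(multi_resolution_targets(Y, assignments)):
    print(f"resolution {t}: {Yt.shape[1]} targets\n{Yt}")
```

Under this assumption, a coarse-to-fine schedule would fine-tune first on the 2-cluster targets, then the 4-cluster targets, and finally the original 8 labels, each stage starting from the previous one. That schedule is an illustration of the abstract's description, not a reproduction of the XR-Transformer code.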

Inderjit S. Dhillon | Hsiang-Fu Yu | Jiong Zhang | Wei-Cheng Chang
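The headline Amazon-3M result in the abstract is stated as Precision@1 (P@1), the standard XMC metric: for each test input, take the k highest-scoring labels, count how many are truly relevant, divide by k, and average over inputs. A minimal NumPy sketch, assuming dense score and 0/1 label matrices (large-scale XMC evaluation would instead work on sparse top-k predictions); the function name `precision_at_k` is hypothetical:

```python
import numpy as np

def precision_at_k(scores, Y, k):
    """Precision@k: mean over instances of |top-k predicted labels ∩ true labels| / k."""
    scores = np.asarray(scores, dtype=float)
    Y = np.asarray(Y)
    topk = np.argsort(-scores, axis=1)[:, :k]   # k highest-scoring labels per row
    hits = np.take_along_axis(Y, topk, axis=1)  # 1 where a predicted label is relevant
    return float(hits.sum(axis=1).mean() / k)

# Toy example: 2 instances, 3 labels.
scores = [[0.9, 0.1, 0.5],
          [0.2, 0.8, 0.3]]
Y = [[1, 0, 1],
     [0, 0, 1]]
print(precision_at_k(scores, Y, 1))  # 0.5: only the first row's top label is relevant
print(precision_at_k(scores, Y, 2))  # 0.75: (2 + 1) hits / (2 rows * 2)
```

Read this way, raising P@1 from 51% to 54% means the top-ranked label is relevant for about 3 percentage points more of the test inputs.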