DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

Scalability and accuracy are well-recognized challenges in deep extreme multi-label learning, where the objective is to train architectures that automatically annotate a data point with the most relevant subset of labels drawn from an extremely large label set. This paper develops the DeepXML framework, which addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the Astec algorithm, which could be 2-12% more accurate and 5-30x faster to train than leading deep extreme classifiers on publicly available short text datasets. Astec could also be trained efficiently on Bing short text datasets containing up to 62 million labels while making predictions for billions of users and data points per day on commodity hardware. This allowed Astec to be deployed on the Bing search engine for a number of short text applications, ranging from matching user queries to advertiser bid phrases to showing personalized ads, where it yielded significant gains in click-through rates, coverage, revenue and other online metrics over state-of-the-art techniques currently in production. DeepXML's code is available at https://github.com/Extreme-classification/deepxml.
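To make the decomposition concrete, the sketch below illustrates a DeepXML-style modular pipeline based only on the abstract's description: the extreme task is split into four pluggable sub-tasks, and swapping components yields different members of the algorithm family. The stage names, toy models, and data here are illustrative assumptions, not the framework's actual modules or API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_feats, n_labels, emb_dim = 20, 16, 100, 8

X = rng.standard_normal((n_points, n_feats))          # short-text feature vectors
label_emb = rng.standard_normal((n_labels, emb_dim))  # label representations

def learn_representation(X):
    """Sub-task 1 (assumed): learn an intermediate representation cheaply;
    a fixed random projection stands in for a trained encoder."""
    W = rng.standard_normal((X.shape[1], emb_dim))
    return lambda Z: Z @ W

def shortlist(encode, X, label_emb, k=10):
    """Sub-task 2 (assumed): negative sampling / label shortlisting so each
    point scores only k << n_labels candidates (brute force here; an
    approximate nearest-neighbour index would be used at scale)."""
    scores = encode(X) @ label_emb.T
    return np.argsort(-scores, axis=1)[:, :k]

def transfer(encode):
    """Sub-task 3 (assumed): transfer the representation to the extreme
    task (identity here; fine-tuning in practice)."""
    return encode

def rank_shortlist(encode, X, cand):
    """Sub-task 4 (assumed): score only shortlisted labels, reusing the
    label embeddings as per-label classifier weights for illustration."""
    emb = encode(X)                                        # (n_points, emb_dim)
    scores = np.take(label_emb, cand, axis=0) @ emb[:, :, None]
    order = np.argsort(-scores[:, :, 0], axis=1)
    return np.take_along_axis(cand, order, axis=1)

# Compose the four components; swapping any of them yields a new algorithm.
encode = transfer(learn_representation(X))
cand = shortlist(encode, X, label_emb, k=10)
preds = rank_shortlist(encode, X, cand)  # top-10 ranked labels per point
```

Because prediction touches only the shortlisted labels, inference cost scales with `k` rather than with the full label set, which is the kind of decoupling that makes billion-scale daily prediction on commodity hardware plausible.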
