Probabilistic Label Trees for Extreme Multi-label Classification

Extreme multi-label classification (XMLC) is the task of tagging instances with a small subset of relevant labels chosen from an extremely large pool of possible labels. Problems of this scale can be handled efficiently by organizing the labels in a tree, as in the hierarchical softmax used for multi-class problems. In this paper, we thoroughly investigate probabilistic label trees (PLTs), which can be treated as a generalization of hierarchical softmax to multi-label problems. We first introduce the PLT model and discuss its training and inference procedures along with their computational costs. Next, we prove the consistency of PLTs for a wide spectrum of performance metrics. To this end, we upper-bound their regret by a function of the surrogate-loss regrets of the node classifiers. Furthermore, we consider the problem of training PLTs in a fully online setting, without any prior knowledge of the training instances, their features, or their labels. In this case, both the node classifiers and the tree structure are trained online. We prove a specific equivalence between the fully online algorithm and an algorithm with a tree structure given in advance. Finally, we discuss several implementations of PLTs and introduce a new one, napkinXC, which we empirically evaluate and compare with state-of-the-art algorithms.
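As a rough illustration of the inference procedure described above, the sketch below performs top-k PLT prediction with a uniform-cost search. Each tree node v holds a binary probabilistic classifier estimating P(z_v = 1 | z_parent(v) = 1, x), and the estimated marginal probability of label j is the product of these node estimates along the path from the root to leaf j; since every factor is at most 1, path probabilities never increase with depth, so the first k leaves popped from a max-priority queue are exactly the top-k labels. This is a minimal sketch under these assumptions: the Node class and the callable classifiers are illustrative stand-ins, not the napkinXC API.

import heapq

class Node:
    def __init__(self, classifier, children=None, label=None):
        self.classifier = classifier    # callable x -> P(z_v = 1 | z_parent = 1, x)
        self.children = children or []  # empty list for leaves
        self.label = label              # label id, set only for leaves

def plt_top_k(root, x, k):
    """Return the k labels with the highest estimated marginal probabilities."""
    # Max-heap via negated probabilities; id() breaks ties so Nodes
    # are never compared directly.
    heap = [(-root.classifier(x), id(root), root)]
    top = []
    while heap and len(top) < k:
        neg_p, _, node = heapq.heappop(heap)
        if not node.children:
            # Leaf popped: its path probability dominates everything left.
            top.append((node.label, -neg_p))
        else:
            for child in node.children:
                # Chain rule: extend the path probability by one node estimate.
                p = -neg_p * child.classifier(x)
                heapq.heappush(heap, (-p, id(child), child))
    return top

A toy usage example with constant-probability stubs in place of trained node classifiers:

leaf = lambda j, p: Node(classifier=lambda x: p, label=j)
root = Node(
    classifier=lambda x: 0.9,
    children=[
        Node(classifier=lambda x: 0.8, children=[leaf(0, 0.7), leaf(1, 0.2)]),
        Node(classifier=lambda x: 0.3, children=[leaf(2, 0.9), leaf(3, 0.1)]),
    ],
)
print(plt_top_k(root, x=None, k=2))  # approximately [(0, 0.504), (2, 0.243)]

Because the search only descends into subtrees that can still reach the top of the queue, the number of node classifiers evaluated is typically far smaller than the total number of labels, which is the source of the logarithmic-time inference that makes tree-based methods attractive at this scale.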
