Subset Labeled LDA for Large-Scale Multi-Label Classification

Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standard unsupervised Latent Dirichlet Allocation (LDA) algorithm that addresses multi-label learning tasks. Previous work has shown it to perform on par with other state-of-the-art multi-label methods. Nonetheless, LLDA encounters scalability issues as label set sizes increase. In this work, we introduce Subset LLDA, a simple variant of the standard LLDA algorithm that not only scales effectively to problems with hundreds of thousands of labels but also improves over the LLDA state of the art. We conduct extensive experiments on eight data sets, with label set sizes ranging from hundreds to hundreds of thousands, comparing our proposed algorithm with the previously proposed LLDA algorithms (Prior-LDA, Dep-LDA), as well as the state of the art in extreme multi-label classification. The results show a consistent advantage of our method over the other LLDA algorithms and competitive results compared to the extreme multi-label classification algorithms.

Yannis Papanikolaou | Grigorios Tsoumakas
