FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning

The objective in extreme multi-label classification is to learn a classifier that can automatically tag a data point with the most relevant subset of labels from a large label set. Extreme multi-label classification is an important research problem: not only does it enable tackling applications with a very large number of labels, it also allows ranking problems to be reformulated with certain advantages over existing formulations. Our objective in this paper is to develop an extreme multi-label classifier that is faster to train and more accurate at prediction than the state-of-the-art Multi-label Random Forest (MLRF) algorithm [21] and the Label Partitioning for Sub-linear Ranking (LPSR) algorithm [4]. MLRF and LPSR learn a hierarchy to deal with the large number of labels, but they optimize task-independent measures, such as the Gini index or clustering error, when learning that hierarchy. Our proposed FastXML algorithm achieves significantly higher accuracies by directly optimizing an nDCG-based ranking loss function. We also develop an alternating minimization algorithm for efficiently optimizing the proposed formulation. Experiments reveal that FastXML can be trained on problems with more than a million labels on a standard desktop in eight hours using a single core, and in an hour using multiple cores.
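Since the nDCG-based ranking loss is central to the abstract's claim, a minimal sketch of nDCG@k with binary relevance (the standard form for multi-label ranking) may help fix ideas. This is an illustrative Python/NumPy implementation of the measure itself, not the paper's code; the function name and arguments are assumptions.

```python
import numpy as np

def ndcg_at_k(ranked_labels, true_labels, k):
    """nDCG@k with binary relevance: the DCG of the predicted top-k
    ranking, normalized by the DCG of an ideal ranking that places
    all of the point's true labels first.

    ranked_labels: label indices sorted by predicted relevance, best first
    true_labels:   set of ground-truth label indices for the data point
    k:             ranking cut-off
    """
    # DCG@k: a relevant label at position p contributes 1 / log2(p + 1).
    dcg = sum(1.0 / np.log2(pos + 1)
              for pos, lbl in enumerate(ranked_labels[:k], start=1)
              if lbl in true_labels)
    # Ideal DCG@k: the relevant labels occupy the top positions.
    idcg = sum(1.0 / np.log2(pos + 1)
               for pos in range(1, min(k, len(true_labels)) + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy usage: labels {2, 7} are relevant; label 2 is ranked first.
print(ndcg_at_k([2, 5, 7, 1], {2, 7}, k=3))  # ≈ 0.92
```

Because nDCG scores an entire ranking rather than each label in isolation, optimizing it when learning the hierarchy directly rewards placing a point's relevant labels near the top, a property that task-independent measures such as the Gini index or clustering error do not capture.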

[1] Chia-Hua Ho, et al. An improved GLMNET for L1-regularized logistic regression. J. Mach. Learn. Res., 2011.

[2] Tie-Yan Liu, et al. A Theoretical Analysis of NDCG Type Ranking Measures. COLT, 2013.

[3] Daphne Koller, et al. Discriminative learning of relaxed hierarchy for large-scale visual recognition. International Conference on Computer Vision (ICCV), 2011.

[4] Jason Weston, et al. Label Partitioning for Sublinear Ranking. ICML, 2013.

[5] Inderjit S. Dhillon, et al. Large-scale Multi-label Learning with Missing Labels. ICML, 2013.

[6] Alexander C. Berg, et al. Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition. NIPS, 2011.

[7] Claudio Gentile, et al. Incremental Algorithms for Hierarchical Classification. J. Mach. Learn. Res., 2004.

[8] Grigorios Tsoumakas, et al. Effective and Efficient Multilabel Classification in Domains with Large Number of Labels. 2008.

[9] Dimitri P. Bertsekas. Nonlinear Programming. 1997.

[10] Juho Rousu, et al. Kernel-Based Learning of Hierarchical Multilabel Classification Models. J. Mach. Learn. Res., 2006.

[11] Jianfeng Gao, et al. Scalable training of L1-regularized log-linear models. ICML, 2007.

[12] Hsuan-Tien Lin, et al. Multilabel Classification with Principal Label Space Transformation. Neural Computation, 2012.

[13] Jeff G. Schneider, et al. Multi-Label Output Codes using Canonical Correlation Analysis. AISTATS, 2011.

[14] Maksims Volkovs, et al. BoltzRank: learning to maximize expected ranking gain. ICML, 2009.

[15] Jason Weston, et al. Label Embedding Trees for Large Multi-Class Tasks. NIPS, 2010.

[16] Prasoon Goyal, et al. Local Deep Kernel Learning for Efficient Non-linear SVM Prediction. ICML, 2013.

[17] James T. Kwok, et al. Multi-Label Classification on Tree- and DAG-Structured Hierarchies. ICML, 2011.

[18] Stephen P. Boyd, et al. An Interior-Point Method for Large-Scale L1-Regularized Logistic Regression. J. Mach. Learn. Res., 2007.

[19] S. V. N. Vishwanathan, et al. Efficient max-margin multi-label classification with applications to zero-shot learning. Machine Learning, 2012.

[20] Krishnakumar Balasubramanian, et al. The Landmark Selection Method for Multiple Output Prediction. ICML, 2012.

[21] Manik Varma, et al. Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages. WWW, 2013.

[22] Jason Weston, et al. WSABIE: Scaling Up to Large Vocabulary Image Annotation. IJCAI, 2011.

[23] Hsuan-Tien Lin, et al. Feature-aware Label Space Dimension Reduction for Multi-label Classification. NIPS, 2012.

[24] Ashish Kapoor, et al. Multilabel Classification using Bayesian Compressed Sensing. NIPS, 2012.

[25] Grigorios Tsoumakas, et al. Multilabel Text Classification for Automated Tag Suggestion. 2008.

[26] Yury Logachev, et al. Smoothing NDCG metrics using tied scores. CIKM, 2011.

[27] Marcel Worring, et al. The challenge problem for automated detection of 101 semantic concepts in multimedia. ACM Multimedia, 2006.

[28] John Langford, et al. Multi-Label Prediction via Compressed Sensing. NIPS, 2009.

[29] Moustapha Cissé, et al. Robust Bloom Filters for Large MultiLabel Classification Tasks. NIPS, 2013.

[30] John Langford, et al. Logarithmic Time Online Multiclass Prediction. NIPS, 2015.

[31] Pradeep Ravikumar, et al. On NDCG Consistency of Listwise Ranking Methods. AISTATS, 2011.

[32] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res., 2008.

[33] Hsuan-Tien Lin, et al. Multi-label Classification with Error-correcting Codes. ACML, 2011.

[34] Pierre Geurts, et al. Extremely randomized trees. Machine Learning, 2006.

[35] James T. Kwok, et al. Efficient Multi-label Classification with Many Labels. ICML, 2013.

[36] Rong Jin, et al. Learning to Rank by Optimizing NDCG Measure. NIPS, 2009.

[38] Jieping Ye, et al. Extracting shared subspace for multi-label classification. KDD, 2008.