Multi-Label Prediction for Large Text Corpora

Here we study the problem of predicting labels for large text corpora where each text can be assigned a variable number of labels. The problem might seem trivial when the label dimensionality is small, and can be easily solved using a series of one-vs-all classifiers. However, as the label dimensionality increases to several thousand, the parameter space becomes extremely large, and it is no longer possible to use the one-vs-all technique. Here we propose a model based on the factorization of higher order word vector moments, as well as the cross moments between the labels and the words for multi-label prediction. Our model provides guaranteed converge bounds on the extracted parameters. Further, our model takes only three passes through the training dataset to obtain the parameters, resulting in a highly scalable algorithm that can train on GB’s of data consisting of millions of documents with hundreds of thousands of labels using a nominal resource of a single processor with 16GB RAM. Our model achieves 10x-15x order of speed-up on large-scale datasets while producing competitive performance in comparison with existing benchmark algorithms.

[1]  Inderjit S. Dhillon,et al.  Large-scale Multi-label Learning with Missing Labels , 2013, ICML.

[2]  Lars Schmidt-Thieme,et al.  BPR: Bayesian Personalized Ranking from Implicit Feedback , 2009, UAI.

[3]  Mark Goadrich,et al.  The relationship between Precision-Recall and ROC curves , 2006, ICML.

[4]  Anima Anandkumar,et al.  Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res..

[5]  Prateek Jain,et al.  Sparse Local Embeddings for Extreme Multi-label Classification , 2015, NIPS.

[6]  Manik Varma,et al.  Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages , 2013, WWW.

[7]  Timothy N. Rubin,et al.  Statistical topic models for multi-label document classification , 2011, Machine Learning.

[8]  Tamara G. Kolda,et al.  Shifted Power Method for Computing Tensor Eigenpairs , 2010, SIAM J. Matrix Anal. Appl..

[9]  Jun Zhu,et al.  Spectral Methods for Supervised Topic Models , 2014, NIPS.

[10]  Jason Weston,et al.  WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[11]  Cheng-Kok Koh,et al.  From O(k2N) to O(N): A fast complex-valued eigenvalue solver for large-scale on-chip interconnect analysis , 2009, 2009 IEEE MTT-S International Microwave Symposium Digest.

[12]  Manik Varma,et al.  FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning , 2014, KDD.

[13]  Percy Liang,et al.  Spectral Experts for Estimating Mixtures of Linear Regressions , 2013, ICML.

[14]  Anima Anandkumar,et al.  A Spectral Algorithm for Latent Dirichlet Allocation , 2012, Algorithmica.