Scalable Generative Models for Multi-label Learning with Missing Labels

We present a scalable, generative framework for multi-label learning with missing labels. Our framework consists of a latent factor model for the binary label matrix, coupled with an exposure model that accounts for label missingness (i.e., whether a zero in the label matrix is a true zero or denotes a missing observation). The latent factor model further assumes that the low-dimensional embedding of each label vector is conditioned directly on the feature vector of that example. Our framework admits a simple inference procedure in which parameter estimation reduces to a sequence of weighted least-squares regression problems, each of which can be solved efficiently and in parallel. Moreover, inference can be performed in an online fashion using mini-batches of training examples, which makes our framework scalable to large data sets even with moderate computational resources. We report both quantitative and qualitative results for our framework on several benchmark data sets, comparing it with a number of state-of-the-art methods.
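
As a rough illustration of the kind of subproblem the abstract refers to, the sketch below solves a single weighted least-squares regression (with a small ridge penalty) in plain Python. It is not the paper's actual update rule: the function names `solve_linear` and `weighted_ridge`, the penalty `lam`, and the reading of the weights as per-example exposure probabilities are all assumptions made for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] so row operations act on both.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def weighted_ridge(X, y, w, lam=1e-3):
    """Minimise sum_i w[i] * (y[i] - X[i].b)^2 + lam * ||b||^2.

    X: list of rows (hypothetical low-dimensional embeddings),
    y: list of 0/1 targets for one label,
    w: list of non-negative weights (assumed here to play the role of
       exposure probabilities; a weight of 0 ignores that entry).
    """
    d = len(X[0])
    # Normal equations: (X^T W X + lam * I) b = X^T W y
    A = [[lam if j == k else 0.0 for k in range(d)] for j in range(d)]
    rhs = [0.0] * d
    for xi, yi, wi in zip(X, y, w):
        for j in range(d):
            rhs[j] += wi * xi[j] * yi
            for k in range(d):
                A[j][k] += wi * xi[j] * xi[k]
    return solve_linear(A, rhs)


# Usage: the second example has weight 0, so its zero label is treated
# as missing rather than as a true negative.
coef = weighted_ridge([[1.0], [1.0]], [1.0, 0.0], [1.0, 0.0], lam=0.0)
```

Because each such regression depends only on its own targets and weights, one problem per label (or per example) could in principle be dispatched independently, which is consistent with the parallelism the abstract mentions.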
