Streaming Label Learning for Modeling Labels on the Fly

It is challenging to handle a large volume of labels in multi-label learning. However, existing approaches explicitly or implicitly assume that all the labels in the learning process are given, which could be easily violated in changing environments. In this paper, we define and study streaming label learning (SLL), i.e., labels are arrived on the fly, to model newly arrived labels with the help of the knowledge learned from past labels. The core of SLL is to explore and exploit the relationships between new labels and past labels and then inherit the relationship into hypotheses of labels to boost the performance of new classifiers. In specific, we use the label self-representation to model the label relationship, and SLL will be divided into two steps: a regression problem and a empirical risk minimization (ERM) problem. Both problems are simple and can be efficiently solved. We further show that SLL can generate a tighter generalization error bound for new labels than the general ERM framework with trace norm or Frobenius norm regularization. Finally, we implement extensive experiments on various benchmark datasets to validate the new setting. And results show that SLL can effectively handle the constantly emerging new labels and provides excellent classification performance.

[1]  Hsuan-Tien Lin,et al.  Feature-aware Label Space Dimension Reduction for Multi-label Classification , 2012, NIPS.

[2]  Tat-Seng Chua,et al.  Automatic image annotation via local multi-label classification , 2008, CIVR '08.

[3]  Alexandre Bernardino,et al.  Matrix Completion for Multi-label Image Classification , 2011, NIPS.

[4]  Hsuan-Tien Lin,et al.  Multilabel Classification with Principal Label Space Transformation , 2012, Neural Computation.

[5]  Nicolò Cesa-Bianchi,et al.  Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive methods for gene functional inference , 2012, Machine Learning.

[6]  Grigorios Tsoumakas,et al.  Mining Multi-label Data , 2010, Data Mining and Knowledge Discovery Handbook.

[7]  Emmanuel J. Candès,et al.  Templates for convex cone problems with applications to sparse signal recovery , 2010, Math. Program. Comput..

[8]  John Langford,et al.  Multi-Label Prediction via Compressed Sensing , 2009, NIPS.

[9]  Jeff G. Schneider,et al.  Multi-Label Output Codes using Canonical Correlation Analysis , 2011, AISTATS.

[10]  Stephen P. Boyd,et al.  Graph Implementations for Nonsmooth Convex Programs , 2008, Recent Advances in Learning and Control.

[11]  Haojie Li,et al.  Multi-Label Image Categorization With Sparse Factor Representation , 2014, IEEE Transactions on Image Processing.

[12]  Moustapha Cissé,et al.  Robust Bloom Filters for Large MultiLabel Classification Tasks , 2013, NIPS.

[13]  Lior Rokach,et al.  Data Mining And Knowledge Discovery Handbook , 2005 .

[14]  Zheng Chen,et al.  Effective multi-label active learning for text classification , 2009, KDD.

[15]  Mohan S. Kankanhalli,et al.  Proceedings of the 2008 international conference on Content-based image and video retrieval , 2008 .

[16]  James T. Kwok,et al.  Efficient Multi-label Classification with Many Labels , 2013, ICML.

[17]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[18]  Guillermo Sapiro,et al.  Online dictionary learning for sparse coding , 2009, ICML '09.

[19]  Krishnakumar Balasubramanian,et al.  The Landmark Selection Method for Multiple Output Prediction , 2012, ICML.

[20]  Eyke Hüllermeier,et al.  Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains , 2010, ICML.

[21]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[22]  Prateek Jain,et al.  Sparse Local Embeddings for Extreme Multi-label Classification , 2015, NIPS.

[23]  James Theiler,et al.  Online Feature Selection using Grafting , 2003, ICML.

[24]  Lihi Zelnik-Manor,et al.  Large Scale Max-Margin Multi-Label Classification with Priors , 2010, ICML.

[25]  Inderjit S. Dhillon,et al.  Large-scale Multi-label Learning with Missing Labels , 2013, ICML.

[26]  Robert E. Schapire,et al.  Hierarchical multi-label prediction of gene function , 2006, Bioinform..

[27]  Jihong Ouyang,et al.  Supervised topic models for multi-label classification , 2015, Neurocomputing.

[28]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..