Correlated multi-label feature selection

Multi-label learning studies problems in which each instance is associated with a set of labels. It poses two challenges: (1) the labels are interdependent and correlated, and (2) the data are high-dimensional. In this paper, we tackle both challenges at once by learning the label correlation and performing feature selection simultaneously. We place a matrix-variate Normal prior on the weight vectors of the classifier to model the label correlation. Our goal is to find a subset of features that minimizes the label-ranking loss regularized by the label correlation. The resulting multi-label feature selection problem is a mixed integer program, which we reformulate as a quadratically constrained linear program (QCLP). It can be solved by a cutting plane algorithm, each iteration of which solves a minimax optimization problem by alternating between dual coordinate descent and projected sub-gradient descent. Experiments on benchmark data sets show that the proposed methods outperform single-label feature selection methods and many other state-of-the-art multi-label learning methods.
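To make the shape of such an objective concrete, here is a minimal sketch of a label-ranking loss regularized by label correlation and evaluated on a chosen feature subset. It is an illustration under stated assumptions, not the formulation from the paper: it assumes linear scorers, a pairwise hinge ranking loss, a matrix-variate normal prior with identity row covariance (so the penalty reduces to tr(W Omega^{-1} W^T)), and a fixed binary feature mask. The names `ranking_loss`, `correlation_penalty`, `objective`, `mask`, and `lam` are hypothetical.

```python
import numpy as np

def ranking_loss(W, X, Y):
    """Average pairwise hinge loss (assumed form): each relevant label
    should outscore each irrelevant label by a margin of 1."""
    scores = np.asarray(X, dtype=float) @ W          # shape (n, k)
    total, pairs = 0.0, 0
    for s, y in zip(scores, np.asarray(Y)):
        pos, neg = s[y == 1], s[y == 0]
        if pos.size == 0 or neg.size == 0:
            continue                                  # no pairs to rank
        margins = 1.0 - (pos[:, None] - neg[None, :])
        total += np.maximum(margins, 0.0).sum()
        pairs += margins.size
    return total / max(pairs, 1)

def correlation_penalty(W, Omega):
    """tr(W Omega^{-1} W^T): negative log-density, up to constants, of a
    matrix-variate normal prior on W (d x k) with identity row covariance
    and label (column) covariance Omega (k x k)."""
    return float(np.trace(W @ np.linalg.solve(Omega, W.T)))

def objective(W, X, Y, Omega, mask, lam=0.1):
    """Regularized ranking loss restricted to the features where mask == 1."""
    X_selected = np.asarray(X, dtype=float) * mask    # zero out dropped features
    return ranking_loss(W, X_selected, Y) + lam * correlation_penalty(W, Omega)
```

In this sketch the feature subset is given. Searching over binary masks is what makes the selection problem a mixed integer program, and Omega would be learned rather than fixed.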
