Exploiting Multilabel Information for Noise-Resilient Feature Selection

In the conventional supervised learning paradigm, each data instance is associated with a single class label. Multilabel learning differs in that each data instance may belong to multiple concepts simultaneously, a setting that naturally arises in a variety of high-impact domains, ranging from bioinformatics and information retrieval to multimedia analysis. It aims to leverage the multiple labels of data instances to build a predictive model that can classify unlabeled instances into one or more predefined target classes. In multilabel learning, even though each instance is associated with a rich set of class labels, the label information can be noisy and incomplete: the labeling process is both time-consuming and labor-intensive, leading to missing or even erroneous annotations. Such noisy and missing labels can negatively affect the performance of the underlying learning algorithms. Moreover, multilabeled data often contains noisy, irrelevant, and redundant features of high dimensionality, and these uninformative features may further deteriorate the predictive power of the learning model due to the curse of dimensionality. Feature selection, as an effective dimensionality reduction technique, has proven powerful in preparing high-dimensional data for numerous data mining and machine learning tasks. However, the vast majority of existing multilabel feature selection algorithms either reduce the problem to multiple single-label feature selection problems or directly use the imperfect labels to guide the selection of representative features. As a result, they may fail to obtain discriminative features shared across multiple labels.
In this article, to bridge the gap between the rich multilabel information and its imperfections in practice, we propose a novel noise-resilient multilabel informed feature selection framework (MIFS) that exploits the correlations among different labels. In particular, to reduce the negative effects of imperfect label information when capturing label correlations, we decompose the multilabel information of data instances into a low-dimensional space and then employ this reduced label representation to guide the feature selection phase via a joint sparse regression framework. Empirical studies on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of the proposed MIFS framework.
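The two-stage idea above (factorize the noisy label matrix into a low-dimensional latent representation, then fit an ℓ2,1-regularized regression from the features to that representation) can be sketched in a few lines of numpy. This is a minimal, simplified illustration under stated assumptions, not the authors' exact MIFS algorithm: the function name `mifs_sketch`, the specific alternating updates, and all hyperparameter defaults are illustrative choices, and the manifold-regularization term of the full framework is omitted for brevity.

```python
import numpy as np

def mifs_sketch(X, Y, k=4, alpha=1.0, gamma=0.5, n_iter=50, eps=1e-8, seed=0):
    """Simplified MIFS-style feature scoring (illustrative sketch only).

    Decomposes the (possibly noisy) n x c label matrix Y into a
    k-dimensional latent representation V (Y ~ V @ B, k < c), then fits
    a regression X @ W ~ V with an l2,1 penalty on W, so that features
    informative across the latent labels receive large row norms in W.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    V = rng.standard_normal((n, k))        # latent label representation
    B = rng.standard_normal((k, Y.shape[1]))
    W = rng.standard_normal((d, k)) * 0.01  # feature weights (rows = features)
    for _ in range(n_iter):
        # W-step: reweighted least squares for the l2,1-regularized fit X W ~ V
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))
        W = np.linalg.solve(X.T @ X + gamma * D, X.T @ V)
        # B-step: least squares for the label reconstruction Y ~ V B
        B = np.linalg.solve(V.T @ V + eps * np.eye(k), V.T @ Y)
        # V-step: closed form for min_V ||X W - V||^2 + alpha ||Y - V B||^2
        V = (X @ W + alpha * Y @ B.T) @ np.linalg.inv(np.eye(k) + alpha * B @ B.T)
    scores = np.linalg.norm(W, axis=1)      # per-feature importance
    ranking = np.argsort(-scores)           # most informative feature first
    return scores, ranking
```

Ranking features by the row norms of `W` is the standard way sparse-regression selectors produce a feature ordering; the key difference from single-label approaches is that `W` is fit against the denoised latent representation `V` rather than the raw labels `Y`.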