Semi-Supervised Feature Selection via Insensitive Sparse Regression with Application to Video Semantic Recognition

Feature selection plays a significant role in dealing with high-dimensional data and helps avoid the curse of dimensionality. In many real applications, such as video semantic recognition, handling a few labeled samples together with a large number of unlabeled samples drawn from the same population is a recently addressed challenge in feature selection. To solve this problem, we propose a novel semi-supervised feature selection method via insensitive sparse regression (ISR). Specifically, we compute the soft label matrix by a special label propagation, which predicts the labels of the unlabeled data. To guarantee the robustness of ISR to falsely labeled instances or outliers, we propose an Insensitive Regression Model (IRM) based on the capped $l_2$-$l_p$-norm loss. The soft labels are used as the weights of IRM to fully utilize the label information. Meanwhile, to perform feature selection, we incorporate an $l_{2,q}$-norm regularizer, with $0<q\leq 1$, into IRM as the structural sparsity constraint. Moreover, we put forward an effective approach for solving the formulated non-convex optimization problem. We rigorously analyze its convergence and discuss how to determine the parameters. Extensive experimental results on several public data sets verify the effectiveness of our proposed algorithm in comparison with state-of-the-art feature selection methods.
Finally, we successfully apply our method to video semantic recognition.
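The abstract names three components: soft labels from label propagation, a capped $l_2$-$l_p$ loss weighted by those soft labels, and an $l_{2,q}$ row-sparsity regularizer. It does not give the exact formulation. The NumPy sketch below is one plausible reading, not the authors' algorithm. It makes three substitutions. First, it uses the standard local-and-global-consistency propagation of Zhou et al. (2003) in place of the paper's "special" label propagation. Second, it uses the soft labels both as regression targets and, through each sample's largest soft-label score, as per-sample weights $s_i$ (a hypothetical choice). Third, it minimizes $\sum_i s_i \min(\|W^T x_i - f_i\|_2^p, \varepsilon) + \gamma \sum_j \|w^j\|_2^q$ by iteratively reweighted least squares, a common strategy for capped and $l_{2,q}$ norms. All function names (`label_propagation`, `capped_weights`, `isr_objective`, `isr_sketch`, `select_features`), the centering used in place of a bias term, and the defaults are illustrative assumptions.

```python
import numpy as np


def label_propagation(X, y_labeled, n_classes, alpha=0.99, sigma=1.0):
    """Local-and-global-consistency propagation (Zhou et al., 2003).

    Stand-in for the paper's label propagation. The first len(y_labeled)
    rows of X are the labeled samples. Returns an (n, c) soft label matrix
    whose rows sum to 1.
    """
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-sq / (2.0 * sigma ** 2))  # Gaussian affinity graph
    np.fill_diagonal(A, 0.0)
    deg = A.sum(axis=1)
    S = A / np.sqrt(np.outer(deg, deg))  # D^{-1/2} A D^{-1/2}
    Y0 = np.zeros((n, n_classes))
    Y0[np.arange(len(y_labeled)), y_labeled] = 1.0
    F = np.linalg.solve(np.eye(n) - alpha * S, Y0)  # (I - alpha S)^{-1} Y0
    F = np.clip(F, 0.0, None)  # guard against tiny negative round-off
    return F / F.sum(axis=1, keepdims=True)


def capped_weights(R, p, eps, delta=1e-8):
    """IRLS weights for sum_i min(||r_i||_2^p, eps); capped rows get weight 0."""
    norms = np.sqrt(np.sum(R ** 2, axis=1) + delta)
    w = 0.5 * p * norms ** (p - 2)
    w[norms ** p >= eps] = 0.0  # residual above the cap: treated as an outlier
    return w


def isr_objective(X, T, W, s, gamma, p, q, eps):
    """Hypothetical weighted capped-loss + l_{2,q} objective (q-th power)."""
    r = np.linalg.norm(X @ W - T, axis=1)
    loss = np.sum(s * np.minimum(r ** p, eps))
    return loss + gamma * np.sum(np.linalg.norm(W, axis=1) ** q)


def isr_sketch(X, T, s, gamma=1.0, p=1.0, q=0.5, eps=np.inf, n_iter=30, delta=1e-8):
    """Reweighted least squares for the objective above.

    The bias is approximated by unweighted centering of X and T (a
    simplification). Returns W (d x c) and the centered X and T.
    """
    Xc, Tc = X - X.mean(axis=0), T - T.mean(axis=0)
    d = X.shape[1]
    W = np.linalg.solve(Xc.T @ Xc + gamma * np.eye(d), Xc.T @ Tc)  # ridge start
    for _ in range(n_iter):
        a = s * capped_weights(Xc @ W - Tc, p, eps, delta)  # loss reweighting
        row = np.sqrt(np.sum(W ** 2, axis=1) + delta)
        Dq = 0.5 * q * row ** (q - 2)  # l_{2,q} reweighting per feature row
        G = a[:, None] * Xc  # equals diag(a) @ Xc
        W = np.linalg.solve(Xc.T @ G + gamma * np.diag(Dq), G.T @ Tc)
    return W, Xc, Tc


def select_features(W, k):
    """Rank features by the l2-norm of their row in W and keep the top k."""
    return np.argsort(-np.linalg.norm(W, axis=1))[:k]


# Synthetic demo: 60 samples, 10 features, only feature 0 carries the class.
rng = np.random.default_rng(0)
y = np.tile([0, 1], 30)  # classes interleaved
X = 0.5 * rng.normal(size=(60, 10))
X[:, 0] += np.where(y == 0, -2.0, 2.0)
F = label_propagation(X, y[:6], n_classes=2)  # first 6 samples are labeled
s = F.max(axis=1)  # hypothetical confidence weights for unlabeled samples
s[:6] = 1.0  # labeled samples get full weight
W, Xc, Tc = isr_sketch(X, F, s, gamma=1.0, p=1.0, q=0.5)
print(select_features(W, 3))
```

Features are ranked by $\|w^j\|_2$, the norm of the $j$-th row of $W$. A row that the $l_{2,q}$ penalty drives toward zero means the corresponding feature contributes to no class, which is why the row-wise norm rather than an element-wise one is the natural selection score. Samples whose residual exceeds the cap $\varepsilon$ receive zero weight in the next reweighted step; this is how a capped loss limits the influence of mislabeled points. The demo uses $\varepsilon = \infty$ (no capping), so in practice $\varepsilon$ would need to be tuned.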
