MRM-Lasso: A Sparse Multiview Feature Selection Method via Low-Rank Analysis

Learning about multiview data involves many applications, such as video understanding, image classification, and social media. However, when the data dimension increases dramatically, it is important but very challenging to remove redundant features in multiview feature selection. In this paper, we propose a novel feature selection algorithm, multiview rank minimization-based Lasso (MRM-Lasso), which jointly utilizes Lasso for sparse feature selection and rank minimization for learning relevant patterns across views. Instead of simply integrating multiple Lasso from view level, we focus on the performance of sample-level (sample significance) and introduce pattern-specific weights into MRM-Lasso. The weights are utilized to measure the contribution of each sample to the labels in the current view. In addition, the latent correlation across different views is successfully captured by learning a low-rank matrix consisting of pattern-specific weights. The alternating direction method of multipliers is applied to optimize the proposed MRM-Lasso. Experiments on four real-life data sets show that features selected by MRM-Lasso have better multiview classification performance than the baselines. Moreover, pattern-specific weights are demonstrated to be significant for learning about multiview data, compared with view-specific weights.

[1]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[2]  Tom Diethe,et al.  Constructing Nonlinear Discriminants from Multiple Data Views , 2010, ECML/PKDD.

[3]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[4]  Igor Kononenko,et al.  Estimating Attributes: Analysis and Extensions of RELIEF , 1994, ECML.

[5]  Rama Chellappa,et al.  Joint Sparse Representation for Robust Multimodal Biometrics Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[7]  Yinghuan Shi,et al.  Prostate Segmentation in CT Images via Spatial-Constrained Transductive Lasso , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Yong Yu,et al.  Robust Recovery of Subspace Structures by Low-Rank Representation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Aixia Guo,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2014 .

[10]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[11]  Emmanuel J. Candès,et al.  A Singular Value Thresholding Algorithm for Matrix Completion , 2008, SIAM J. Optim..

[12]  Dirk Van,et al.  Ensemble Methods: Foundations and Algorithms , 2012 .

[13]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Huan Liu,et al.  Unsupervised Feature Selection for Multi-View Data in Social Media , 2013, SDM.

[15]  Johan A. K. Suykens,et al.  L2-norm multiple kernel learning and its application to biomedical data fusion , 2010, BMC Bioinformatics.

[16]  Yueting Zhuang,et al.  Adaptive Unsupervised Multi-view Feature Selection for Visual Concept Recognition , 2012, ACCV.

[17]  Ivor W. Tsang,et al.  Minimax Sparse Logistic Regression for Very High-Dimensional Feature Selection , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[18]  Feiping Nie,et al.  Multi-View Clustering and Feature Learning via Structured Sparsity , 2013, ICML.

[19]  Pat Langley,et al.  Selection of Relevant Features and Examples in Machine Learning , 1997, Artif. Intell..

[20]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[21]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[22]  Dean P. Foster Multi-View Dimensionality Reduction via Canonical Correlation Multi-View Dimensionality Reduction via Canonical Correlation Analysis Analysis Multi-View Dimensionality Reduction via Canonical Correlation Analysis Multi-View Dimensionality Reduction via Canonical Correlation Analysis Multi-View Dimen , 2008 .

[23]  Wen Gao,et al.  Multiview Metric Learning with Global Consistency and Local Smoothness , 2012, TIST.

[24]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[25]  Sham M. Kakade,et al.  Multi-view clustering via canonical correlation analysis , 2009, ICML '09.

[26]  Jian Yang,et al.  Robust sparse coding for face recognition , 2011, CVPR 2011.

[27]  Fuchun Sun,et al.  Large-Margin Predictive Latent Subspace Learning for Multiview Data Analysis , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Daoqiang Zhang,et al.  Multi-view dimensionality reduction via canonical random correlation analysis , 2015, Frontiers of Computer Science.

[29]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[30]  Steven C. H. Hoi,et al.  Multiview Semi-Supervised Learning with Consensus , 2012, IEEE Transactions on Knowledge and Data Engineering.

[31]  Martha White,et al.  Convex Multi-view Subspace Learning , 2012, NIPS.

[32]  Chunheng Wang,et al.  Sparse representation for face recognition based on discriminative low-rank dictionary learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Feiping Nie,et al.  Multi-Subspace Representation and Discovery , 2011, ECML/PKDD.

[34]  Qionghai Dai,et al.  Low-Rank Structure Learning via Nonconvex Heuristic Recovery , 2010, IEEE Transactions on Neural Networks and Learning Systems.

[35]  Bart De Moor,et al.  Multiview Partitioning via Tensor Methods , 2013, IEEE Transactions on Knowledge and Data Engineering.

[36]  Dong Liu,et al.  Robust late fusion with rank minimization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Christoph H. Lampert,et al.  Learning Multi-View Neighborhood Preserving Projections , 2011, ICML.

[38]  Michael I. Jordan,et al.  Multi-task feature selection , 2006 .

[39]  Tong Zhang,et al.  Two-view feature generation model for semi-supervised learning , 2007, ICML '07.

[40]  Gaël Richard,et al.  Multiclass Feature Selection With Kernel Gram-Matrix-Based Criteria , 2012, IEEE Transactions on Neural Networks and Learning Systems.

[41]  Dong Liu,et al.  Robust visual domain adaptation with low-rank reconstruction , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Songcan Chen,et al.  MultiK-MHKS: A Novel Multiple Kernel Learning Algorithm , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Aurelie C. Lozano,et al.  Multi-level Lasso for Sparse Multi-task Regression , 2012, ICML.

[44]  John Wright,et al.  Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization , 2009, NIPS.

[45]  Jiawei Han,et al.  Generalized Fisher Score for Feature Selection , 2011, UAI.

[46]  R. Tibshirani,et al.  A note on the group lasso and a sparse group lasso , 2010, 1001.0736.

[47]  Ishwar K. Sethi,et al.  Multimedia content processing through cross-modal association , 2003, MULTIMEDIA '03.

[48]  John Shawe-Taylor,et al.  Two view learning: SVM-2K, Theory and Practice , 2005, NIPS.

[49]  Deng Cai,et al.  Laplacian Score for Feature Selection , 2005, NIPS.

[50]  Amr Ahmed,et al.  Recovering time-varying networks of dependencies in social and biological studies , 2009, Proceedings of the National Academy of Sciences.