Heterogeneous Feature Selection With Multi-Modal Deep Neural Networks and Sparse Group LASSO

Heterogeneous feature representations are widely used in machine learning and pattern recognition, especially for multimedia analysis. The multi-modal, often also high- dimensional , features may contain redundant and irrelevant information that can deteriorate the performance of modeling in classification. It is a challenging problem to select the informative features for a given task from the redundant and heterogeneous feature groups. In this paper, we propose a novel framework to address this problem. This framework is composed of two modules, namely, multi-modal deep neural networks and feature selection with sparse group LASSO. Given diverse groups of discriminative features, the proposed technique first converts the multi-modal data into a unified representation with different branches of the multi-modal deep neural networks. Then, through solving a sparse group LASSO problem, the feature selection component is used to derive a weight vector to indicate the importance of the feature groups. Finally, the feature groups with large weights are considered more relevant and hence are selected. We evaluate our framework on three image classification datasets. Experimental results show that the proposed approach is effective in selecting the relevant feature groups and achieves competitive classification performance as compared with several recent baseline methods.

[1]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[2]  Qinghua Hu,et al.  Combining heterogeneous deep neural networks with conditional random fields for Chinese dialogue act recognition , 2015, Neurocomputing.

[3]  Xiaogang Wang,et al.  Multi-source Deep Learning for Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Zi Huang,et al.  Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence ℓ2,1-Norm Regularized Discriminative Feature Selection for Unsupervised Learning , 2022 .

[5]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6]  Jean Ponce,et al.  A tensor-based algorithm for high-order graph matching , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Yueting Zhuang,et al.  Multi-Task Sparse Discriminant Analysis (MtSDA) with Overlapping Categories , 2010, AAAI.

[8]  Feiping Nie,et al.  Heterogeneous Visual Features Fusion via Sparse Multimodal Machine , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Nicu Sebe,et al.  Feature Selection for Multimedia Analysis by Sharing Information Among Multiple Tasks , 2013, IEEE Transactions on Multimedia.

[10]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[11]  Chunyan Miao,et al.  Online multimodal deep similarity learning with application to image retrieval , 2013, ACM Multimedia.

[12]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[13]  Jieping Ye,et al.  Multi-Task Feature Learning Via Efficient l2, 1-Norm Minimization , 2009, UAI.

[14]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[15]  R. Tibshirani,et al.  A note on the group lasso and a sparse group lasso , 2010, 1001.0736.

[16]  Qinghua Hu,et al.  Neighborhood rough set based heterogeneous feature subset selection , 2008, Inf. Sci..

[17]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[18]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[19]  Ji Zhu,et al.  Regularized Multivariate Regression for Identifying Master Predictors with Application to Integrative Genomics Study of Breast Cancer. , 2008, The annals of applied statistics.

[20]  Andrew Zisserman,et al.  Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[21]  Nicu Sebe,et al.  Exploiting the entire feature space with sparsity for automatic image annotation , 2011, ACM Multimedia.

[22]  Jiawei Han,et al.  Semi-supervised Discriminant Analysis , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  P. Bühlmann,et al.  The group lasso for logistic regression , 2008 .

[24]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[25]  Yves Grandvalet,et al.  More efficiency in multiple kernel learning , 2007, ICML '07.

[26]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[27]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Yueting Zhuang,et al.  Heterogeneous feature selection by group lasso with logistic regression , 2010, ACM Multimedia.

[29]  Huiping Sun,et al.  CQArank: jointly model topics and expertise in community question answering , 2013, CIKM.

[30]  Markus A. Stricker,et al.  Similarity of color images , 1995, Electronic Imaging.

[31]  Qi Tian,et al.  Multi-label boosting for image annotation by structural grouping sparsity , 2010, ACM Multimedia.

[32]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Xiaofei He,et al.  Locality Preserving Projections , 2003, NIPS.

[34]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[35]  Thomas S. Huang,et al.  Web-Scale Multimedia Information Networks , 2012, Proceedings of the IEEE.

[36]  B. S. Manjunath,et al.  Texture Features for Browsing and Retrieval of Image Data , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[37]  Charu C. Aggarwal,et al.  Breaking the Barrier to Transferring Link Information across Networks , 2015, IEEE Transactions on Knowledge and Data Engineering.

[38]  Chee Sun Won,et al.  Efficient use of local edge histogram descriptor , 2000, MULTIMEDIA '00.

[39]  Jieping Ye,et al.  Moreau-Yosida Regularization for Grouped Tree Structure Learning , 2010, NIPS.

[40]  Zi Huang,et al.  Multi-Feature Fusion via Hierarchical Regression for Multimedia Analysis , 2013, IEEE Transactions on Multimedia.

[41]  Jing Huang,et al.  Image indexing using color correlograms , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[42]  Cordelia Schmid,et al.  Multimodal semi-supervised learning for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[43]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[44]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[45]  Yoshua Bengio,et al.  Exploring Strategies for Training Deep Neural Networks , 2009, J. Mach. Learn. Res..

[46]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Heng Ji,et al.  Exploring Context and Content Links in Social Media: A Latent Space Method , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  尾島 修一,et al.  IS & T/SPIE's Symposium on Electronic Imaging : Science & Technology , 1995 .

[49]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[50]  Rama Chellappa,et al.  Joint Sparse Representation for Robust Multimodal Biometrics Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Chiou-Shann Fuh,et al.  Multiple Kernel Learning for Dimensionality Reduction , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[53]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[54]  Chiou-Shann Fuh,et al.  Local Ensemble Kernel Learning for Object Category Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  Jian Yang,et al.  Feature fusion: parallel strategy vs. serial strategy , 2003, Pattern Recognit..