Sparse Multigraph Embedding for Multimodal Feature Representation

Data fusion integrates features from heterogeneous data sources into a consistent and accurate representation for a given learning task. As an effective technique for data fusion, unsupervised multimodal feature representation aims to learn discriminative features that improve the classification and clustering performance of learning algorithms. However, this is challenging because different modalities favor different structure learning. In this paper, we propose an efficient feature learning method that formulates the representation of multimodal images as a sparse multigraph embedding problem. First, an effective algorithm is proposed to learn a sparse multigraph from multimodal data, where each modality corresponds to one regularized graph structure. Second, by incorporating the learned multigraph structure, the feature learning problem for multimodal images is formulated as a matrix factorization problem. An efficient algorithm is developed to solve this problem, and its convergence is proved. Finally, the proposed method is compared with several state-of-the-art single-modal and multimodal feature learning techniques on eight publicly available face image datasets. Comprehensive experimental results demonstrate that the proposed method outperforms the existing ones in terms of clustering performance on all tested datasets.
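The two-step idea (one graph per modality, then a graph-regularized matrix factorization) can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: the abstract does not give the objective or the update rules, so the sketch swaps in a plain k-nearest-neighbor graph for each modality and the standard multiplicative updates of graph-regularized nonnegative matrix factorization, with the per-modality graphs simply summed. The function names `knn_graph` and `multigraph_nmf` and the parameters `k`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np


def knn_graph(X, k=5):
    """Symmetric binary k-nearest-neighbor affinity over the rows of X.

    Assumption: a plain kNN graph stands in for the learned sparse graph.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-loops
    idx = np.argsort(d2, axis=1)[:, :k]               # k closest per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                         # symmetrize


def multigraph_nmf(views, n_components=2, k=5, lam=1.0, n_iter=200, seed=0):
    """Graph-regularized NMF over concatenated nonnegative views.

    views: list of arrays, each (n_samples, d_m), all entries >= 0.
    Minimizes ||X - U V^T||_F^2 + lam * Tr(V^T L V), where L = D - W and
    W is the sum of the per-view kNN graphs (an assumption of this sketch).
    Returns U (sum of d_m, n_components) and V (n_samples, n_components).
    """
    rng = np.random.default_rng(seed)
    X = np.hstack([np.asarray(v, dtype=float) for v in views]).T
    W = sum(knn_graph(v, k) for v in views)           # combined multigraph
    D = np.diag(W.sum(axis=1))                        # degree matrix
    d, n = X.shape
    U = rng.random((d, n_components))
    V = rng.random((n, n_components))
    eps = 1e-10                                       # avoids division by zero
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    view_a = rng.random((30, 8))   # e.g. one image modality
    view_b = rng.random((30, 5))   # e.g. a second modality
    U, V = multigraph_nmf([view_a, view_b], n_components=3)
    print(V.shape)                 # rows of V are the learned sample features
```

The rows of `V` would then be passed to a clustering algorithm such as k-means. Summing the graphs weights every modality equally, which is the simplest possible choice and not the per-modality regularized structure the paper describes.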
