Simultaneous Semi-Coupled Dictionary Learning for Matching in Canonical Space

Cross-modal recognition and matching with privileged information are important and challenging problems in computer vision. The cross-modal scenario involves matching across different modalities and must handle the large variations present across and within each modality. The privileged information scenario arises when some of the information available during training is unavailable at test time, so algorithms must exploit the extra information during training itself. We show that, for multi-modal data, either of these situations may arise if one modality is absent during testing. We propose a novel framework that handles both scenarios seamlessly, with applications to matching multi-modal data. The proposed approach jointly uses data from the two modalities to build a canonical representation that encompasses information from both. We explore four different types of canonical representations for different types of data. The algorithm computes dictionaries for both modalities together with the canonical representation, such that the transformed sparse coefficients of each modality equal those of the canonical representation. The sparse coefficients are finally matched using a Mahalanobis metric. Extensive experiments on different data sets, involving RGBD, text-image, and audio-image data, show the effectiveness of the proposed framework.
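
The final matching step can be illustrated with a minimal sketch, under stated assumptions. The abstract does not say how the Mahalanobis metric is obtained, so the matrix `M` below is a hand-picked positive semi-definite placeholder rather than a learned metric, and the function name `mahalanobis_match`, the toy coefficient vectors, and their dimensions are all hypothetical. The sketch only shows how a query coefficient vector would be compared against a gallery of coefficient vectors under the squared distance d(a, b) = (a - b)^T M (a - b).

```python
import numpy as np

def mahalanobis_match(query, gallery, M):
    """Return the index of the gallery row nearest to `query` under metric M,
    together with all squared Mahalanobis distances."""
    diff = gallery - query  # shape (n_items, dim), one difference per gallery row
    # Squared distance for each row i: diff_i^T M diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, M, diff)
    return int(np.argmin(d2)), d2

# Toy (hypothetical) coefficient vectors standing in for sparse codes.
gallery = np.array([[1.0, 0.0, 0.0],
                    [0.0, 2.0, 0.0],
                    [0.0, 0.0, 3.0]])
query = np.array([0.0, 1.5, 0.0])

# Placeholder metric: a diagonal PSD matrix chosen by hand (assumption).
M = np.diag([1.0, 4.0, 1.0])

best, d2 = mahalanobis_match(query, gallery, M)
print(best, d2)
```

With `M` set to the identity this reduces to squared Euclidean distance; a non-identity `M` reweights the coefficient dimensions before comparison.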
