Weakly-paired deep dictionary learning for cross-modal retrieval

Abstract Multi-modal data is often only weakly paired: there is no sample-to-sample correspondence between modalities; instead, classes of samples in one modality correspond to classes of samples in the other. This poses a significant challenge for cross-modal retrieval. In this work, we focus on learning cross-modal representations with minimal class-label supervision and without correspondences between samples. To tackle this problem, we establish a scalable hierarchical learning architecture that handles large amounts of weakly paired, heterogeneous multi-modal data. A classifier shared across modalities exploits the label supervision, and a multi-modal low-rank model encourages modality-invariant representations. Finally, cross-modal validation experiments on publicly available datasets demonstrate the advantages of the proposed method.
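
The abstract gives no formulas, so the following is only a toy illustration of the two ideas it names, not the proposed model. It assumes NumPy; the class prototypes, the label arrays, and the helper names `nuclear_norm` and `class_codes` are all hypothetical. The sketch builds two modalities with different sample counts (so no sample-level pairing can exist) and shows that when both modalities encode each class with the same code, the stacked code matrix has rank equal to the number of classes, which is the kind of structure a low-rank penalty such as the nuclear norm would favor.

```python
import numpy as np

def nuclear_norm(matrix):
    """Sum of singular values, a standard convex surrogate for matrix rank."""
    return float(np.linalg.svd(matrix, compute_uv=False).sum())

def class_codes(labels, prototypes):
    """Return one code column per sample: the prototype of that sample's class."""
    return prototypes[:, labels]

# Weakly paired toy data: the modalities have different sample counts,
# so only class labels (not individual samples) correspond across them.
labels_a = np.array([0, 0, 1, 2, 2, 2])   # modality A, 6 samples
labels_b = np.array([1, 0, 2, 1])         # modality B, 4 samples

rng = np.random.default_rng(0)
shared = rng.standard_normal((5, 3))      # hypothetical class codes shared by both modalities
other = rng.standard_normal((5, 3))       # unrelated class codes, used for contrast

codes_a = class_codes(labels_a, shared)
codes_b_aligned = class_codes(labels_b, shared)
codes_b_misaligned = class_codes(labels_b, other)

# Stack the codes of both modalities column-wise and compare ranks.
aligned = np.hstack([codes_a, codes_b_aligned])
misaligned = np.hstack([codes_a, codes_b_misaligned])

rank_aligned = np.linalg.matrix_rank(aligned)
rank_misaligned = np.linalg.matrix_rank(misaligned)

print("rank with modality-invariant codes:", rank_aligned)
print("rank with modality-specific codes: ", rank_misaligned)
print("nuclear norm (aligned):   ", nuclear_norm(aligned))
print("nuclear norm (misaligned):", nuclear_norm(misaligned))
```

In a learned model the codes would come from the dictionaries rather than fixed prototypes, and the rank would be encouraged by a penalty rather than imposed by construction; the sketch only shows why low rank and modality invariance go together when supervision is available at class level alone.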
