Paper information - Weakly-paired deep dictionary learning for cross-modal retrieval

Abstract Much multi-modal data is weakly paired: there is no sample-to-sample correspondence between modalities; instead, classes of samples in one modality correspond to classes of samples in the other. This poses a significant challenge for cross-modal retrieval. In this work, we focus on learning cross-modal representations with minimal class-label supervision and without correspondences between samples. To tackle this problem, we establish a scalable hierarchical learning architecture for extensive weakly-paired heterogeneous multi-modal data. A classifier shared across modalities exploits the label supervision, and a multi-modal low-rank model encourages modality-invariant representations. Finally, cross-modal validation experiments on publicly available datasets demonstrate the advantages of the proposed method.
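The weak-pairing setting described in the abstract can be illustrated with a small sketch. This is not the paper's method: the function names (`class_relevance`, `nuclear_norm`), the toy labels, and the one-hot class prototypes are all hypothetical, chosen only to show what class-level (rather than sample-level) correspondence looks like and why codes that agree across modalities yield a low-rank stacked matrix. The nuclear norm is used here as a common convex surrogate for rank; the abstract does not state which low-rank penalty the authors use.

```python
import numpy as np

def class_relevance(query_labels, gallery_labels):
    """Boolean matrix R[i, j] that is True when query i and gallery item j share a class."""
    q = np.asarray(query_labels)[:, None]
    g = np.asarray(gallery_labels)[None, :]
    return q == g

def nuclear_norm(codes_a, codes_b):
    """Sum of singular values of the row-stacked codes (a convex surrogate for rank)."""
    stacked = np.vstack([codes_a, codes_b])
    return np.linalg.norm(stacked, ord="nuc")

# Hypothetical toy data: 4 image samples and 3 text samples.
# The sample counts differ, so no one-to-one pairing exists; only labels link them.
image_labels = [0, 0, 1, 2]
text_labels = [1, 0, 2]

# Class-level correspondence: an image matches every text of the same class.
relevance = class_relevance(image_labels, text_labels)

# Assumed idealized codes: samples of the same class share one code in both modalities.
prototypes = np.eye(3, 4)              # 3 classes, 4-dimensional code space
image_codes = prototypes[image_labels]
text_codes = prototypes[text_labels]

# The stacked code matrix has 7 rows and 4 columns, yet its rank equals the class count.
shared_rank = np.linalg.matrix_rank(np.vstack([image_codes, text_codes]))
penalty = nuclear_norm(image_codes, text_codes)
```

In this toy case the relevance matrix has shape (4, 3), and the stacked codes have rank 3, which is the number of classes. Real learned codes would only approximate this structure, which is why a low-rank term is typically added as a soft penalty rather than a hard constraint.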
