Cross-Modality Bridging and Knowledge Transferring for Image Understanding

Understanding web images is an active research topic in both artificial intelligence and multimedia content analysis. Web images contain varied and complex foregrounds and backgrounds, which makes it challenging to design an accurate and robust learning algorithm. To address this problem, we first learn a cross-modality bridging dictionary for the deep and complete understanding of a vast quantity of web images. The proposed algorithm maps visual features into a semantic concept probability distribution, which constructs a global semantic description for images while preserving the local geometric structure. To discover and model the occurrence patterns within and between categories, multi-task learning is introduced to formulate the objective function with a capped-$\ell_{1}$ penalty, which is more likely to reach the optimal solution and outperforms traditional methods based on convex functions. Second, we propose a knowledge-based concept transferring algorithm to discover the underlying relations among different categories. Transferring the probability distribution among categories yields a more robust global feature representation and enables the image semantic representation to generalize better as the scale of the scenario grows. Experimental comparisons and performance discussion with classical methods on the ImageNet, Caltech-256, SUN397, and Scene15 datasets show the effectiveness of the proposed method on three traditional image understanding tasks.
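
As a rough illustration of the capped-$\ell_{1}$ penalty named in the abstract, the sketch below assumes the row-wise form common in the multi-task feature learning literature, $\sum_{j} \min(\|w^{j}\|_{1}, \theta)$, where $w^{j}$ is the $j$-th row of a feature-by-task weight matrix. The abstract does not give the exact objective, so the function name, the matrix layout, and the threshold `theta` are assumptions made for illustration only.

```python
def capped_l1_penalty(weights, theta):
    """Row-wise capped-l1 penalty: sum over rows of min(||row||_1, theta).

    `weights` is a list of rows (one row per feature, one column per task).
    This layout and the threshold `theta` are illustrative assumptions,
    not the formulation stated in the abstract.
    """
    total = 0.0
    for row in weights:
        row_l1 = sum(abs(w) for w in row)  # l1 norm of this feature's weights
        total += min(row_l1, theta)        # cap the contribution at theta
    return total


def l1_penalty(weights):
    """Plain (convex) l1 penalty over the same matrix, for comparison."""
    return sum(abs(w) for row in weights for w in row)


# Hypothetical weight matrix: 3 features, 2 tasks.
W = [
    [0.5, -0.5],  # small row: penalized in full
    [3.0, 1.0],   # large row: contribution capped at theta
    [0.0, 0.0],   # unused feature: no penalty
]
capped = capped_l1_penalty(W, theta=2.0)
plain = l1_penalty(W)
```

Under these assumptions, a row with a large norm contributes at most `theta`, so strongly used features are not shrunk further, whereas the plain $\ell_{1}$ penalty keeps growing with the weight magnitude. The cap is what makes the penalty non-convex.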
