User-Click-Data-Based Fine-Grained Image Recognition via Weakly Supervised Metric Learning

We present a novel fine-grained image recognition framework that uses user click data to bridge the semantic gap when distinguishing visually similar categories. Because the query set in click data is usually large-scale and redundant, we first propose a click-feature-based query-merging approach that merges queries with similar semantics and constructs a compact click feature. We then use this compact click feature together with a convolutional neural network (CNN)-based deep visual feature to jointly represent an image. Finally, with the combined feature, we employ a metric-learning-based template-matching scheme for efficient recognition. Considering the heavy noise in the training data, we introduce a reliability variable to characterize the reliability of each image, and propose a weakly-supervised metric and template learning with smooth assumption and click prior (WMTLSC) method to jointly learn the distance metric, the object templates, and the image reliability. Extensive experiments are conducted on the public Clickture-Dog dataset and our newly established Clickture-Bird dataset. The results show that click-data-based query merging helps generate a highly compact (the dimension is reduced to 0.9% of the original) and dense click feature for images, which greatly improves computational efficiency. Moreover, combining this click feature with the CNN feature further boosts recognition accuracy. The proposed framework performs much better than previous state-of-the-art methods on fine-grained recognition tasks.
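The pipeline described above can be illustrated with a minimal sketch. This is not the paper's implementation: the greedy cosine-similarity grouping in `merge_queries`, the `threshold` parameter, and the function names are all assumptions made for illustration, and the metric `M` and the templates are simply passed in rather than learned jointly with image reliability as WMTLSC does.

```python
import numpy as np

def merge_queries(clicks, threshold=0.8):
    """Group queries with similar click patterns (hypothetical greedy scheme).

    clicks: (n_images, n_queries) matrix of click counts.
    Returns the compact (n_images, n_groups) click feature and the
    group index assigned to each query.
    """
    norms = np.linalg.norm(clicks, axis=0)
    unit = clicks / np.maximum(norms, 1e-12)  # unit-length query columns
    group_of = np.full(clicks.shape[1], -1, dtype=int)
    representatives = []  # first query seen in each group
    for q in range(clicks.shape[1]):
        for g, rep in enumerate(representatives):
            if unit[:, q] @ rep >= threshold:  # cosine similarity
                group_of[q] = g
                break
        else:
            group_of[q] = len(representatives)
            representatives.append(unit[:, q])
    # Sum the click counts of all queries merged into the same group.
    compact = np.zeros((clicks.shape[0], len(representatives)))
    for q, g in enumerate(group_of):
        compact[:, g] += clicks[:, q]
    return compact, group_of

def combine_features(visual, click):
    """Concatenate a visual feature and a compact click feature per image."""
    return np.hstack([visual, click])

def classify(x, templates, M):
    """Return the index of the template closest to x under the metric M.

    Distance used: (x - t)^T M (x - t) for each template t.
    """
    diffs = templates - x  # (n_classes, d)
    dists = np.einsum("cd,de,ce->c", diffs, M, diffs)
    return int(np.argmin(dists))
```

In use, one would call `merge_queries` on the image-by-query click matrix, stack the result with CNN features via `combine_features`, and pass each combined vector to `classify`. With `M` set to the identity and class means as templates, this reduces to nearest-class-mean classification, which is only a stand-in for the learned metric and templates.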
