Fine-grained image recognition via weakly supervised click data guided bilinear CNN model

The bilinear convolutional neural network (BCNN) model, the state of the art in fine-grained image recognition, fails to distinguish categories with subtle visual differences. We design a novel BCNN model guided by user click data (C-BCNN) that improves performance by capturing both the visual and the semantic content of images. Specifically, to deal with the heavy noise in large-scale click data, we propose a weakly supervised learning approach for training the C-BCNN, namely W-C-BCNN, which automatically weights the training images based on their reliability. Extensive experiments are conducted on the public Clickture-Dog dataset. The results show that: (1) integrating the CNN with click features largely improves performance; (2) both the click data and visual consistency help to model image reliability. Moreover, the method can be easily customized to medical image recognition. Our model performs much better than conventional BCNN models on both the Clickture-Dog dataset and a medical image dataset.
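To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard bilinear pooling step (outer products of two local feature streams summed over locations, followed by signed square root and L2 normalization) and a cross-entropy loss in which each training image carries a reliability weight. The function names `bilinear_pool` and `weighted_cross_entropy` are hypothetical, and the weights are taken as given inputs; how W-C-BCNN derives them from click data and visual consistency is not specified in the abstract and is not modeled here.

```python
import numpy as np


def bilinear_pool(fa, fb):
    """Bilinear pooling of two local feature streams.

    fa: (L, Ca) and fb: (L, Cb) hold features at the same L locations.
    Returns a vector of length Ca * Cb with unit L2 norm.
    """
    # Sum over locations of the outer product fa[l] (x) fb[l].
    x = (fa.T @ fb).reshape(-1)
    # Signed square root, then L2 normalization.
    x = np.sign(x) * np.sqrt(np.abs(x))
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def weighted_cross_entropy(logits, labels, weights):
    """Cross-entropy where each sample has a reliability weight.

    logits: (N, K), labels: (N,) integer classes, weights: (N,) >= 0.
    The weights are an assumed input (hypothetical reliability scores).
    """
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    # Weighted average: low-weight (unreliable) samples contribute less.
    return float((weights * nll).sum() / weights.sum())
```

In this sketch, setting an image's weight to zero removes it from the loss entirely, which is the limiting case of down-weighting a noisy sample.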
