Annotation modification for fine-grained visual recognition

Abstract: Query modification is an intensively studied and widely used technique in information retrieval because it helps a system better understand the user's intent. In this work, we introduce this idea into fine-grained visual recognition, which is important for resolving ambiguous queries in image retrieval. Unlike most existing works, which incorporate object bounding boxes or part annotations to extract discriminative local features, we approach the fine-grained recognition problem from a new viewpoint, namely annotation modification. The proposed approach exploits inter-class ambiguity (generally regarded as noise) to form active sets of annotations that boost fine-grained visual recognition. Specifically, it first obtains the most confusing classes for each image using an easy-to-evaluate classifier, and then modifies the annotation of each image using the active set of annotations. To handle the modified annotations, we further design a novel ranking-based loss function for learning effective classification models. We evaluate the proposed approach on three popular fine-grained image datasets (Oxford-IIIT Pets, Flower-102, and CUB200-2011), and the experimental results demonstrate its effectiveness.
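The abstract does not give the exact construction of the active sets or the form of the ranking loss, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the active set of an image is its k highest-scoring wrong classes under a cheap classifier, and it substitutes a plain pairwise hinge ranking loss for the paper's loss. The function names, the parameters k and margin, and the toy scores are all hypothetical.

```python
import numpy as np

def active_sets(scores, labels, k=2):
    """Return, per image, the k highest-scoring classes other than the true one.

    Assumption: 'most confusing' is read as 'highest score among wrong classes'.
    """
    masked = scores.astype(float).copy()
    masked[np.arange(len(labels)), labels] = -np.inf  # exclude the true class
    return np.argsort(-masked, axis=1)[:, :k]         # descending by score

def ranking_hinge_loss(scores, labels, confusers, margin=1.0):
    """Mean hinge penalty over (true class, confusing class) pairs.

    A pair is penalized when the true class does not outscore the
    confusing class by at least `margin`. This is a stand-in loss,
    not the one proposed in the paper.
    """
    true = scores[np.arange(len(labels)), labels][:, None]
    conf = np.take_along_axis(scores, confusers, axis=1)
    return float(np.maximum(0.0, margin - (true - conf)).mean())

# Toy scores from a hypothetical easy-to-evaluate classifier: 2 images, 4 classes.
scores = np.array([[2.0, 1.5, 0.1, -1.0],
                   [0.2, 0.3, 3.0, 2.9]])
labels = np.array([0, 2])

confusers = active_sets(scores, labels, k=2)
loss = ranking_hinge_loss(scores, labels, confusers)
print(confusers.tolist(), loss)
```

In this sketch only the classes inside each active set contribute to the loss, which is one way to focus training on the pairs a cheap classifier finds hardest to separate.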
