Fine-Grained Recognition in the Wild: A Multi-task Domain Adaptation Approach

While fine-grained object recognition is an important problem in computer vision, current models are unlikely to accurately classify objects in the wild. These fully supervised models need additional annotated images to classify objects in every new scenario, a task that is infeasible. However, sources such as e-commerce websites and field guides provide annotated images for many classes. In this work, we study fine-grained domain adaptation as a step towards overcoming the dataset shift between easily acquired annotated images and the real world. Adaptation has not been studied in the fine-grained setting where annotations such as attributes could be used to increase performance. Our work uses an attribute based multi-task adaptation loss to increase accuracy from a baseline of 4:1% to 19:1% in the semi-supervised adaptation case. Prior domain adaptation works have been benchmarked on small datasets such as [46] with a total of 795 images for some domains, or simplistic datasets such as [41] consisting of digits. We perform experiments on a subset of a new challenging fine-grained dataset consisting of 1; 095; 021 images of 2; 657 car categories drawn from e-commerce websites and Google Street View.

[1]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[2]  Andrew Zisserman,et al.  Symbiotic Segmentation and Part Localization for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision.

[3]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[5]  Koby Crammer,et al.  Analysis of Representations for Domain Adaptation , 2006, NIPS.

[6]  Andrew Zisserman,et al.  Learning Visual Attributes , 2007, NIPS.

[7]  Shaogang Gong,et al.  Unsupervised Domain Adaptation for Zero-Shot Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Andrew Zisserman,et al.  Tabula rasa: Model transfer for object category detection , 2011, 2011 International Conference on Computer Vision.

[9]  Alan L. Yuille,et al.  Learning Deep Structured Models , 2014, ICML.

[10]  Yoshua Bengio,et al.  Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism , 2016, NAACL.

[11]  Frédéric Jurie,et al.  Improving Image Classification Using Semantic Attributes , 2012, International Journal of Computer Vision.

[12]  Jonathan Krause,et al.  Fine-Grained Crowdsourcing for Fine-Grained Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Trevor Darrell,et al.  Deep Domain Confusion: Maximizing for Domain Invariance , 2014, CVPR 2014.

[14]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[15]  Shenghuo Zhu,et al.  Image segmentation for large-scale subcategory flower recognition , 2013, 2013 IEEE Workshop on Applications of Computer Vision (WACV).

[16]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[18]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[19]  Namil Kim,et al.  Pixel-Level Domain Transfer , 2016, ECCV.

[20]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[21]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[22]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[23]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Arnold W. M. Smeulders,et al.  Fine-Grained Categorization by Alignments , 2013, 2013 IEEE International Conference on Computer Vision.

[25]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[26]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[27]  Luc Van Gool,et al.  TriCoS: A Tri-level Class-Discriminative Co-segmentation Method for Image Classification , 2012, ECCV.

[28]  Kun Duan,et al.  Discovering localized attributes for fine-grained recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Andrew Zisserman,et al.  A Visual Vocabulary for Flower Classification , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[30]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[31]  Trevor Darrell,et al.  Simultaneous Deep Transfer Across Domains and Tasks , 2015, ICCV.

[32]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[33]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Samy Bengio,et al.  Large-Scale Object Classification Using Label Relation Graphs , 2014, ECCV.

[35]  Seung Woo Lee,et al.  Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Adriana Kovashka,et al.  Attribute Adaptation for Personalized Image Search , 2013, 2013 IEEE International Conference on Computer Vision.

[37]  Peter N. Belhumeur,et al.  POOF: Part-Based One-vs.-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[39]  Tinne Tuytelaars,et al.  Lightweight Unsupervised Domain Adaptation by Convolutional Filter Reconstruction , 2016, ECCV Workshops.

[40]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[41]  Pietro Perona,et al.  Bird Species Categorization Using Pose Normalized Deep Convolutional Nets , 2014, ArXiv.

[42]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[43]  Michael I. Jordan,et al.  Unsupervised Domain Adaptation with Residual Transfer Networks , 2016, NIPS.

[44]  Jonathan Krause,et al.  Fine-Grained Car Detection for Visual Census Estimation , 2017, AAAI.

[45]  George Trigeorgis,et al.  Domain Separation Networks , 2016, NIPS.

[46]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[47]  Tinne Tuytelaars,et al.  Joint cross-domain classification and subspace learning for unsupervised adaptation , 2014, Pattern Recognit. Lett..

[48]  Andrew Zisserman,et al.  BiCoS: A Bi-level co-segmentation method for image classification , 2011, 2011 International Conference on Computer Vision.

[49]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Jonathan Krause,et al.  The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition , 2015, ECCV.

[51]  Larry S. Davis,et al.  Birdlets: Subordinate categorization using volumetric primitives and pose-normalized appearance , 2011, 2011 International Conference on Computer Vision.

[52]  Fei-Fei Li,et al.  Attribute Learning in Large-Scale Datasets , 2010, ECCV Workshops.

[53]  Xiaoou Tang,et al.  A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).