Scalable domain adaptation of convolutional neural networks

Convolutional neural networks (CNNs) are becoming a standard approach to a wide array of computer vision problems. Besides important theoretical and practical advances in their design, their success is built on the existence of manually labeled visual resources, such as ImageNet. Creating such datasets is cumbersome, and here we focus on alternatives to manual labeling. We hypothesize that new resources are of utmost importance in domains that are weakly covered by ImageNet or not covered at all, such as tourism photographs. We first collect noisy Flickr images for tourist points of interest and apply automatic or weakly-supervised reranking techniques to reduce noise. Then, we learn domain-adapted models with a standard CNN architecture and compare them to a generic model obtained from ImageNet. Experimental validation is conducted on publicly available datasets, including Oxford5k, INRIA Holidays and Div150Cred. Results show that low-cost domain adaptation outperforms both generic models and strong non-CNN baselines such as triangulation embedding.
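The noise-reduction step can be illustrated with a minimal sketch. The abstract does not specify which reranking techniques are used, so the centroid-similarity heuristic below is an assumption chosen only for illustration, and the names `rerank_by_centroid` and `keep_ratio` are hypothetical. It assumes each image collected for a point of interest is already described by a feature vector, and it keeps the images most similar to the mean of their class.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rerank_by_centroid(features, keep_ratio=0.5):
    """Return indices of the images closest to the class centroid, best first.

    features:   one feature vector per image collected for a single point of
                interest (hypothetical input; any fixed-length descriptor works).
    keep_ratio: fraction of the ranked images to keep (illustrative parameter).
    """
    unit = [_normalize(v) for v in features]
    dim = len(unit[0])
    # Mean of the unit vectors, renormalized so that scores are cosine similarities.
    centroid = _normalize([sum(v[d] for v in unit) / len(unit) for d in range(dim)])
    scores = [sum(a * b for a, b in zip(v, centroid)) for v in unit]
    order = sorted(range(len(unit)), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, round(keep_ratio * len(order)))
    return order[:n_keep]


# Toy example: four mutually similar images and one outlier (index 4).
toy = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.95, 0.05], [-1.0, 0.0]]
kept = rerank_by_centroid(toy, keep_ratio=0.8)
```

In this toy example the outlier has the lowest similarity to the centroid and is dropped. The retained images would then serve as training data for the domain-adapted model.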
