Object instance identification with fully convolutional networks

This paper presents a novel approach for instance search and object detection, applied to museum visits. This approach relies on fully convolutional networks (FCN) to obtain region proposals and object representation. Our proposal consists in four steps: a classical convolutional network is first fined-tuned as classifier over the dataset, next we build from this network a second one, fully convolutional, trained as classifier, that focuses on all regions of the corpus images, this network is used in a third step to define image global descriptors in a siamese architecture using triplets of images, and eventually these descriptors are then used for retrieval using classical scalar product between vectors. Our framework has the following features: i) it is well suited for small datasets with low objects variability as we use transfer learning, ii) it does not require any additional component in the network as we rely on classical (i.e. not fully convolutional) and fully convolutional networks, and iii) it does not need region annotations in the dataset as it deals with regions in a unsupervised way. Through multiple experiments on two image datasets taken from museum visits, we detail the effect of each parameter, and we show that the descriptors obtained using our proposed network outperform those from previous state-of-the-art approaches.

[1]  Victor S. Lempitsky,et al.  Neural Codes for Image Retrieval , 2014, ECCV.

[2]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[3]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[4]  Philippe Mulhem,et al.  Construction et évaluation d'un corpus pour la recherche d'instances d'images muséales , 2017, CORIA.

[5]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[6]  Albert Gordo,et al.  End-to-End Learning of Deep Visual Representations for Image Retrieval , 2016, International Journal of Computer Vision.

[7]  Thomas Brox,et al.  Descriptor Matching with Convolutional Neural Networks: a Comparison to SIFT , 2014, ArXiv.

[8]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[9]  Iasonas Kokkinos,et al.  Fracking Deep Convolutional Image Descriptors , 2014, ArXiv.

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[12]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[16]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[18]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[19]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[21]  Ondrej Chum,et al.  CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples , 2016, ECCV.

[22]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[23]  Shin'ichi Satoh,et al.  Faster R-CNN Features for Instance Search , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24]  Victor S. Lempitsky,et al.  Aggregating Local Deep Features for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Josef Sivic,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Cordelia Schmid,et al.  Local Convolutional Features with Unsupervised Training for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Yann LeCun,et al.  Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[28]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[29]  Ronan Sicre,et al.  Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[30]  Panu Turcot,et al.  Better matching with fewer features: The selection of useful features in large database recognition problems , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.