3D-Assisted Image Feature Synthesis for Novel Views of an Object

Comparing two images in a view-invariant way has been a challenging problem in computer vision for a long time, as visual features are not stable under large view point changes. In this paper, given a single input image of an object, we synthesize new features for other views of the same object. To accomplish this, we introduce an aligned set of 3D models in the same class as the input object image. Each 3D model is represented by a set of views, and we study the correlation of image patches between different views, seeking what we call surrogates --- patches in one view whose feature content predicts well the features of a patch in another view. In particular, for each patch in the novel desired view, we seek surrogates from the observed view of the given image. For a given surrogate, we predict that surrogate using linear combination of the corresponding patches of the 3D model views, learn the coefficients, and then transfer these coefficients on a per patch basis to synthesize the features of the patch in the novel view. In this way we can create feature sets for all views of the latent object, providing us a multi-view representation of the object. View-invariant object comparisons are achieved simply by computing the $L^2$ distances between the features of corresponding views. We provide theoretical and empirical analysis of the feature synthesis process, and evaluate the proposed view-agnostic distance (VAD) in fine-grained image retrieval (100 object classes) and classification tasks. Experimental results show that our synthesized features do enable view-independent comparison between images and perform significantly better than traditional image features in this respect.

[1]  A. Suresh,et al.  The Complexity of Estimating R\'enyi Entropy , 2014 .

[2]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[3]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[5]  R. Vidal A TUTORIAL ON SUBSPACE CLUSTERING , 2010 .

[6]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[7]  Alexei A. Efros,et al.  Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[9]  Himanshu Tyagi,et al.  The Complexity of Estimating Rényi Entropy , 2015, SODA.

[10]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[11]  Fei-Fei Li,et al.  Hierarchical semantic indexing for large scale image retrieval , 2011, CVPR 2011.

[12]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[13]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Ming Ouhyoung,et al.  On Visual Similarity Based 3D Model Retrieval , 2003, Comput. Graph. Forum.

[15]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[17]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[18]  Leonidas J. Guibas,et al.  Estimating image depth using shape collections , 2014, ACM Trans. Graph..

[19]  Leonidas J. Guibas,et al.  Fine-grained semi-supervised labeling of large shape collections , 2013, ACM Trans. Graph..

[20]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[21]  Subhransu Maji,et al.  Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.