A 3D deformable model-based framework for the retrieval of near-isometric flattenable objects using Bag-of-Visual-Words

Abstract: We introduce a 3D deformable model-based framework for the retrieval of near-isometric flattenable objects using keypoints and BoVW (Bag-of-Visual-Words). By 3D deformable model we mean a texture-mapped 3D shape which may deform isometrically. We assume that such a model is available for each object in the database, and we exploit these models at both the training and the retrieval phases. As our first contribution, we use the 3D deformable models to generate synthetic data, from which we define a new BoVW model for representing the database objects. Our model chooses an optimal per-object representation by maximizing each object's mean average precision. The maximization is done over multiple candidate representations, which are generated using the criteria of keypoint repeatability, weight discriminance and stability. Our second contribution is the use of SfT (Shape-from-Template) to facilitate geometric verification at the retrieval phase, for a few objects hypothesized using the new BoVW model. Existing methods use a rigid model, such as the fundamental matrix, or a simple deformable model based on semi-local constraints. SfT, however, is a physics-based method which uses an object's 3D deformable model to reconstruct its isometric 3D deformation from a single input image. The output of SfT thus directly provides a geometric verification score. A byproduct of our work is to extend the scope of SfT: the proposed object retrieval framework provides SfT with a few object hypotheses, which may be quickly tested to select the 3D deformable object. Performance evaluation on synthetic and real images, using databases of between 20 and 1000 objects, reveals the benefits of our retrieval framework. Compared with the BoVW baseline and a rigid model, the new BoVW model and SfT improve the retrieval performance by 4.2% and 11.3%, with p-values of 5 × 10⁻⁶ and 7 × 10⁻³⁰ respectively.
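The per-object selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how candidates are scored, so the sketch assumes that each candidate representation is a BoVW histogram, that synthetic images are ranked by cosine similarity to the candidate, and that the candidate with the highest average precision is kept. The function names `average_precision`, `cosine` and `select_representation`, and the toy histograms, are hypothetical.

```python
import math


def average_precision(ranked_labels):
    """Average precision of a ranked list (1 = relevant, 0 = not relevant)."""
    n_relevant = sum(ranked_labels)
    if n_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / n_relevant


def cosine(u, v):
    """Cosine similarity between two histograms (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_representation(candidates, synth_hists, is_target):
    """Return (index, AP) of the candidate that ranks the target's images best.

    Assumption: synth_hists are BoVW histograms of synthetic images and
    is_target[i] is 1 when image i was rendered from the object of interest.
    """
    best_idx, best_ap = -1, -1.0
    for idx, rep in enumerate(candidates):
        scores = [cosine(rep, h) for h in synth_hists]
        # Rank synthetic images by decreasing similarity to the candidate.
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        ap = average_precision([is_target[i] for i in order])
        if ap > best_ap:
            best_idx, best_ap = idx, ap
    return best_idx, best_ap


# Toy data: 3 visual words, 4 synthetic images, the first two show the target.
synth_hists = [[5, 1, 0], [4, 2, 0], [0, 1, 5], [1, 0, 4]]
is_target = [1, 1, 0, 0]
candidates = [[0, 1, 1], [1, 0, 0]]
best_idx, best_ap = select_representation(candidates, synth_hists, is_target)
```

In this toy case the second candidate places both target images at the top of the ranking, so it is the one selected. The candidate generation criteria named in the abstract (keypoint repeatability, weight discriminance and stability) are outside the scope of this sketch.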
