Supervised learning of bag‐of‐features shape descriptors using sparse coding

We present a method for supervised learning of shape descriptors for shape retrieval applications. Many content‐based shape retrieval approaches follow the bag‐of‐features (BoF) paradigm commonly used in text and image retrieval: they first compute local shape descriptors and then represent them in a ‘geometric dictionary’ using vector quantization. A major drawback of such approaches is that the dictionary is constructed in an unsupervised manner using clustering, unaware of the last stages of the process (pooling of the local descriptors into a BoF, and comparison of the resulting BoFs using some metric). In this paper, we replace the clustering with dictionary learning, where every atom acts as a feature, followed by sparse coding and pooling to obtain the final BoF descriptor. Both the dictionary and the sparse codes can be learned in the supervised regime via bi‐level optimization, using a task‐specific objective that promotes the invariance desired in the specific application. We show significant performance improvement on several standard shape retrieval benchmarks.
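
As a rough illustration of the sparse coding and pooling stage described above, the following minimal sketch encodes a set of local descriptors against a fixed dictionary and pools the codes into one BoF vector. Several choices here are assumptions made for illustration and are not taken from the abstract: the l1-regularized (lasso) coding objective, the ISTA solver, the mean-of-absolute-values pooling, and the names `sparse_code_ista`, `pool_bof`, `bof_descriptor`, `lam` and `n_iter`. The supervised bi‐level learning of the dictionary is not shown; the dictionary `D` is simply taken as given.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code_ista(X, D, lam=0.1, n_iter=200):
    """Approximately solve, for each row x of X,
        min_z 0.5 * ||x - D z||^2 + lam * ||z||_1
    with ISTA (an assumed solver choice).

    X: (n, d) array of n local descriptors of dimension d.
    D: (d, k) dictionary whose k columns are the atoms.
    Returns Z: (n, k) array of sparse codes.
    """
    # Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(D, 2) ** 2
    Z = np.zeros((X.shape[0], D.shape[1]))
    for _ in range(n_iter):
        grad = (Z @ D.T - X) @ D  # row-wise D^T (D z - x)
        Z = soft_threshold(Z - grad / L, lam / L)
    return Z


def pool_bof(Z):
    """Pool per-descriptor codes into one vector (assumed: mean of |z|)."""
    return np.abs(Z).mean(axis=0)


def bof_descriptor(X, D, lam=0.1, n_iter=200):
    """Sparse-code the local descriptors and pool them into a BoF vector."""
    return pool_bof(sparse_code_ista(X, D, lam, n_iter))
```

Because the pooling averages over all local descriptors, the resulting vector has one entry per atom and does not depend on the order in which the descriptors are listed.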
