Accurate Aggregation of Local Features by using K-sparse Autoencoder for 3D Model Retrieval

Aggregating a set of local features has been widely used for recognition and retrieval of multimedia data, including 2D images and 3D models. A number of feature aggregation algorithms (e.g., Bag-of-Features, Locality-constrained Linear coding, and Fisher Vector coding) have been proposed. They first learn a codebook, i.e., a set of codewords, by clustering the local features, and then encode these local features using the learned codebook. Despite the great success of these algorithms, we argue that they are not necessarily optimal in terms of accuracy, since codebook learning and feature encoding are computed separately. In this paper, we propose two novel feature aggregation algorithms based on the k-Sparse Autoencoder (kSA) that realize more accurate local feature aggregation. The proposed algorithms, called Database-adaptive kSA (DkSA) aggregation and Per-data-adaptive kSA (PkSA) aggregation, jointly optimize codebook learning and feature encoding. In addition, the kSA-based feature encoding enhances the saliency of local features through its k-sparseness and non-negativity constraints. Of the two proposed algorithms, PkSA aggregation additionally exploits the reconstruction error of each local feature, derived from the kSA, to obtain a more accurate aggregated feature. Experimental evaluation in a shape-based 3D model retrieval scenario showed that the retrieval accuracy of the proposed algorithms is superior to that of the existing feature aggregation algorithms we compared against.
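The encoding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the weights are random rather than trained, and the pooling (sum followed by L2 normalization), the ordering of the non-negativity and top-k steps, and the names `k_sparse_encode` and `aggregate` are all assumptions made for illustration. The reconstruction-error weighting used by PkSA is not shown, since the abstract does not specify it.

```python
import math
import random


def k_sparse_encode(x, weights, biases, k):
    """Encode one local feature: linear map, clip at zero, keep the k largest units."""
    # Linear activations, clipped at zero (assumed form of the non-negativity constraint).
    acts = [max(0.0, sum(w_i * x_i for w_i, x_i in zip(row, x)) + b)
            for row, b in zip(weights, biases)]
    # Indices of the k largest activations; every other unit is set to zero.
    keep = set(sorted(range(len(acts)), key=lambda j: acts[j], reverse=True)[:k])
    return [a if j in keep else 0.0 for j, a in enumerate(acts)]


def aggregate(local_features, weights, biases, k):
    """Sum-pool the sparse codes of all local features, then L2-normalize (assumed pooling)."""
    pooled = [0.0] * len(weights)
    for x in local_features:
        code = k_sparse_encode(x, weights, biases, k)
        pooled = [p + c for p, c in zip(pooled, code)]
    norm = math.sqrt(sum(p * p for p in pooled))
    return [p / norm for p in pooled] if norm > 0 else pooled


# Toy data: 50 local features of dimension 8, a codebook of 16 hidden units, k = 3.
# The weights are random here; in practice they would be learned by training the autoencoder.
rng = random.Random(0)
dim, n_units, k = 8, 16, 3
weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_units)]
biases = [0.0] * n_units
features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]

descriptor = aggregate(features, weights, biases, k)
print(len(descriptor))
```

In this sketch each local feature contributes at most k non-negative entries to the pooled vector, so one 3D model is represented by a single fixed-length descriptor regardless of how many local features it has.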
