论文信息 - FAemb: A function approximation-based embedding method for image retrieval

FAemb: A function approximation-based embedding method for image retrieval

The objective of this paper is to design an embedding method mapping local features describing image (e.g. SIFT) to a higher dimensional representation used for image retrieval problem. By investigating the relationship between the linear approximation of a nonlinear function in high dimensional space and state-of-the-art feature representation used in image retrieval, i.e., VLAD, we first introduce a new approach for the approximation. The embedded vectors resulted by the function approximation process are then aggregated to form a single representation used in the image retrieval framework. The evaluation shows that our embedding method gives a performance boost over the state of the art in image retrieval, as demonstrated by our experiments on the standard public image retrieval benchmarks.

[1] David J. Field,et al. Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[2] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[3] Naila Murray,et al. Generalized Max Pooling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Tong Zhang,et al. Improved Local Coordinate Coding using Local Tangents , 2010, ICML.

[5] Cordelia Schmid,et al. Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Hervé Jégou,et al. Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening , 2012, ECCV.

[7] Andrew Zisserman,et al. Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8] David Picard,et al. Web-Scale Image Retrieval Using Compact Tensor Aggregation of Visual Descriptors , 2013, IEEE MultiMedia.

[9] Andrew Zisserman,et al. Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10] Jean Ponce,et al. A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[11] Yihong Gong,et al. Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[12] Rajat Raina,et al. Efficient sparse coding algorithms , 2006, NIPS.

[13] Yihong Gong,et al. Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14] Andrew Zisserman,et al. All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Cordelia Schmid,et al. Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[16] David Picard,et al. Improving image similarity with vectors of locally aggregated tensors , 2011, 2011 18th IEEE International Conference on Image Processing.

[17] Florent Perronnin,et al. Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Tunc Geveci,et al. Advanced Calculus , 2014, Nature.

[19] Thomas F. Coleman,et al. An Interior Trust Region Approach for Nonlinear Minimization Subject to Bounds , 1993, SIAM J. Optim..

[20] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[21] Yihong Gong,et al. Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22] Michael Isard,et al. Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[23] Michael Isard,et al. Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24] Thomas S. Huang,et al. Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[25] Thomas F. Coleman,et al. On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds , 1994, Math. Program..

[26] Patrick Pérez,et al. Revisiting the VLAD image representation , 2013, ACM Multimedia.

[27] Georges Quénot,et al. Descriptor optimization for multimedia indexing and retrieval , 2013, Multimedia Tools and Applications.

[28] C. Schmid,et al. On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[29] Andrew Zisserman,et al. Triangulation Embedding and Democratic Aggregation for Image Search , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[30] Cordelia Schmid,et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[31] Thomas Mensink,et al. Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[32] Cordelia Schmid,et al. Improving Bag-of-Features for Large Scale Image Search , 2010, International Journal of Computer Vision.

[33] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.