SIFTpack: A Compact Representation for Efficient SIFT Matching

Computing distances between large sets of SIFT descriptors is a basic step in numerous algorithms in computer vision. When the number of descriptors is large, as is often the case, computing these distances can be extremely time consuming. In this paper we propose the SIFT pack: a compact way of storing SIFT descriptors, which enables significantly faster calculations between sets of SIFTs than the current solutions. SIFT pack can be used to represent SIFTs densely extracted from a single image or sparsely from multiple different images. We show that the SIFT pack representation saves both storage space and run time, for both finding nearest neighbors and for computing all distances between all descriptors. The usefulness of SIFT pack is also demonstrated as an alternative implementation for K-means dictionaries of visual words.

[1]  Kk Umesh,et al.  Web Image Retrieval Using Visual Dictionary , 2012 .

[2]  Y. C. Pati,et al.  Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.

[3]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4]  Vidya R. Khapli,et al.  Image retrieval for compressed and uncompressed images , 2009, 2009 7th International Conference on Information, Communications and Signal Processing (ICICS).

[5]  Olga Sorkine-Hornung,et al.  A comparative study of image retargeting , 2010, ACM Trans. Graph..

[6]  Kenneth Rose,et al.  Constrained-storage vector quantization with a universal codebook , 1998, IEEE Trans. Image Process..

[7]  Eli Shechtman,et al.  PatchMatch: a randomized correspondence algorithm for structural image editing , 2009, ACM Trans. Graph..

[8]  Jian Sun,et al.  Computing nearest-neighbor fields via Propagation-Assisted KD-Trees , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Denis Simakov,et al.  Summarizing visual data using bidirectional similarity , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[11]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[12]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Franklin C. Crow,et al.  Summed-area tables for texture mapping , 1984, SIGGRAPH.

[14]  Vincent Lepetit,et al.  A fast local descriptor for dense matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[17]  Jean Ponce,et al.  Discriminative clustering for image co-segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Adam Finkelstein,et al.  The Generalized PatchMatch Correspondence Algorithm , 2010, ECCV.

[19]  Gang Hua,et al.  Picking the best DAISY , 2009, CVPR.

[20]  Vincent Lepetit,et al.  Boosting Binary Keypoint Descriptors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Luc Van Gool,et al.  Dense matching of multiple wide-baseline views , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22]  B. Janet,et al.  Image index model for retrieval using Hausdorff distortion , 2010, 2010 International Conference on Computer Applications and Industrial Electronics.

[23]  Jean Ponce,et al.  Sparse image representation with epitomes , 2011, CVPR 2011.

[24]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[25]  Dimitris N. Metaxas,et al.  Spatial and temporal pyramids for grammatical expression recognition of American sign language , 2009, Assets '09.

[26]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[29]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[30]  Shai Avidan,et al.  Coherency Sensitive Hashing , 2011, ICCV.

[31]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[32]  Jian Yao,et al.  3D modeling and rendering from multiple wide-baseline images by match propagation , 2006, Signal Process. Image Commun..

[33]  Jean Ponce,et al.  Multi-class cosegmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Vincent Lepetit,et al.  BRIEF: Computing a Local Binary Descriptor Very Fast , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[36]  Ankur Agarwal,et al.  Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[37]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[38]  Pascal Fua,et al.  LDAHash: Improved Matching with Smaller Descriptors , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Pierre Vandergheynst,et al.  FREAK: Fast Retina Keypoint , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Michael Elad,et al.  Sparse and Redundant Modeling of Image Content Using an Image-Signature-Dictionary , 2008, SIAM J. Imaging Sci..

[41]  Masahide Kaneko,et al.  Robust Face Recognition Using Block-Based Bag of Words , 2010, 2010 20th International Conference on Pattern Recognition.

[42]  SchmidCordelia,et al.  A Performance Evaluation of Local Descriptors , 2005 .

[43]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Stéphane Mallat,et al.  Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[45]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, CVPR 2004.

[47]  Shai Avidan,et al.  TreeCANN - k-d Tree Coherence Approximate Nearest Neighbor Algorithm , 2012, ECCV.

[48]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[49]  Michal Irani,et al.  What Is a Good Image Segment? A Unified Approach to Segment Extraction , 2008, ECCV.