BIG-OH: BInarization of gradient orientation histograms

Extracting local keypoints and keypoint descriptions from images is a primary step for many computer vision and image retrieval applications. In the literature, many researchers have proposed methods for representing local texture around keypoints with varying levels of robustness to photometric and geometric transformations. Gradient-based descriptors such as the Scale Invariant Feature Transform (SIFT) are among the most consistent and robust descriptors. The SIFT descriptor, a 128-element vector consisting of multiple gradient histograms computed from local image patches around a keypoint, is widely considered as the gold standard keypoint descriptor. However, SIFT descriptors require at least 128bytes of storage per descriptor. Since images are typically described by thousands of keypoints, it may require more space to store the SIFT descriptors for an image than the original image itself. This may be prohibitive in extremely large-scale applications and applications on memory-constrained devices such as tablets and smartphones. In this paper, with the goal of reducing the memory requirements of keypoint descriptors such as SIFT, without affecting their performance, we propose BIG-OH, a simple yet extremely effective method for binary quantization of any descriptor based on gradient orientation histograms. BIG-OH's memory requirements are very small-when it uses SIFT's default parameters for the construction of the gradient orientation histograms, it only requires 16bytes per descriptor. BIG-OH quantizes gradient orientation histograms by computing a bit vector representing the relative magnitudes of local gradients associated with neighboring orientation bins. In a series of experiments on keypoint matching with different types of keypoint detectors under various photometric and geometric transformations, we find that the quantized descriptor has performance comparable to or better than other descriptors, including BRISK, CARD, BRIEF, D-BRIEF, SQ, and PCA-SIFT. Our experiments also show that BIG-OH is extremely effective for image retrieval, with modestly better performance than SIFT. BIG-OH's drastic reduction in memory requirements, obtained while preserving or improving the image matching and image retrieval performance of SIFT, makes it an excellent descriptor for large image databases and applications running on memory-constrained devices. BIG-OH, binary quantization of gradient orientation based descriptors, is proposed.Quantized SIFT descriptors reduce memory by 88% compared to classical SIFT.BIG-OH has performance comparable to SIFT and GLOH.BIG-OH has better performance than BRISK, CARD, BRIEF, and other descriptors.BIG-OH is effective for large scale applications such as copy detection.

[1]  Qi Tian,et al.  Spatial coding for large scale partial-duplicate web image search , 2010, ACM Multimedia.

[2]  Tony Lindeberg,et al.  Shape-adapted smoothing in estimation of 3-D shape cues from affine deformations of local 2-D brightness structure , 1997, Image Vis. Comput..

[3]  Adam Baumberg,et al.  Reliable feature matching across widely separated views , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Cordelia Schmid,et al.  An Image-Based Approach to Video Copy Detection With Spatio-Temporal Post-Filtering , 2010, IEEE Transactions on Multimedia.

[6]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Xianglong Liu,et al.  Multiple feature kernel hashing for large-scale visual search , 2014, Pattern Recognit..

[8]  Vincent Lepetit,et al.  DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Sidney Fels,et al.  Local Image Descriptors Using Supervised Kernel ICA , 2009 .

[10]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[11]  Junaid Baber,et al.  Video segmentation into scenes using entropy and SURF , 2011, 2011 7th International Conference on Emerging Technologies.

[12]  Pascal Fua,et al.  LDAHash: Improved Matching with Smaller Descriptors , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Hefei Ling,et al.  Efficient Image Copy Detection Using Multiscale Fingerprints , 2012, IEEE MultiMedia.

[14]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[15]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Junaid Baber,et al.  Shot boundary detection from videos using entropy and local descriptor , 2011, 2011 17th International Conference on Digital Signal Processing (DSP).

[17]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.

[18]  Qi Tian,et al.  Visual word expansion and BSIFT verification for large-scale image search , 2013, Multimedia Systems.

[19]  Jan-Michael Frahm,et al.  Comparative Evaluation of Binary Features , 2012, ECCV.

[20]  Tom Drummond,et al.  Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[21]  Sidney S. Fels,et al.  Local Image Descriptors Using Supervised Kernel ICA , 2009, PSIVT.

[22]  Natasha Gelfand,et al.  SURFTrac: Efficient tracking and continuous object recognition using local feature descriptors , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Matti Pietikäinen,et al.  View-based recognition of real-world textures , 2004, Pattern Recognit..

[24]  Marko Heikkilä,et al.  Description of interest regions with local binary patterns , 2009, Pattern Recognit..

[25]  Shih-Fu Chang,et al.  Semi-Supervised Hashing for Large-Scale Search , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Matti Pietikäinen,et al.  Face Description with Local Binary Patterns: Application to Face Recognition , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[29]  Shin'ichi Satoh,et al.  A Framework for Video Segmentation using Global and Local Features , 2013, Int. J. Pattern Recognit. Artif. Intell..

[30]  Marko Heikkilä,et al.  A texture-based method for modeling the background and detecting moving objects , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[32]  WangJun,et al.  Semi-Supervised Hashing for Large-Scale Search , 2012 .

[33]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[34]  Qi Tian,et al.  Scalar quantization for large scale image search , 2012, ACM Multimedia.

[35]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Vincent Lepetit,et al.  Efficient Discriminative Projections for Compact Binary Descriptors , 2012, ECCV.

[37]  Yuichi Yoshida,et al.  CARD: Compact And Real-time Descriptors , 2011, 2011 International Conference on Computer Vision.

[38]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[39]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[41]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[42]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[43]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[44]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[45]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[46]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[47]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[49]  Richard Szeliski,et al.  Skeletal graphs for efficient structure from motion , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.