Fast Coding of Feature Vectors Using Neighbor-to-Neighbor Search

Searching for matches to high-dimensional vectors using hard/soft vector quantization is the most computationally expensive part of various computer vision algorithms including the bag of visual word (BoW). This paper proposes a fast computation method, Neighbor-to-Neighbor (NTN) search [1], which skips some calculations based on the similarity of input vectors. For example, in image classification using dense SIFT descriptors, the NTN search seeks similar descriptors from a point on a grid to an adjacent point. Applications of the NTN search to vector quantization, a Gaussian mixture model, sparse coding, and a kernel codebook for extracting image or video representation are presented in this paper. We evaluated the proposed method on image and video benchmarks: the PASCAL VOC 2007 Classification Challenge and the TRECVID 2010 Semantic Indexing Task. NTN-VQ reduced the coding cost by 77.4 percent, and NTN-GMM reduced it by 89.3 percent, without any significant degradation in classification performance.

[1]  Hairong Qi,et al.  Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps , 2013, 2013 IEEE International Conference on Computer Vision.

[2]  Cees Snoek,et al.  VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events , 2014, ACM Multimedia.

[3]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[4]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[6]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[7]  Christoph H. Lampert,et al.  Deep Fisher Kernels -- End to End Learning of the Fisher Kernel GMM Parameters , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[9]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[10]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[11]  R. Fisher THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[12]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[13]  Cees G. M. Snoek,et al.  The MediaMill at TRECVID 2013: : Searching concepts, Objects, Instances and events in video , 2013, TRECVID.

[14]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Gabriela Csurka,et al.  Adapted Vocabularies for Generic Visual Categorization , 2006, ECCV.

[16]  Tanaya Guha,et al.  Learning Sparse Representations for Human Action Recognition , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[18]  Arnold W. M. Smeulders,et al.  Real-Time Visual Concept Classification , 2010, IEEE Transactions on Multimedia.

[19]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[20]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[21]  Leonidas J. Guibas,et al.  Exploration of continuous variability in collections of 3D shapes , 2011, ACM Trans. Graph..

[22]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Larry S. Davis,et al.  Exploiting local features from deep networks for image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[25]  H. Hotelling Analysis of a complex of statistical variables into principal components. , 1933 .

[26]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[27]  Robert F. Sproull,et al.  Refinements to nearest-neighbor searching ink-dimensional trees , 1991, Algorithmica.

[28]  Joseph J. LaViola,et al.  Exploring the Trade-off Between Accuracy and Observational Latency in Action Recognition , 2013, International Journal of Computer Vision.

[29]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[30]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[31]  Stephen M. Omohundro,et al.  Efficient Algorithms with Neural Network Behavior , 1987, Complex Syst..

[32]  Koichi Shinoda,et al.  q-Gaussian mixture models for image and video semantic indexing , 2013, J. Vis. Commun. Image Represent..

[33]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[34]  Koichi Shinoda,et al.  A fast MAP adaptation technique for gmm-supervector-based video semantic indexing systems , 2011, ACM Multimedia.

[35]  Andrew Zisserman,et al.  Deep Fisher Networks for Large-Scale Image Classification , 2013, NIPS.

[36]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Jeffrey K. Uhlmann,et al.  Satisfying General Proximity/Similarity Queries with Metric Trees , 1991, Inf. Process. Lett..

[38]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[39]  Stephen M. Omohundro,et al.  Five Balltree Construction Algorithms , 2009 .

[40]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[41]  David G. Lowe,et al.  Shape indexing using approximate nearest-neighbour search in high-dimensional spaces , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[42]  Koichi Shinoda,et al.  Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors , 2013, 2013 IEEE International Conference on Computer Vision.

[43]  Richard I. Hartley,et al.  Optimised KD-trees for fast image descriptor matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Marcel Worring,et al.  Concept-Based Video Retrieval , 2009, Found. Trends Inf. Retr..

[45]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[46]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[47]  Anders P. Eriksson,et al.  Fast Convolutional Sparse Coding , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Rui Xu,et al.  Survey of clustering algorithms , 2005, IEEE Transactions on Neural Networks.

[50]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[52]  Deyu Meng,et al.  Interactive Surveillance Event Detection through Mid-level Discriminative Representation , 2014, ICMR.

[53]  Florent Perronnin,et al.  Fisher vectors meet Neural Networks: A hybrid classification architecture , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Quan Z. Sheng,et al.  Online human gesture recognition from motion data streams , 2013, ACM Multimedia.

[55]  Pavel Zezula,et al.  M-tree: An Efficient Access Method for Similarity Search in Metric Spaces , 1997, VLDB.