Fast-BoW: Scaling Bag-of-Visual-Words Generation

Bag-of-visual-words (BoW) generation is a widely used unsupervised feature extraction method for a variety of computer vision applications. However, the space and time complexity of BoW generation grows with the size of the dataset because of the computational cost of the underlying algorithms. In this paper, we present Fast-BoW, a scalable method for BoW generation for both hard and soft vector quantization, with time complexities O(|h| log₂ k) and O(|h|k), respectively.¹ We replace the search for the closest cluster center with a softmax classifier, which improves the cluster boundaries over k-means and can be used for both hard and soft BoW encoding. To make the model compact and faster, we quantize the real-valued weights into integer weights that can be represented using only a few bits (2 to 8). We further apply hashing to the quantized weights to reduce the number of multiplications, which speeds up the process even more. We evaluated the proposed approach on several public benchmark datasets, and in our experiments it outperforms the existing hierarchical clustering tree-based approach by a factor of approximately 12.
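The abstract names three ingredients: a softmax layer in place of nearest-center search, low-bit integer weights, and hashing to cut multiplications. The NumPy sketch below shows one way these could fit together. It rests on assumptions the abstract does not state: symmetric uniform quantization, a "hashing" step implemented as grouping descriptor entries that share the same integer weight, and random stand-in weights in place of a trained classifier. The helper names (`quantize`, `build_tables`, `hashed_scores`, `encode`) are hypothetical. The plain arg-max scan here costs O(|h|k) per descriptor, so it does not reproduce the O(|h| log₂ k) hard-assignment bound claimed above.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def quantize(W, bits):
    """Assumed symmetric uniform quantizer: W ~= scale * Wq, with integer
    entries in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(W).max(), 1e-12) / qmax
    Wq = np.round(W / scale).astype(np.int64)
    return Wq, scale


def build_tables(Wq):
    """Hypothetical 'hashing' step: for each cluster (column), map every
    distinct non-zero integer weight to the descriptor indices that use it."""
    tables = []
    for c in range(Wq.shape[1]):
        col = Wq[:, c]
        tables.append({int(v): np.flatnonzero(col == v)
                       for v in np.unique(col) if v != 0})
    return tables


def hashed_scores(x, tables):
    """Computes x @ Wq with one multiplication per distinct weight value:
    entries sharing a weight are summed first, then multiplied once."""
    return np.array([sum(v * x[idx].sum() for v, idx in t.items())
                     for t in tables], dtype=float)


def encode(H, tables, scale, b, soft=False):
    """Normalized BoW histogram for descriptors H (n x d)."""
    k = len(tables)
    logits = np.stack([scale * hashed_scores(h, tables) + b for h in H])
    if soft:
        # Soft assignment: accumulate softmax posteriors over all descriptors.
        hist = softmax(logits).sum(axis=0)
    else:
        # Hard assignment: count the arg-max cluster of each descriptor
        # (the arg-max of the logits equals the arg-max of the softmax).
        hist = np.bincount(logits.argmax(axis=1), minlength=k).astype(float)
    return hist / hist.sum()


rng = np.random.default_rng(0)
d, k, n = 64, 16, 200           # descriptor size, codebook size, descriptors
W = rng.normal(size=(d, k))     # stand-in for trained softmax weights
b = rng.normal(size=k)          # stand-in for trained softmax bias
H = rng.normal(size=(n, d))     # stand-in for local descriptors
Wq, scale = quantize(W, bits=4)
tables = build_tables(Wq)
hard = encode(H, tables, scale, b)
soft = encode(H, tables, scale, b, soft=True)
print(hard.shape, round(soft.sum(), 6))
```

Under this quantizer, each column holds at most 2^bits − 2 distinct non-zero values, so the grouped dot product needs at most that many multiplications per cluster instead of one per descriptor dimension. The additions remain, so the saving matters most when multiplications are the expensive operation.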
