Enhanced bag of visual words representations for content based image retrieval: a comparative study

The exponential growth of digital image data poses numerous open problems for computer vision researchers. In this regard, designing an efficient and accurate mechanism that finds and retrieves desired images from large repositories is of great importance. To this end, various types of content-based image retrieval (CBIR) systems have been developed. A typical CBIR system searches a large database for images that are similar to a given query image, using visual features extracted automatically from image pixels. In the CBIR domain, the bag of visual words (BoVW) model is one of the most widely used feature representation schemes, and a number of image retrieval frameworks are based on it. Most of them have demonstrated promising results for medium- and large-scale image retrieval. However, the image retrieval literature lacks a comparative evaluation of these extended BoVW formulations. This paper therefore categorizes and evaluates the existing BoVW-based formulations for content-based image retrieval. The commonly used datasets and the evaluation metrics used to assess the retrieval effectiveness of these models are discussed. Moreover, a quantitative evaluation of state-of-the-art image retrieval systems based on the BoVW model is provided. Finally, promising directions for future research are proposed on the basis of the existing models and the demands of real-world applications.
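
To make the baseline BoVW retrieval pipeline concrete, the following is a minimal sketch under stated assumptions, not the formulation of any specific system evaluated in the paper. It assumes hard assignment of descriptors to visual words, a codebook learned with plain Lloyd k-means, and cosine similarity for ranking. The 2-D synthetic points stand in for real local descriptors (such as SIFT), and all function names (`build_codebook`, `bovw_histogram`, `rank`, `toy_image`) are hypothetical.

```python
import math
import random


def nearest(codebook, d):
    """Index of the visual word closest to descriptor d (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], d)))


def build_codebook(descriptors, k, iters=10, seed=0):
    """Learn k visual words with plain Lloyd k-means over all descriptors."""
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(codebook, d)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bovw_histogram(codebook, descriptors):
    """L2-normalised histogram of hard-assigned visual words for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1.0
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]


def rank(query_hist, db_hists):
    """Database indices sorted by decreasing cosine similarity to the query."""
    sims = [sum(q * h for q, h in zip(query_hist, hv)) for hv in db_hists]
    return sorted(range(len(db_hists)), key=lambda i: -sims[i])


def toy_image(center, n, rng):
    """Synthetic stand-in for an image: n 2-D 'descriptors' around a centre."""
    return [(rng.gauss(center, 0.5), rng.gauss(center, 0.5)) for _ in range(n)]


# Toy database: images 0 and 2 share one visual pattern, images 1 and 3 another.
rng = random.Random(1)
database = [toy_image(c, 10, rng) for c in (0.0, 10.0, 0.0, 10.0)]
query = toy_image(0.0, 10, rng)

codebook = build_codebook([d for img in database for d in img], k=2)
db_hists = [bovw_histogram(codebook, img) for img in database]
ranking = rank(bovw_histogram(codebook, query), db_hists)
print(ranking)  # images resembling the query should be ranked first
```

In a realistic setting the codebook would contain thousands of words and the ranking step would typically use an inverted index rather than a linear scan; both are omitted here for brevity.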
