Optimizing visual dictionaries for effective image retrieval

Characterizing images by high-level concepts from a learned visual dictionary is widely used in image classification and retrieval. This paper addresses the inference of discriminative visual dictionaries for effective image retrieval and examines a non-negative visual dictionary learning scheme to that end. More specifically, it proposes a non-negative matrix factorization framework with an $\ell_0$-sparseness constraint on the coefficient matrix for optimizing the dictionary. The framework is a two-step iterative process composed of sparse encoding and dictionary enhancement stages. An initial estimate of the visual dictionary is updated in each iteration with the proposed $\ell_0$-constraint gradient projection algorithm. A desirable attribute of this formulation is an adaptive sequential dictionary initialization procedure, which leads to a sharp drop in the approximation error and faster convergence. Finally, the proposed dictionary optimization scheme is used to derive a compact image representation for the retrieval task. A new image signature is obtained by projecting local descriptors onto the basis elements of the optimized visual dictionary and then aggregating the resulting sparse encodings into a single feature vector. Experimental results on various benchmark datasets show that the proposed system can infer enhanced visual dictionaries and that the derived image feature vector achieves better retrieval results than state-of-the-art techniques.
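The two-step scheme summarized above can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the abstract does not specify step sizes, the initialization procedure, or the aggregation rule, so the sketch assumes a plain projected gradient step with step size 1/L for both stages, a random non-negative initialization in place of the adaptive sequential one, and sum pooling followed by L2 normalization for the image signature. The function names (`project_l0_nonneg`, `sparse_nmf`, `image_signature`) and all parameters are hypothetical.

```python
import numpy as np

def project_l0_nonneg(H, k):
    """Clip negatives to zero and keep at most k largest entries per column."""
    H = np.maximum(H, 0.0)
    n_rows = H.shape[0]
    if k < n_rows:
        # Row indices of the (n_rows - k) smallest entries in each column.
        drop = np.argpartition(H, n_rows - k, axis=0)[: n_rows - k]
        np.put_along_axis(H, drop, 0.0, axis=0)
    return H

def sparse_nmf(X, n_atoms, k, n_iter=50, seed=0):
    """Approximate X (d x n) by W @ H with W >= 0, H >= 0, and
    at most k nonzeros per column of H (assumed formulation)."""
    rng = np.random.default_rng(seed)
    d, _ = X.shape
    # Assumption: random initialization, not the paper's adaptive procedure.
    W = rng.random((d, n_atoms))
    H = project_l0_nonneg(W.T @ X, k)
    errors = []
    for _ in range(n_iter):
        # Sparse encoding stage: one projected gradient step on H.
        L = np.linalg.norm(W.T @ W, 2) + 1e-12
        H = project_l0_nonneg(H - (W.T @ (W @ H - X)) / L, k)
        # Dictionary enhancement stage: one projected gradient step on W.
        L = np.linalg.norm(H @ H.T, 2) + 1e-12
        W = np.maximum(W - ((W @ H - X) @ H.T) / L, 0.0)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

def image_signature(D, W, k, n_iter=20):
    """Encode local descriptors D (d x m) on a fixed dictionary W, then
    aggregate the sparse codes (assumed: sum pooling + L2 normalization)."""
    L = np.linalg.norm(W.T @ W, 2) + 1e-12
    H = project_l0_nonneg(W.T @ D, k)
    for _ in range(n_iter):
        H = project_l0_nonneg(H - (W.T @ (W @ H - D)) / L, k)
    v = H.sum(axis=1)
    return v / (np.linalg.norm(v) + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((16, 200))          # synthetic stand-in for training descriptors
    W, H, errors = sparse_nmf(X, n_atoms=8, k=3)
    sig = image_signature(rng.random((16, 40)), W, k=3)
    print(errors[0], errors[-1], sig.shape)
```

The signature has one entry per dictionary atom, so its length is set by the dictionary size rather than by the number of local descriptors in the image, which is what makes the representation compact.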
