Towards effective codebookless model for image classification

The bag-of-features (BoF) model for image classification has been thoroughly studied over the last decade. Different from the widely used BoF methods which model images with a pre-trained codebook, the alternative codebook-free image modeling method, which we call codebookless model (CLM), attracts little attention. In this paper, we present an effective CLM that represents an image with a single Gaussian for classification. By embedding Gaussian manifold into a vector space, we show that the simple incorporation of our CLM into a linear classifier achieves very competitive accuracy compared with state-of-the-art BoF methods (e.g., Fisher Vector). Since our CLM lies in a high-dimensional Riemannian manifold, we further propose a joint learning method of low-rank transformation with support vector machine (SVM) classifier on the Gaussian manifold, in order to reduce computational and storage cost. To study and alleviate the side effect of background clutter on our CLM, we also present a simple yet effective partial background removal method based on saliency detection. Experiments are extensively conducted on eight widely used databases to demonstrate the effectiveness and efficiency of our CLM method. HighlightsWe show that codebookless model (CLM) is a very competitive alternative to the mainstream BoF model.Two well-motivated parameters are introduced to further improve our CLM.A joint low-rank learning and SVM is proposed on Gaussian manifold.The comprehensive experiments demonstrated the promising performance of our CLM.

[1]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[2]  Fang Liu,et al.  Shape of Gaussians as feature descriptors , 2009, CVPR.

[3]  Christoph H. Lampert,et al.  Deep Fisher Kernels -- End to End Learning of the Fisher Kernel GMM Parameters , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[5]  Cristian Sminchisescu,et al.  Free-Form Region Description with Second-Order Pooling , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Iasonas Kokkinos,et al.  Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Zhaoquan Cai,et al.  Deep Boosting: Joint feature selection and analysis dictionary learning in hierarchy , 2016, Neurocomputing.

[8]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Wilhelm Burger,et al.  Digital Image Processing - An Algorithmic Introduction using Java , 2008, Texts in Computer Science.

[11]  Takumi Kobayashi,et al.  Dirichlet-Based Histogram Feature Transform for Image Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Jason Weston,et al.  WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[13]  William K. Pratt,et al.  Digital Image Processing, 4th Edition , 2007, J. Electronic Imaging.

[14]  Nicholas Ayache,et al.  Fast and Simple Calculus on Tensors in the Log-Euclidean Framework , 2005, MICCAI.

[15]  Qilong Wang,et al.  Local Log-Euclidean Covariance Matrix (L2ECM) for Image Representation and Its Applications , 2012, ECCV.

[16]  Dieter Fox,et al.  Multipath Sparse Coding Using Hierarchical Matching Pursuit , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  TianQi,et al.  Towards Codebook-Free , 2014 .

[18]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[19]  Kaare Brandt Petersen,et al.  Sparse Kernel Orthonormalized PLS for feature extraction in large data sets , 2006, NIPS.

[20]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[21]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[22]  Haim H. Permuter,et al.  A study of Gaussian mixture models of color and texture features for image classification and segmentation , 2006, Pattern Recognit..

[23]  Qi Tian,et al.  Towards Codebook-Free: Scalable Cascaded Hashing for Mobile Image Search , 2014, IEEE Transactions on Multimedia.

[24]  Yasuo Kuniyoshi,et al.  Global Gaussian approach for scene categorization using information geometry , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Fatih Murat Porikli,et al.  Region Covariance: A Fast Descriptor for Detection and Classification , 2006, ECCV.

[26]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[27]  Xavier Pennec,et al.  A Riemannian Framework for Tensor Computing , 2005, International Journal of Computer Vision.

[28]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[29]  SánchezJorge,et al.  Image Classification with the Fisher Vector , 2013 .

[30]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[31]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[32]  Rita Cucchiara,et al.  GOLD: Gaussians of Local Descriptors for image representation , 2015, Comput. Vis. Image Underst..

[33]  K. Mikolajczyk,et al.  Higher-order Occurrence Pooling on Mid- and Low-level Features: Visual Concept Detection , 2013 .

[34]  Trevor Darrell,et al.  Pose pooling kernels for sub-category recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Barbara Caputo,et al.  Class-Specific Material Categorisation , 2005, ICCV.

[36]  Liyu Gong,et al.  Shape of Gaussians as feature descriptors , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Cristian Sminchisescu,et al.  Efficient Match Kernel between Sets of Features for Visual Recognition , 2009, NIPS.

[38]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[39]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[40]  Jing Xu,et al.  Deep boosting: Layered feature mining for general image classification , 2014, 2014 IEEE International Conference on Multimedia and Expo (ICME).

[41]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[42]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  I. Dryden,et al.  Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging , 2009, 0910.1656.

[45]  I. Johnstone,et al.  Optimal Shrinkage of Eigenvalues in the Spiked Covariance Model. , 2013, Annals of statistics.

[46]  Yu-Bin Yang,et al.  Visual feature coding for image classification integrating dictionary structure , 2015, Pattern Recognit..

[47]  Lei Zhang,et al.  A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models , 2013, 2013 IEEE International Conference on Computer Vision.

[48]  Fei-FeiLi,et al.  One-Shot Learning of Object Categories , 2006 .

[49]  Thomas Seidl,et al.  Modeling image similarity by Gaussian mixture models and the Signature Quadratic Form Distance , 2011, 2011 International Conference on Computer Vision.

[50]  Huchuan Lu,et al.  Saliency Detection via Absorbing Markov Chain , 2013, 2013 IEEE International Conference on Computer Vision.

[51]  Mehrtash Tafazzoli Harandi,et al.  From Manifold to Manifold: Geometry-Aware Dimensionality Reduction for SPD Matrices , 2014, ECCV.

[52]  Miroslav Lovric,et al.  Multivariate Normal Distributions Parametrized as a Riemannian Symmetric Space , 2000 .

[53]  Liang Lin,et al.  Representing and recognizing objects with massive local image patches , 2012, Pattern Recognit..

[54]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[55]  Jian Sun,et al.  Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Edward H. Adelson,et al.  Material perception: What can you see in a brief glance? , 2010 .

[57]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[58]  C. Stein Lectures on the theory of estimation of many parameters , 1986 .

[59]  Luc Van Gool,et al.  Self-Adaptable Templates for Feature Coding , 2014, NIPS.

[60]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[61]  Gary R. Bradski,et al.  A codebook-free and annotation-free approach for fine-grained image categorization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[63]  Hongdong Li,et al.  Kernel Methods on the Riemannian Manifold of Symmetric Positive Definite Matrices , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.