Salient coding for image classification

The codebook based (bag-of-words) model is a widely applied model for image classification. We analyze recent coding strategies in this model, and find that saliency is the fundamental characteristic of coding. The saliency in coding means that if a visual code is much closer to a descriptor than other codes, it will obtain a very strong response. The salient representation under maximum pooling operation leads to the state-of-the-art performance on many databases and competitions. However, most current coding schemes do not recognize the role of salient representation, so that they may lead to large deviations in representing local descriptors. In this paper, we propose “salient coding”, which employs the ratio between descriptors' nearest code and other codes to describe descriptors. This approach can guarantee salient representation without deviations. We study salient coding on two sets of image classification databases (15-Scenes and PASCAL VOC2007). The experimental results demonstrate that our approach outperforms all other coding methods in image classification.

[1]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[3]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[4]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[5]  James Ze Wang,et al.  Content-based image indexing and searching using Daubechies' wavelets , 1998, International Journal on Digital Libraries.

[6]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[7]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[8]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[9]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[10]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  R. Schapire The Strength of Weak Learnability , 1990, Machine Learning.

[12]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[13]  Cordelia Schmid,et al.  A sparse texture representation using local affine regions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[17]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[18]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[19]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[20]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[21]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.