Learning relative features through adaptive pooling for image classification

Bag-of-Features (BoF) representations and spatial constraints have been popular in image classification research. One of the most successful methods uses sparse coding and spatial pooling to build discriminative features. However, minimizing the reconstruction error in sparse coding considers only the similarity between the input and the codebook. In contrast, this paper describes a novel feature learning approach for image classification that considers the dissimilarity between inputs and prototype images, which we call the reference basis (RB). First, we learn the feature representation using a max-margin criterion between the input and the RB; the learned hyperplane is stored as the relative feature. Second, we propose an adaptive pooling technique that assembles the multiple relative features generated by different RBs under the SVM framework, jointly learning the classifier and the pooling weights. Experiments on three challenging datasets, Caltech-101, Scene 15, and Willow-Actions, demonstrate the effectiveness and generality of our framework.
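The two-stage pipeline in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several assumptions are made. Each image and each RB is assumed to be a set of local descriptors. The max-margin step is assumed to be a plain linear SVM (no bias), trained with Pegasos-style stochastic subgradient descent. The joint learning of classifier and pooling weights is approximated here by alternating optimization, with the pooling weights constrained to the probability simplex. All function names (`relative_feature`, `adaptive_pooling`, `project_to_simplex`, `predict`) are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style subgradient descent for a linear SVM without bias:
    minimize lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1       # margin check before the update
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer step
            if violated:                              # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def relative_feature(input_descs, rb_descs, lam=0.01, epochs=100):
    """Hypothetical helper: separate an image's descriptors (+1) from one RB's
    descriptors (-1) by max margin; the hyperplane w is the relative feature."""
    X = input_descs + rb_descs
    y = [1] * len(input_descs) + [-1] * len(rb_descs)
    return train_linear_svm(X, y, lam, epochs)


def project_to_simplex(v):
    """Euclidean projection onto {b : b_k >= 0, sum_k b_k = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        if uj - (css - 1.0) / j > 0:   # largest such j defines the threshold
            theta = (css - 1.0) / j
    return [max(x - theta, 0.0) for x in v]


def pool(feats, beta):
    """Adaptive pooling: weighted sum of one image's K relative features."""
    return [sum(b * f[j] for b, f in zip(beta, feats)) for j in range(len(feats[0]))]


def adaptive_pooling(F, y, lam=0.01, rounds=5, steps=200, lr=0.1):
    """Alternate between the classifier v (beta fixed) and the pooling weights
    beta (v fixed; projected hinge-loss subgradient on the simplex).
    F[i][k] is the relative feature of image i w.r.t. reference basis k."""
    K = len(F[0])
    beta = [1.0 / K] * K
    for _ in range(rounds):
        v = train_linear_svm([pool(feats, beta) for feats in F], y, lam)
        # With v fixed, the score sum_k beta_k * <v, f_ik> is linear in beta.
        S = [[dot(v, f) for f in feats] for feats in F]
        for _ in range(steps):
            grad = [0.0] * K
            for s, yi in zip(S, y):
                if yi * dot(beta, s) < 1:
                    grad = [g - yi * sk / len(y) for g, sk in zip(grad, s)]
            beta = project_to_simplex([b - lr * g for b, g in zip(beta, grad)])
    # Retrain the classifier on features pooled with the final weights.
    v = train_linear_svm([pool(feats, beta) for feats in F], y, lam)
    return v, beta


def predict(v, beta, feats):
    return 1 if dot(v, pool(feats, beta)) >= 0 else -1
```

In use, `relative_feature` would be called once per (image, RB) pair to fill `F`, and `adaptive_pooling` would then learn the classifier and the per-RB weights together. Alternating optimization is one simple way to approximate joint learning. The paper's actual solver may differ.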
