Normalized Non-Negative Sparse Encoder for Fast Image Representation

Image representation based on sparse coding generalizes the bag of words model. Although it reduces the reconstruction error for local features to achieve the state-of-the-art image classification performance, the large computational cost hinders the application of sparse coding-based image features. In this paper, we propose approximating a sparse code using the output of a simple neural network. The resulting parameter learning model for the neural network automatically incorporates non-negative and shift-invariant constraints, leading to an efficient normalized non-negative sparse coding (N3SC) sparse encoder. Without the use of the traditional iterative process to solve the sparse coding objective, the sparse encoder directly “converts” each local feature into a sparse code. We also introduce a method for training the encoder based on the auto-encoder method. In addition, we formally propose the corresponding sparse coding scheme called N3SC, which enforces both the non-negative constraint and the shift-invariant constraint in addition to the traditional sparse coding criteria. As demonstrated by several experiments, the obtained N3SC encoder requires only 3%–10% of the processing time for image feature extraction compared with the standard sparse coding scheme. At the same time, the features extracted using the exact solutions of the N3SC coding scheme and the N3SC encoder offer superior image classification accuracy compared to the accuracy of many existing sparse coding-based representations.

[1]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[2]  Liang-Tien Chia,et al.  Concurrent Single-Label Image Classification and Annotation via Efficient Multi-Layer Group Sparse Coding , 2014, IEEE Transactions on Multimedia.

[3]  Liang-Tien Chia,et al.  Local features are not lonely – Laplacian sparse coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  R. Fergus,et al.  Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[7]  Qi Tian,et al.  Image classification by non-negative sparse coding, low-rank and sparse decomposition , 2011, CVPR 2011.

[8]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9]  David M. Bradley,et al.  Differentiable Sparse Coding , 2008, NIPS.

[10]  James M. Rehg,et al.  Beyond the Euclidean distance: Creating effective visual codebooks using the Histogram Intersection Kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Patrik O. Hoyer,et al.  Non-negative sparse coding , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[12]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[14]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Chris H. Q. Ding,et al.  Convex and Semi-Nonnegative Matrix Factorizations , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Yihong Gong,et al.  Discovering Image Semantics in Codebook Derivative Space , 2012, IEEE Transactions on Multimedia.

[18]  Fei-Fei Li,et al.  Object-Centric Spatial Pooling for Image Classification , 2012, ECCV.

[19]  Thomas S. Huang,et al.  Supervised translation-invariant sparse coding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[22]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[24]  Tao Mei,et al.  A Bag-of-Importance Model With Locality-Constrained Coding Based Feature Learning for Video Summarization , 2014, IEEE Transactions on Multimedia.

[25]  Xiaofeng Zhu,et al.  Video-to-Shot Tag Propagation by Graph Sparse Group Lasso , 2013, IEEE Transactions on Multimedia.

[26]  Jing Zhao,et al.  Document Clustering Based on Nonnegative Sparse Matrix Factorization , 2005, ICNC.

[27]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[28]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[29]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[30]  Marc'Aurelio Ranzato,et al.  Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition , 2010, ArXiv.

[31]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[32]  Slim Essid,et al.  Smooth Nonnegative Matrix Factorization for Unsupervised Audiovisual Document Structuring , 2013, IEEE Transactions on Multimedia.

[33]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[34]  Richard G. Baraniuk,et al.  A deep learning approach to structured signal recovery , 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[35]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[36]  Jorge Nocedal,et al.  Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization , 1997, TOMS.

[37]  Tao Chen,et al.  Discriminative Soft Bag-of-Visual Phrase for Mobile Landmark Recognition , 2014, IEEE Transactions on Multimedia.

[38]  Martin A. Riedmiller,et al.  Deep auto-encoder neural networks in reinforcement learning , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[39]  Tieniu Tan,et al.  Feature Coding in Image Classification: A Comprehensive Study , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Qiuqi Ruan,et al.  Hessian Semi-Supervised Sparse Feature Selection Based on ${L_{2,1/2}}$ -Matrix Norm , 2015, IEEE Transactions on Multimedia.

[41]  Sheng Tang,et al.  Sparse Ensemble Learning for Concept Detection , 2012, IEEE Transactions on Multimedia.

[42]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[43]  Yihong Gong,et al.  Nonlinear Learning using Local Coordinate Coding , 2009, NIPS.

[44]  Patrik O. Hoyer,et al.  Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res..

[45]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .