Incremental Codebook Adaptation for Visual Representation and Categorization

The bag-of-visual-words model is widely used for visual content analysis, and the codebook plays an important role in representing visual data efficiently. However, the codebook has to be relearned whenever the set of training images changes, and once the codebook changes, the encoding parameters of all local features have to be recomputed. To alleviate this problem, we propose an incremental codebook adaptation method for efficient visual representation. Instead of learning a new codebook, we gradually adapt a prelearned codebook as new images arrive. To make use of the prelearned codebook, we restrict the changes made to it with a sparsity constraint and low-rank correlation. In addition, we encode visually similar local features within a neighborhood, which exploits locality information and keeps the encoded parameters consistent. We evaluate the proposed method on categorization tasks over several public image datasets. Experimental results demonstrate its effectiveness and usefulness compared with other codebook-based methods.
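As a rough illustration of the idea described above, the following sketch adapts a prelearned codebook by learning a sparse change matrix from new features, while encoding each feature over its nearest codewords only. It is a minimal sketch under stated assumptions and not the paper's actual formulation: the function names, the proximal-gradient update, the LLC-style local encoder, and the parameters `lam`, `k`, and `reg` are all hypothetical, and the low-rank correlation term is omitted.

```python
import numpy as np

def soft_threshold(M, tau):
    """Element-wise shrinkage: the proximal operator of tau * ||M||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def encode_local(X, B, k=5, reg=1e-4):
    """Encode each row of X over its k nearest codewords in B.

    Assumed LLC-style encoder: a small regularized least-squares problem
    per feature, with the k coefficients constrained to sum to one.
    """
    n, K = X.shape[0], B.shape[0]
    codes = np.zeros((n, K))
    # Squared Euclidean distance from every feature to every codeword.
    d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    for i in range(n):
        idx = nearest[i]
        Z = B[idx] - X[i]                      # codewords centered on x_i
        C = Z @ Z.T                            # local Gram matrix (k x k)
        C += reg * (np.trace(C) + 1e-12) * np.eye(k)
        w = np.linalg.solve(C, np.ones(k))
        codes[i, idx] = w / w.sum()            # enforce sum-to-one
    return codes

def adapt_codebook(B0, X_new, lam=0.1, iters=20, k=5):
    """Return (B0 + D, D), where D is a sparse change to the codebook B0.

    Hypothetical objective: (0.5 / n) * ||X_new - A (B0 + D)||_F^2
    + lam * ||D||_1, minimized over D by proximal gradient steps with the
    codes A re-estimated at every iteration.
    """
    n = X_new.shape[0]
    D = np.zeros_like(B0)
    for _ in range(iters):
        B = B0 + D
        A = encode_local(X_new, B, k=k)
        grad = A.T @ (A @ B - X_new) / n
        # Step size 1/L, with L the Lipschitz constant of the smooth term.
        L = np.linalg.norm(A, 2) ** 2 / n
        D = soft_threshold(D - grad / L, lam / L)
    return B0 + D, D
```

With a very large `lam` the change matrix stays at zero and the prelearned codebook is returned unchanged; smaller values allow more entries of the codebook to move toward the new data.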
