Fast learning discriminative dictionaries for large-scale visual recognition

In this paper, we aim at improving the discriminative jointly dictionaries for large-scale image classification. Sparse representation is a popular tool for image classification. Visual dictionary is very critical to the classification performance. A visual tree is constructed according to the visual similarity, in which the higher layer represents the coarser membership and the lower layer represents the finer membership. Jointly dictionary is learned according to the visual tree. Bregman iterative algorithm is implemented to solve the optimal problem of joint dictionary learning, which makes the solution accurate and the running speed fast. Furthermore, we try to implement the pre-trained features learned from the convolution neural network (CNN) to represent an image, and the residual error of the sparse representation is utilized for image classification. The experimental results demonstrate that the CNN feature is more distinct than SIFT, and the hierarchical classification framework with the Bregman iteration algorithm can greatly improve the performance of classification.

[1]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[2]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[3]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[4]  Pietro Perona,et al.  Self-Tuning Spectral Clustering , 2004, NIPS.

[5]  Wen Gao,et al.  Group-Based Sparse Representation for Image Restoration , 2014, IEEE Transactions on Image Processing.

[6]  L. Bregman The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .

[7]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Alexei A. Efros,et al.  Unsupervised discovery of visual object class hierarchies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[10]  Jianping Fan,et al.  Jointly Learning Visually Correlated Dictionaries for Large-Scale Visual Recognition Applications , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[12]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Jianping Fan,et al.  Structured Max-Margin Learning for Inter-Related Classifier Training and Multilabel Image Annotation , 2011, IEEE Transactions on Image Processing.

[14]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[16]  José M. Bioucas-Dias,et al.  A New TwIST: Two-Step Iterative Shrinkage/Thresholding Algorithms for Image Restoration , 2007, IEEE Transactions on Image Processing.

[17]  Wotao Yin,et al.  Bregman Iterative Algorithms for (cid:2) 1 -Minimization with Applications to Compressed Sensing ∗ , 2008 .

[18]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[20]  Alexander C. Berg,et al.  Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition , 2011, NIPS.

[21]  Jason Weston,et al.  Label Embedding Trees for Large Multi-Class Tasks , 2010, NIPS.

[22]  Yong Wang,et al.  Refining image annotation using contextual relations between words , 2007, CIVR '07.

[23]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[24]  Wotao Yin,et al.  An Iterative Regularization Method for Total Variation-Based Image Restoration , 2005, Multiscale Model. Simul..

[25]  Jianping Fan,et al.  Mining Multilevel Image Semantics via Hierarchical Classification , 2008, IEEE Transactions on Multimedia.

[26]  Pietro Perona,et al.  Unsupervised learning of visual taxonomies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Guillermo Sapiro,et al.  Discriminative learned dictionaries for local image analysis , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Joseph F. Murray,et al.  Dictionary Learning Algorithms for Sparse Representation , 2003, Neural Computation.

[29]  Jianping Fan,et al.  Quantitative Characterization of Semantic Gaps for Learning Complexity Estimation and Inference Model Selection , 2012, IEEE Transactions on Multimedia.

[30]  David Zhang,et al.  Fisher Discrimination Dictionary Learning for sparse representation , 2011, 2011 International Conference on Computer Vision.

[31]  David Zhang,et al.  Relaxed collaborative representation for pattern classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Guillermo Sapiro,et al.  Classification and clustering via dictionary learning with structured incoherence and shared features , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.