Jointly Learning Visually Correlated Dictionaries for Large-Scale Visual Recognition Applications

Learning discriminative dictionaries for image content representation plays a critical role in visual recognition. In this paper, we present a joint dictionary learning (JDL) algorithm which exploits the inter-category visual correlations to learn more discriminative dictionaries. Given a group of visually correlated categories, JDL simultaneously learns one common dictionary and multiple category-specific dictionaries to explicitly separate the shared visual atoms from the category-specific ones. The problem of JDL is formulated as a joint optimization with a discrimination promotion term according to the Fisher discrimination criterion. A visual tree method is developed to cluster a large number of categories into a set of disjoint groups, so that each of them contains a reasonable number of visually correlated categories. The process of image category clustering helps JDL to learn better dictionaries for classification by ensuring that the categories in the same group are of strong visual correlations. Also, it makes JDL to be computationally affordable in large-scale applications. Three classification schemes are adopted to make full use of the dictionaries learned by JDL for visual content representation in the task of image categorization. The effectiveness of the proposed algorithms has been evaluated using two image databases containing 17 and 1,000 categories, respectively.

[1]  Allen Y. Yang,et al.  A Review of Fast L(1)-Minimization Algorithms for Robust Face Recognition , 2010 .

[2]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[3]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Guillermo Sapiro,et al.  Classification and clustering via dictionary learning with structured incoherence and shared features , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Pietro Perona,et al.  MULTIPLE DICTIONARIES FOR BAG OFWORDS LARGE SCALE IMAGE SEARCH , 2011 .

[6]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[8]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[10]  Ke Huang,et al.  Sparse Representation for Signal Classification , 2006, NIPS.

[11]  Thomas S. Huang,et al.  Supervised translation-invariant sparse coding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Jason Weston,et al.  Label Embedding Trees for Large Multi-Class Tasks , 2010, NIPS.

[14]  Alexei A. Efros,et al.  Unsupervised discovery of visual object class hierarchies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Jianping Fan,et al.  Integrating Concept Ontology and Multitask Learning to Achieve More Effective Classifier Training for Multilevel Image Annotation , 2008, IEEE Transactions on Image Processing.

[16]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[17]  Ming Yang,et al.  Large-scale image classification: Fast feature extraction and SVM training , 2011, CVPR 2011.

[18]  Kjersti Engan,et al.  Frame based signal compression using method of optimal directions (MOD) , 1999, ISCAS'99. Proceedings of the 1999 IEEE International Symposium on Circuits and Systems VLSI (Cat. No.99CH36349).

[19]  Guillermo Sapiro,et al.  Discriminative learned dictionaries for local image analysis , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Martial Hebert,et al.  Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation , 2008, ECCV.

[21]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[22]  Lorenzo Torresani,et al.  Meta-class features for large-scale object categorization on a budget , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Larry S. Davis,et al.  Learning a discriminative dictionary for sparse coding via label consistent K-SVD , 2011, CVPR 2011.

[24]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[25]  Cordelia Schmid,et al.  Constructing Category Hierarchies for Visual Recognition , 2008, ECCV.

[26]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[27]  Alexander C. Berg,et al.  Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition , 2011, NIPS.

[28]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[30]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[31]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[32]  Pietro Perona,et al.  Unsupervised learning of visual taxonomies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Shang-Hong Lai,et al.  Learning component-level sparse representation using histogram information for image classification , 2011, 2011 International Conference on Computer Vision.

[34]  Allen Y. Yang,et al.  A Review of Fast l1-Minimization Algorithms for Robust Face Recognition , 2010, ArXiv.

[35]  Pietro Perona,et al.  Self-Tuning Spectral Clustering , 2004, NIPS.

[36]  Kristen Grauman,et al.  Learning a Tree of Metrics with Disjoint Visual Features , 2011, NIPS.

[37]  Jianping Fan,et al.  Learning inter-related visual dictionary for object recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[39]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[40]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[41]  Cordelia Schmid,et al.  Good Practice in Large-Scale Learning for Image Classification , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Florent Perronnin,et al.  High-dimensional signature compression for large-scale image classification , 2011, CVPR 2011.

[44]  José M. Bioucas-Dias,et al.  A New TwIST: Two-Step Iterative Shrinkage/Thresholding Algorithms for Image Restoration , 2007, IEEE Transactions on Image Processing.

[45]  Jianping Fan,et al.  Statistical modeling and conceptualization of natural images , 2005, Pattern Recognit..

[46]  Shuicheng Yan,et al.  Visual classification with multi-task joint sparse representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47]  David Zhang,et al.  Fisher Discrimination Dictionary Learning for sparse representation , 2011, 2011 International Conference on Computer Vision.

[48]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[49]  Jianping Fan,et al.  Structured Max-Margin Learning for Inter-Related Classifier Training and Multilabel Image Annotation , 2011, IEEE Transactions on Image Processing.

[50]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[51]  Rong Jin,et al.  Unifying discriminative visual codebook generation with classifier training for object category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[53]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[54]  Dimitris N. Metaxas,et al.  Deformable segmentation via sparse representation and dictionary learning , 2012, Medical Image Anal..

[55]  Jianping Fan,et al.  Quantitative Characterization of Semantic Gaps for Learning Complexity Estimation and Inference Model Selection , 2012, IEEE Transactions on Multimedia.

[56]  Baoxin Li,et al.  Discriminative K-SVD for dictionary learning in face recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[57]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[58]  Pietro Perona,et al.  Learning and using taxonomies for fast visual categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[60]  Thomas G. Dietterich,et al.  Learning non-redundant codebooks for classifying complex objects , 2009, ICML '09.

[61]  Fei-Fei Li,et al.  What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.