Unified Dictionary Learning and Region Tagging with Hierarchical Sparse Representation

Abstract

Image patterns at different spatial levels are well organized: regions lie within an image, and feature points lie within a region. These spatial structures are hierarchical in nature, and integrating and exploiting their relationships appropriately is important for improving region tagging performance. Inspired by recent advances in sparse coding, we propose an approach called Unified Dictionary Learning and Region Tagging with Hierarchical Sparse Representation. The approach consists of two steps: region representation and region reconstruction. In the first step, rather than using the ℓ1-norm, as is commonly done in sparse coding, we add a hierarchical structure to the sparse coding process and form a framework of tree-guided dictionary learning. In this framework, the hierarchical structures among feature points, regions, and images are encoded through a tree-guided multi-task learning process. With the learned dictionary, we obtain a better representation of the training and testing regions. In the second step, we use a sub-hierarchy, namely the structure between regions and images, to guide the sparse reconstruction of testing regions. Thanks to this hierarchy, the resulting reconstruction coefficients are more discriminative. Finally, tags are propagated to the testing regions through the learned reconstruction coefficients. Extensive experiments on three public benchmark image data sets demonstrate that the proposed approach achieves better region tagging performance than current state-of-the-art methods.
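The second step (structure-guided sparse reconstruction followed by tag propagation) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-level tree in which the leaves are individual training regions and each internal node groups the regions of one training image, and it substitutes an ℓ1 plus group-ℓ2 penalty solved by plain proximal gradient descent for the tree-guided penalty of the paper. All function names, parameters, and default values below are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise proximal operator of t * ||v||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def tree_prox(w, groups, lam_leaf, lam_group):
    # Proximal operator of lam_leaf * ||w||_1 + lam_group * sum_g ||w_g||_2
    # for disjoint groups: apply the leaf (l1) shrinkage first, then shrink
    # each group (here: the regions of one training image) as a block.
    w = soft_threshold(w, lam_leaf)
    out = w.copy()
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        scale = max(0.0, 1.0 - lam_group / norm) if norm > 0 else 0.0
        out[idx] = scale * w[idx]
    return out

def reconstruct(x, R, groups, lam_leaf=0.01, lam_group=0.05, n_iter=500):
    # Proximal gradient descent on
    #   0.5 * ||x - R w||^2 + lam_leaf * ||w||_1 + lam_group * sum_g ||w_g||_2,
    # where the columns of R are training-region features (assumed layout).
    step = 1.0 / (np.linalg.norm(R, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(R.shape[1])
    for _ in range(n_iter):
        grad = R.T @ (R @ w - x)
        w = tree_prox(w - step * grad, groups, step * lam_leaf, step * lam_group)
    return w

def propagate_tags(w, Y):
    # Y is a (n_training_regions, n_tags) binary tag matrix; each tag score is
    # the total absolute reconstruction weight of training regions carrying it.
    return np.abs(w) @ Y

# Toy data: 6 training regions from 3 images (2 regions each), 3 tags.
R = np.eye(8)[:, :6]                      # one feature column per training region
groups = [[0, 1], [2, 3], [4, 5]]         # regions grouped by source image
Y = np.repeat(np.eye(3), 2, axis=0)       # regions of image k carry tag k
x = R[:, 2]                               # a test region resembling training region 2
w = reconstruct(x, R, groups)
scores = propagate_tags(w, Y)
```

In this sketch the group term drives the coefficients of entire training images to zero, so the surviving weights concentrate on a few images, which is one way to read the abstract's claim that the hierarchy yields more discriminative coefficients.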
