Paper Information - Submodular Attribute Selection for Visual Recognition

Submodular Attribute Selection for Visual Recognition

In real-world visual recognition problems, low-level features cannot adequately characterize the semantic content of images or the spatio-temporal structure of videos. In this work, we encode objects or actions based on attributes that describe them as high-level concepts. We consider two types of attributes: the first type is generated by humans, while the second type is data-driven and extracted from data using dictionary learning methods. Attribute-based representations may exhibit variations due to noisy and redundant attributes. We propose a discriminative and compact attribute-based representation obtained by selecting a subset of discriminative attributes from a large attribute set. Three attribute selection criteria are proposed and formulated as a submodular optimization problem. A greedy optimization algorithm is presented, and its solution is guaranteed to be at least a (1−1/e)-approximation to the optimum. Experimental results on four public datasets demonstrate that the proposed attribute-based representation significantly boosts the performance of visual recognition and outperforms most recently proposed recognition approaches.
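The greedy step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the three selection criteria, so the objective below is a stand-in: a facility-location score over an assumed nonnegative attribute-similarity matrix, which is a standard monotone submodular function. The names `facility_location`, `greedy_select`, and `demo` are hypothetical and not taken from the paper. For any monotone submodular objective under a cardinality constraint, this kind of greedy selection carries the (1−1/e) guarantee.

```python
import numpy as np


def facility_location(similarity, selected):
    """Stand-in objective (an assumption, not the paper's criteria).

    Sums, over all attributes (rows), the best similarity to any selected
    attribute (column). With nonnegative entries this is monotone submodular.
    """
    if not selected:
        return 0.0
    return float(similarity[:, selected].max(axis=1).sum())


def greedy_select(similarity, k):
    """Pick up to k column indices, adding the largest marginal gain each round."""
    selected = []
    remaining = set(range(similarity.shape[1]))
    current = 0.0
    for _ in range(min(k, len(remaining))):
        best_gain, best_j = -1.0, None
        # Iterate in sorted order so ties are broken deterministically.
        for j in sorted(remaining):
            gain = facility_location(similarity, selected + [j]) - current
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        remaining.remove(best_j)
        current += best_gain
    return selected


# Toy similarity matrix: attributes 0 and 1 are near-duplicates, 2 is distinct.
demo = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.0, 0.1, 1.0],
])
picked = greedy_select(demo, 2)
```

On the toy matrix, the second pick skips the near-duplicate attribute in favor of the distinct one, which is the redundancy-reducing behavior that diminishing returns are meant to capture.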

Rama Chellappa | Jingjing Zheng | Zhuolin Jiang
