Submodular Attribute Selection for Visual Recognition

In real-world visual recognition problems, low-level features cannot adequately characterize the semantic content in images, or the spatio-temporal structure in videos. In this work, we encode objects or actions based on attributes that describe them as high-level concepts. We consider two types of attributes. One type of attributes is generated by humans, while the second type is data-driven attributes extracted from data using dictionary learning methods. Attribute-based representation may exhibit variations due to noisy and redundant attributes. We propose a discriminative and compact attribute-based representation by selecting a subset of discriminative attributes from a large attribute set. Three attribute selection criteria are proposed and formulated as a submodular optimization problem. A greedy optimization algorithm is presented and its solution is guaranteed to be at least (1−1/e)-approximation to the optimum. Experimental results on four public datasets demonstrate that the proposed attribute-based representation significantly boosts the performance of visual recognition and outperforms most recently proposed recognition approaches.

[1]  Larry S. Davis,et al.  Label Consistent K-SVD: Learning a Discriminative Dictionary for Recognition , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Alexander C. Berg,et al.  Automatic Attribute Discovery and Characterization from Noisy Web Data , 2010, ECCV.

[3]  Larry S. Davis,et al.  Submodular Salient Region Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Rama Chellappa,et al.  Entropy-Rate Clustering: Cluster Analysis via Maximizing a Submodular Function Subject to a Matroid Constraint , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Fei-Fei Li,et al.  Learning latent temporal structure for complex event detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Nuno Vasconcelos,et al.  Recognizing Activities by Attribute Dynamics , 2012, NIPS.

[7]  Yang Wang,et al.  A Discriminative Latent Model of Object Classes and Attributes , 2010, ECCV.

[8]  Gang Wang,et al.  Joint learning of visual attributes, object classes and visual saliency , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Jeff A. Bilmes,et al.  Submodular feature selection for high-dimensional acoustic score spaces , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[11]  Devi Parikh,et al.  Attributes for Classifier Feedback , 2012, ECCV.

[12]  Juan Carlos Niebles,et al.  Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification , 2010, ECCV.

[13]  Leonidas J. Guibas,et al.  Human action recognition by learning bases of action attributes and parts , 2011, 2011 International Conference on Computer Vision.

[14]  Ziad Al-Halah,et al.  Learning semantic attributes via a common latent space , 2015, 2014 International Conference on Computer Vision Theory and Applications (VISAPP).

[15]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[16]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Kun Duan,et al.  Discovering localized attributes for fine-grained recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Shih-Fu Chang,et al.  Designing Category-Level Attributes for Discriminative Visual Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Fei-Fei Li,et al.  Attribute Learning in Large-Scale Datasets , 2010, ECCV Workshops.

[20]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Andreas Krause,et al.  Near-optimal sensor placements in Gaussian processes , 2005, ICML.

[22]  Ali Farhadi,et al.  Attribute Discovery via Predictable Discriminative Binary Codes , 2012, ECCV.

[23]  Cordelia Schmid,et al.  Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.

[24]  Andrew W. Fitzgibbon,et al.  Efficient Object Category Recognition Using Classemes , 2010, ECCV.

[25]  Hui Lin,et al.  A Class of Submodular Functions for Document Summarization , 2011, ACL.

[26]  Ivan Laptev,et al.  On Space-Time Interest Points , 2005, International Journal of Computer Vision.

[27]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[29]  Abhimanyu Das,et al.  Selecting Diverse Features via Spectral Regularization , 2012, NIPS.

[30]  Shree K. Nayar,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Describable Visual Attributes for Face Verification and Image Search , 2022 .

[31]  Shaogang Gong,et al.  Attribute Learning for Understanding Unstructured Social Activity , 2012, ECCV.

[32]  Pietro Perona,et al.  Visual Recognition with Humans in the Loop , 2010, ECCV.

[33]  Jason J. Corso,et al.  Action bank: A high-level representation of activity in video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[35]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[36]  Ling Shao,et al.  Submodular Object Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Kristen Grauman,et al.  Interactively building a discriminative vocabulary of nameable attributes , 2011, CVPR 2011.

[38]  Christoph H. Lampert,et al.  Augmented Attribute Representations , 2012, ECCV.

[39]  Silvio Savarese,et al.  Recognizing human actions by attributes , 2011, CVPR 2011.

[40]  Jianxin Wu,et al.  Towards Good Practices for Action Video Encoding , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Cordelia Schmid,et al.  Label-Embedding for Attribute-Based Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[43]  Heng Wang LEAR-INRIA submission for the THUMOS workshop , 2013 .

[44]  Larry S. Davis,et al.  Image ranking and retrieval based on multi-attribute queries , 2011, CVPR 2011.

[45]  Andreas Krause,et al.  Cost-effective outbreak detection in networks , 2007, KDD '07.

[46]  Bernt Schiele,et al.  What helps where – and why? Semantic relatedness for knowledge transfer , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47]  M. L. Fisher,et al.  An analysis of approximations for maximizing submodular set functions—I , 1978, Math. Program..

[48]  Hanqing Lu,et al.  What Visual Attributes Characterize an Object Class? , 2014, ACCV.

[49]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.