Beyond Semantic Attributes: Discrete Latent Attributes Learning for Zero-Shot Recognition

In this letter, we propose a novel approach for learning semantics-driven attributes that are discriminative for zero-shot visual recognition. The latent attributes are derived in a principled manner that maintains class-level semantic relatedness and attribute-wise balancedness. Unlike existing methods, which binarize learned real-valued attributes in a separate quantization stage, we learn the binary attributes directly by solving a discrete optimization problem. In particular, we propose a class-wise discrete descent algorithm that learns the latent attributes of each class iteratively. Moreover, we propose to predict multiple attributes simultaneously from low-level features via multi-output neural networks (MONN), which can model the intrinsic correlation among attributes and make prediction more tractable. Extensive experiments on two standard datasets demonstrate that our method outperforms state-of-the-art approaches.
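
To make the idea of class-wise discrete descent concrete, the following is a minimal sketch and not the algorithm from the letter. The abstract does not state the objective, so everything here is an assumption for illustration: binary codes B in {-1, +1} with one row per class, a given class similarity matrix S, a relatedness term that asks B B^T to match n_bits * S, a balance term that penalizes attributes whose column sums are far from zero, and the hypothetical names objective and classwise_discrete_descent. Codes are updated one class at a time by greedy bit flips, with no real-valued relaxation or quantization step.

```python
import numpy as np


def objective(B, S, lam):
    """Toy objective (an assumption, not the paper's): relatedness + balance."""
    n_bits = B.shape[1]
    # Inner products of class codes should match the scaled class similarities.
    relatedness = np.sum((n_bits * S - B @ B.T) ** 2)
    # Each attribute (column) should be +1 for about half of the classes.
    balance = np.sum(B.sum(axis=0) ** 2)
    return relatedness + lam * balance


def classwise_discrete_descent(S, n_bits=8, lam=1.0, n_sweeps=20, seed=0):
    """Greedy bit-flip descent over binary class codes, one class at a time."""
    rng = np.random.default_rng(seed)
    n_classes = S.shape[0]
    B = rng.choice([-1, 1], size=(n_classes, n_bits))
    best = objective(B, S, lam)
    for _ in range(n_sweeps):
        improved = False
        for c in range(n_classes):        # class-wise update
            for k in range(n_bits):       # try flipping each bit of class c
                B[c, k] *= -1
                trial = objective(B, S, lam)
                if trial < best:
                    best = trial          # keep the flip
                    improved = True
                else:
                    B[c, k] *= -1         # revert the flip
        if not improved:                  # no single flip helps: local optimum
            break
    return B, best
```

Because a flip is kept only when it strictly lowers the objective, the value never increases across sweeps, and the codes stay binary throughout. Recomputing the full objective for every trial flip is wasteful and is done here only to keep the sketch short.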
