Webly Supervised Semantic Embeddings for Large Scale Zero-Shot Learning

Zero-shot learning (ZSL) makes object recognition in images possible in absence of visual training data for a part of the classes from a dataset. When the number of classes is large, classes are usually represented by semantic class prototypes learned automatically from unannotated text collections. This typically leads to much lower performances than with manually designed semantic prototypes such as attributes. While most ZSL works focus on the visual aspect and reuse standard semantic prototypes learned from generic text collections, we focus on the problem of semantic class prototype design for large scale ZSL. More specifically, we investigate the use of noisy textual metadata associated to photos as text collections, as we hypothesize they are likely to provide more plausible semantic embeddings for visual classes if exploited appropriately. We thus make use of a source-based voting strategy to improve the robustness of semantic prototypes. Evaluation on the large scale ImageNet dataset shows a significant improvement in ZSL performances over two strong baselines, and over usual semantic embeddings used in previous works. We show that this improvement is obtained for several embedding methods, leading to state of the art results when one uses automatically created visual and text features.

[1]  Andrew Y. Ng,et al.  Improving Word Representations via Global Context and Multiple Word Prototypes , 2012, ACL.

[2]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[3]  Bernt Schiele,et al.  Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.

[4]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[5]  Yuji Matsumoto,et al.  Ridge Regression, Hubness, and Zero-Shot Learning , 2015, ECML/PKDD.

[6]  Vanessa Murdock,et al.  Modeling locations with social media , 2013, Information Retrieval.

[7]  Geoffrey E. Hinton,et al.  Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[8]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[9]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Stéphane Herbin,et al.  Zero-Shot Classification by Generating Artificial Visual Features , 2018 .

[11]  Bernt Schiele,et al.  Zero-Shot Learning — The Good, the Bad and the Ugly , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  XiangTao,et al.  Transductive Multi-View Zero-Shot Learning , 2015 .

[13]  Pierre Zweigenbaum,et al.  Embedding Strategies for Specialized Domains: Application to Clinical Entity Recognition , 2019, ACL.

[14]  Cordelia Schmid,et al.  Label-Embedding for Image Classification , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Yoshua Bengio,et al.  Zero-data Learning of New Tasks , 2008, AAAI.

[16]  Evgeniy Gabrilovich,et al.  A word at a time: computing word relatedness using temporal semantic analysis , 2011, WWW.

[17]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[18]  Derek Hoiem,et al.  ViCo: Word Embeddings From Visual Co-Occurrences , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Michel Crucianu,et al.  Modeling Inter and Intra-Class Relations in the Triplet Loss for Zero-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Wei-Lun Chao,et al.  An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild , 2016, ECCV.

[21]  Shaogang Gong,et al.  Unsupervised Domain Adaptation for Zero-Shot Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Evgeniy Gabrilovich,et al.  Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis , 2007, IJCAI.

[23]  Abhinav Gupta,et al.  Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Alexandros Nanopoulos,et al.  Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data , 2010, J. Mach. Learn. Res..

[25]  Rada Mihalcea,et al.  Semantic Relatedness Using Salient Semantic Analysis , 2011, AAAI.

[26]  Adrian Popescu,et al.  Social media driven image retrieval , 2011, ICMR.

[27]  Alec Radford,et al.  Improving Language Understanding by Generative Pre-Training , 2018 .

[28]  José M. F. Moura,et al.  VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[30]  Fatih Porikli,et al.  A Unified Approach for Conventional Zero-Shot, Generalized Zero-Shot, and Few-Shot Learning , 2017, IEEE Transactions on Image Processing.

[31]  William A. Sethares,et al.  Domain Adapted Word Embeddings for Improved Sentiment Classification , 2018, ACL.

[32]  Yang Liu,et al.  Transductive Unbiased Embedding for Zero-Shot Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Bernt Schiele,et al.  Evaluation of output embeddings for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Bernt Schiele,et al.  Transfer Learning in a Transductive Setting , 2013, NIPS.

[35]  Piyush Rai,et al.  A Simple Exponential Family Framework for Zero-Shot Learning , 2017, ECML/PKDD.

[36]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[37]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[39]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[40]  Christoph H. Lampert,et al.  Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Bernt Schiele,et al.  Latent Embeddings for Zero-Shot Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Piyush Rai,et al.  Generalized Zero-Shot Learning via Synthesized Examples , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Tetsuya Takiguchi,et al.  On Zero-Shot Recognition of Generic Objects , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Tomas Mikolov,et al.  Advances in Pre-Training Distributed Word Representations , 2017, LREC.

[45]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[46]  Wei-Lun Chao,et al.  Classifier and Exemplar Synthesis for Zero-Shot Learning , 2018, International Journal of Computer Vision.

[47]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[48]  Adrian Popescu,et al.  Harnessing noisy Web images for deep representation , 2017, Comput. Vis. Image Underst..

[49]  Soma Biswas,et al.  Preserving Semantic Relations for Zero-Shot Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Hao Wang,et al.  Rethinking Knowledge Graph Propagation for Zero-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Bernt Schiele,et al.  Feature Generating Networks for Zero-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Bernt Schiele,et al.  Multi-cue Zero-Shot Learning with Strong Supervision , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Allan Jabri,et al.  Learning Visual Features from Large Weakly Supervised Data , 2015, ECCV.

[54]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[55]  James Hays,et al.  SUN attribute database: Discovering, annotating, and recognizing scene attributes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[57]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[58]  Michel Crucianu,et al.  From Classical to Generalized Zero-Shot Learning: a Simple Adaptation Process , 2018, MMM.