WSABIE: Scaling Up to Large Vocabulary Image Annotation

Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a strongly performing method that scales to such datasets by simultaneously learning to optimize precision at the top of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method, called WSABIE, both outperforms several baseline methods and is faster and consumes less memory.

[1]  J. Davenport Editor , 1960 .

[2]  E. Parzen Annals of Mathematical Statistics , 1962 .

[3]  Editors , 1986, Brain Research Bulletin.

[4]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[5]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[6]  Bernhard Schölkopf,et al.  Kernel Principal Component Analysis , 1997, ICANN.

[7]  C. Fellbaum An Electronic Lexical Database , 1998 .

[8]  Jitendra Malik,et al.  Recognizing surfaces using three-dimensional textons , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9]  Nello Cristianini,et al.  Advances in Kernel Methods - Support Vector Learning , 1999 .

[10]  Bernhard Schölkopf,et al.  Kernel Principal Component Analysis , 1997, International Conference on Artificial Neural Networks.

[11]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[12]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[13]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[14]  Francesca Odone,et al.  Histogram intersection kernel for image classification , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[15]  Daniel Gatica-Perez,et al.  PLSA-based image auto-annotation: constraining the latent space , 2004, MULTIMEDIA '04.

[16]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[17]  Filip Radlinski,et al.  A support vector method for optimizing average precision , 2007, SIGIR.

[18]  Qiang Yang,et al.  Semi-Supervised Learning with Very Few Labeled Training Examples , 2007, AAAI.

[19]  H. Robbins A Stochastic Approximation Method , 1951 .

[20]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[21]  David Grangier,et al.  A Discriminative Kernel-based Model to Rank Images from Text Queries , 2007 .

[22]  Trevor Darrell,et al.  The Pyramid Match Kernel: Efficient Learning with Sets of Features , 2007, J. Mach. Learn. Res..

[23]  Samy Bengio,et al.  A Discriminative Kernel-Based Approach to Rank Images from Text Queries , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Tie-Yan Liu,et al.  Listwise approach to learning to rank: theory and algorithm , 2008, ICML '08.

[25]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[26]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Vladimir Pavlovic,et al.  A New Baseline for Image Annotation , 2008, ECCV.

[28]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[29]  Antonio Torralba,et al.  Semi-Supervised Learning in Gigantic Image Collections , 2009, NIPS.

[30]  Patrick Gallinari,et al.  Ranking with ordered weighted pairwise classification , 2009, ICML '09.

[31]  Jason Weston,et al.  Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.

[32]  Sarod Yatawatta,et al.  2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) , 2011, ICIP 2011.

[33]  Mark Sanderson,et al.  Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval , 2012, SIGIR 2012.