Learning semantic embedding at a large scale

A key problem in image annotation is to learn the underlying semantics. However, finding such semantic embeddings is a challenge task and often requires large amount of tagging information. In this paper, we propose to utilize multi-modality cues by incorporating visual and textual information as embedded objects. The paper further presents a multi-task learning framework that simultaneously learns the approximation of two semantic embeddings with efficient multi-stage convex relaxation technique. The experiments show that the proposed method presents very promising performance in both memory usage and training time for large-scale dataset, as well as image classification accuracy.

[1]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[2]  Daniel Gatica-Perez,et al.  Tagging and retrieving images with co-occurrence models: from corel to flickr , 2009, LS-MMRM '09.

[3]  Alin Achim,et al.  18th IEEE International Conference on Image Processing, ICIP 2011, Brussels, Belgium, September 11-14, 2011 , 2011, ICIP.

[4]  Y. Mori,et al.  Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .

[5]  Christopher K. I. Williams On a Connection between Kernel PCA and Metric Multidimensional Scaling , 2004, Machine Learning.

[6]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Shuicheng Yan,et al.  Graph embedding: a general framework for dimensionality reduction , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[9]  Tong Zhang,et al.  Analysis of Multi-stage Convex Relaxation for Sparse Regularization , 2010, J. Mach. Learn. Res..

[10]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Rong Jin,et al.  Distance Metric Learning: A Comprehensive Survey , 2006 .

[12]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Chong Wang,et al.  Simultaneous image classification and annotation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Zhen Li,et al.  Hierarchical Gaussianization for image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.