Learning multi-task local metrics for image annotation

The goal of image annotation is to automatically assign a set of textual labels to an image to describe the visual contents thereof. Recently, with the rapid increase in the number of web images, nearest neighbor (NN) based methods have become more attractive and have shown exciting results for image annotation. One of the key challenges of these methods is to define an appropriate similarity measure between images for neighbor selection. Several distance metric learning (DML) algorithms derived from traditional image classification problems have been applied to annotation tasks. However, a fundamental limitation of applying DML to image annotation is that it learns a single global distance metric over the entire image collection and measures the distance between image pairs in the image-level. For multi-label annotation problems, it may be more reasonable to measure similarity of image pairs in the label-level. In this paper, we develop a novel label prediction scheme utilizing multiple label-specific local metrics for label-level similarity measure, and propose two different local metric learning methods in a multi-task learning (MTL) framework. Extensive experimental results on two challenging annotation datasets demonstrate that 1) utilizing multiple local distance metrics to learn label-level distances is superior to using a single global metric in label prediction, and 2) the proposed methods using the MTL framework to learn multiple local metrics simultaneously can model the commonalities of labels, thereby facilitating label prediction results to achieve state-of-the-art annotation performance.

[1]  Koen E. A. van de Sande,et al.  The University of Amsterdam's Concept Detection System at ImageCLEF 2009 , 2009, CLEF.

[2]  Stefanie Nowak,et al.  The Fraunhofer IDMT at ImageCLEF 2011 Photo Annotation Task , 2011, CLEF.

[3]  Hagai Attias,et al.  Topic regression multi-modal Latent Dirichlet Allocation for image annotation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Chong-Wah Ngo,et al.  A revisit of Generative Model for Automatic Image Annotation using Markov Random Fields , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[6]  Rong Jin,et al.  Large-Scale Image Annotation by Efficient and Robust Kernel Metric Learning , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[8]  Cordelia Schmid,et al.  Image annotation with tagprop on the MIRFLICKR set , 2010, MIR '10.

[9]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[10]  Guiguang Ding,et al.  Image auto-annotation via tag-dependent random search over range-constrained visual neighbours , 2013, Multimedia Tools and Applications.

[11]  Cordelia Schmid,et al.  TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[12]  Xing Xu,et al.  Image Annotation by Learning Label-Specific Distance Metrics , 2013, ICIAP.

[13]  C. V. Jawahar,et al.  Exploring SVM for Image Annotation in Presence of Confusing Labels , 2013, BMVC.

[14]  Marcel Worring,et al.  Annotating images by harnessing worldwide user-tagged photos , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[15]  Motoaki Kawanabe,et al.  Enhanced representation and multi-task learning for image annotation , 2013, Comput. Vis. Image Underst..

[16]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Shuicheng Yan,et al.  Efficient large-scale image annotation by probabilistic collaborative multi-label propagation , 2010, ACM Multimedia.

[18]  Nenghai Yu,et al.  Distance metric learning from uncertain side information for automated photo tagging , 2011, TIST.

[19]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Guiguang Ding,et al.  Multi-source image auto-annotation , 2013, 2013 IEEE International Conference on Image Processing.

[21]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[22]  Frédéric Jurie,et al.  Semantic Contexts and Fisher Vectors for the ImageCLEF 2011 Photo Annotation Task , 2011, CLEF.

[23]  Kilian Q. Weinberger,et al.  Large Margin Multi-Task Metric Learning , 2010, NIPS.

[24]  Cordelia Schmid,et al.  Is that you? Metric learning approaches for face identification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[26]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[27]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[28]  Yang Yu,et al.  Automatic image annotation using group sparsity , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[29]  Vladimir Pavlovic,et al.  A New Baseline for Image Annotation , 2008, ECCV.

[30]  Nicu Sebe,et al.  Feature Selection for Multimedia Analysis by Sharing Information Among Multiple Tasks , 2013, IEEE Transactions on Multimedia.

[31]  Charles A. Micchelli,et al.  Learning Multiple Tasks with Kernel Methods , 2005, J. Mach. Learn. Res..

[32]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[33]  Shuicheng Yan,et al.  Towards multi-semantic image annotation with graph regularized exclusive group lasso , 2011, MM '11.

[34]  Tatsuya Harada,et al.  ISI at ImageCLEF 2012: Scalable System for Image Annotation , 2012, CLEF.

[35]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  C. V. Jawahar,et al.  Image Annotation Using Metric Learning in Semantic Neighbourhoods , 2012, ECCV.

[37]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[38]  Gabriela Csurka,et al.  Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost , 2012, ECCV.