Image Annotation by Learning Label-Specific Distance Metrics

Recently, weighted k-nearest-neighbor label prediction combined with distance metric learning (KNN+ML) [10,14,17] has attracted growing interest and shown promising results on the image annotation task. In the KNN+ML framework, a single uniform distance metric is usually learned from a collection of similar/dissimilar image pairs drawn from the training data, so any pair of images has exactly one global distance. This may be insufficient for label prediction in annotation, because a single distance cannot distinguish among the multiple labels attached to each image. In this paper, we learn multiple label-specific distance metrics and measure the distance between an image pair under each label's metric. We also propose a novel label-specific prediction model, in which the weight of each label is determined by its label-specific distance rather than by the global distance used previously. Compared with previous KNN+ML methods, the proposed method can discriminate each label in each neighbor and effectively reduce the prediction of false positive and false negative labels. Extensive experiments on three benchmark datasets demonstrate that the proposed method achieves more accurate annotation results and competitive overall performance.
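The idea of scoring each label under its own metric can be illustrated with a minimal sketch. This is not the paper's model: the diagonal metrics, the exponential weighting, the per-label neighbor selection, and all names (`label_distance`, `predict_label_scores`, the toy data) are assumptions chosen only for illustration, and the metrics are supplied by hand rather than learned.

```python
import math


def label_distance(x, y, metric_diag):
    """Squared distance under a diagonal metric (one non-negative weight per feature)."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(metric_diag, x, y))


def predict_label_scores(query, train_feats, train_labels, metrics, k=2):
    """Score each label with a weighted kNN vote that uses that label's own metric.

    Assumption: neighbors are selected per label, and a neighbor at
    distance d contributes weight exp(-d).
    """
    scores = {}
    for label, metric_diag in metrics.items():
        # Rank training images by distance under this label's metric, keep the k closest.
        nearest = sorted(
            (label_distance(query, feat, metric_diag), idx)
            for idx, feat in enumerate(train_feats)
        )[:k]
        weights = [math.exp(-dist) for dist, _ in nearest]
        # Fraction of the neighbor weight carried by images annotated with this label.
        positive = sum(
            w for w, (_, idx) in zip(weights, nearest) if label in train_labels[idx]
        )
        scores[label] = positive / sum(weights)
    return scores


# Toy data (hypothetical): two features, three training images, two labels.
train_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
train_labels = [{"sky"}, {"sky", "sea"}, {"sea"}]
# Hand-set metrics: "sky" only looks at feature 0, "sea" only at feature 1.
metrics = {"sky": [1.0, 0.0], "sea": [0.0, 1.0]}

scores = predict_label_scores([0.9, 0.1], train_feats, train_labels, metrics, k=2)
print(scores)
```

With a single global metric, every label would share one neighbor ranking for the query; here the same pair of images can be close under one label's metric and far under another's, which is the distinction the abstract draws.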

[1]  Jitendra Malik,et al.  Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[2]  C. V. Jawahar,et al.  Image Annotation Using Metric Learning in Semantic Neighbourhoods , 2012, ECCV.

[3]  Ying He,et al.  Mining social images with distance metric learning for automated image tagging , 2011, WSDM '11.

[4]  Yang Yu,et al.  Automatic image annotation using group sparsity , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Antonio Torralba,et al.  80 Million Tiny Images: A Large Dataset for Non-parametric Object and Scene Recognition , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Qian Zhang,et al.  Random Forest for Image Annotation , 2012, ECCV.

[7]  Matthieu Guillaumin,et al.  Segmentation Propagation in ImageNet , 2012, ECCV.

[8]  Chong-Wah Ngo,et al.  A revisit of Generative Model for Automatic Image Annotation using Markov Random Fields , 2009, CVPR.

[9]  Junzhou Huang,et al.  Automatic Image Annotation and Retrieval Using Group Sparsity , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[10]  Mads Nielsen,et al.  Computer Vision — ECCV 2002 , 2002, Lecture Notes in Computer Science.

[11]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Sriram Subramanian,et al.  Talking about tactile experiences , 2013, CHI.

[13]  Nenghai Yu,et al.  Efficient Tag Mining via Mixture Modeling for Real-Time Search-Based Image Annotation , 2012, 2012 IEEE International Conference on Multimedia and Expo.

[14]  Vladimir Pavlovic,et al.  A New Baseline for Image Annotation , 2008, ECCV.

[15]  Cordelia Schmid,et al.  Is that you? Metric learning approaches for face identification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[17]  Cordelia Schmid,et al.  TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Yi Li,et al.  ARISTA - image search to annotation on billions of web photos , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Michael Grubinger,et al.  Analysis and evaluation of visual information systems performance , 2007 .

[20]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[21]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[22]  Zhi-Hua Zhou,et al.  Multi-Label Learning by Exploiting Label Correlations Locally , 2012, AAAI.

[23]  Gabriela Csurka,et al.  Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost , 2012, ECCV.

[24]  Kilian Q. Weinberger,et al.  Fast solvers and efficient implementations for distance metric learning , 2008, ICML '08.

[25]  Hagai Attias,et al.  Topic regression multi-modal Latent Dirichlet Allocation for image annotation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Andrew J. Davison,et al.  Active Matching , 2008, ECCV.

[27]  Horst Bischof,et al.  Large scale metric learning from equivalence constraints , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Laura A. Dabbish,et al.  Labeling images with a computer game , 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.