Learning Contextual Metrics for Automatic Image Annotation

Semantic contextual information has been shown to be an important resource for improving scene and image recognition, but it has seldom been explored in previous work on distance metric learning (DML) for images. In this work, we present a novel Contextual Metric Learning (CML) method that learns a set of contextual distance metrics for real-world multi-label images. The relationships between classes are formulated as contextual constraints in the optimization framework to improve learning performance. In the experiments, we apply the proposed method to the automatic image annotation task. The experimental results show that our approach outperforms state-of-the-art DML algorithms.
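
To make the idea of "a set of distance metrics" concrete, the following is a minimal sketch of how per-label metrics could be used for annotation once they are available. It is not the CML optimization itself. It assumes each metric takes the Mahalanobis form d_M(x, y) = sqrt((x - y)^T M (x - y)) with a positive semidefinite matrix M, which is the usual parameterization in DML but is not stated in the abstract. The function names, the k-nearest-neighbor voting rule, the toy data, and the placeholder matrices are all hypothetical.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a PSD matrix M (assumed form)."""
    d = x - y
    # Clamp at zero to guard against tiny negative values from round-off.
    return float(np.sqrt(max(d @ M @ d, 0.0)))

def annotate(query, train_X, train_Y, metrics, k=3):
    """Score each label by the fraction of the k nearest training images,
    measured under that label's own metric, that carry the label.
    This voting rule is an illustrative assumption, not the paper's method."""
    scores = np.zeros(len(metrics))
    for c, M in enumerate(metrics):
        dists = np.array([mahalanobis(query, x, M) for x in train_X])
        nearest = np.argsort(dists)[:k]
        scores[c] = train_Y[nearest, c].mean()
    return scores

# Toy data (hypothetical): 4 images, 2 features, 2 labels.
train_X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
train_Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

# Placeholder matrices standing in for learned metrics, one per label.
metrics = [np.eye(2), np.diag([2.0, 0.5])]

scores = annotate(np.array([0.05, 0.0]), train_X, train_Y, metrics, k=2)
print(scores)
```

In a learned setting, the matrices in `metrics` would come from the optimization with contextual constraints described above. Here they are fixed by hand only so the example runs.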
