Distance metric learning from uncertain side information with application to automated photo tagging

Automated photo tagging is essential to make massive unlabeled photos searchable by text search engines. Conventional image annotation approaches, though working reasonably well on small testbeds, are either computationally expensive or inaccurate when dealing with large-scale photo tagging. Recently, with the popularity of social networking websites, we observe a massive number of user-tagged images, referred to as "social images", that are available on the web. Unlike traditional web images, social images often contain tags and other user-generated content, which offer a new opportunity to resolve some long-standing challenges in multimedia. In this work, we aim to address the challenge of large-scale automated photo tagging by exploring the social images. We present a retrieval based approach for automated photo tagging. To tag a test image, the proposed approach first retrieves k social images that share the largest visual similarity with the test image. The tags of the test image are then derived based on the tagging of the similar images. Due to the well-known semantic gap issue, a regular Euclidean distance-based retrieval method often fails to find semantically relevant images. To address the challenge of semantic gap, we propose a novel probabilistic distance metric learning scheme that (1) automatically derives constraints from the uncertain side information, and (2) efficiently learns a distance metric from the derived constraints. We apply the proposed technique to automated photo tagging tasks based on a social image testbed with over 200,000 images crawled from Flickr. Encouraging results show that the proposed technique is effective and promising for automated photo tagging.

[1]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[2]  Rong Yan,et al.  A learning-based hybrid tagging and browsing approach for efficient manual image annotation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Wei Liu,et al.  Learning Distance Metrics with Contextual Constraints for Image Retrieval , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[5]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2007, SIGGRAPH 2007.

[6]  Tat-Seng Chua,et al.  Automatic image annotation via local multi-label classification , 2008, CIVR '08.

[7]  Amir Globerson,et al.  Metric Learning by Collapsing Classes , 2005, NIPS.

[8]  Wei-Ying Ma,et al.  AnnoSearch: Image Auto-Annotation by Search , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Nenghai Yu,et al.  Flickr distance , 2008, ACM Multimedia.

[11]  Richard S. Zemel,et al.  Learning Hybrid Models for Image Annotation with Partially Labeled Data , 2008, NIPS.

[12]  Wei Liu,et al.  Semi-supervised distance metric learning for collaborative image retrieval and clustering , 2010, ACM Trans. Multim. Comput. Commun. Appl..

[13]  Michael R. Lyu,et al.  A novel log-based relevance feedback technique in content-based image retrieval , 2004, MULTIMEDIA '04.

[14]  Edward Y. Chang,et al.  Combinational collaborative filtering for personalized community recommendation , 2008, KDD.

[15]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[16]  Tomer Hertz,et al.  Learning a Mahalanobis Metric from Equivalence Constraints , 2005, J. Mach. Learn. Res..

[17]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Geoffrey E. Hinton,et al.  Neighbourhood Components Analysis , 2004, NIPS.

[19]  Maosong Sun,et al.  Semi-supervised Learning for Image Annotation Based on Conditional Random Fields , 2006, CIVR.

[20]  Jianping Fan,et al.  Multi-level annotation of natural scenes using dominant image components and semantic concepts , 2004, MULTIMEDIA '04.

[21]  Yi Liu,et al.  An Efficient Algorithm for Local Distance Metric Learning , 2006, AAAI.

[22]  James C. Bezdek,et al.  Convergence of Alternating Optimization , 2003, Neural Parallel Sci. Comput..

[23]  Jing Hua,et al.  Graph theoretical framework for simultaneously integrating visual and textual features for efficient web image clustering , 2008, WWW.

[24]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[25]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[26]  Pavel Pudil,et al.  Introduction to Statistical Pattern Recognition , 2006 .

[27]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[28]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[29]  Gustavo Carneiro,et al.  Formulating semantic image annotation as a supervised learning problem , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Yoram Singer,et al.  Efficient Learning of Label Ranking by Soft Projections onto Polyhedra , 2006, J. Mach. Learn. Res..

[32]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[33]  Changhu Wang,et al.  Learning to reduce the semantic gap in web image retrieval and annotation , 2008, SIGIR '08.

[34]  Luo Si,et al.  Collaborative image retrieval via regularized metric learning , 2006, Multimedia Systems.

[35]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[36]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.