Flickr Distance: A Relationship Measure for Visual Concepts

This paper proposes the Flickr Distance (FD) to measure the visual correlation between concepts. For each concept, a collection of related images are obtained from the Flickr website. We assume that each concept consists of several states, e.g., different views, different semantics, etc., which are considered as latent topics. Then a latent topic visual language model (LTVLM) is built to capture these states. The Flickr distance between two concepts is defined as the Jensen-Shannon (J-S) divergence between their LTVLM. Differently from traditional conceptual distance measurements, which are based on Web textual documents, FD is based on the visual information. Comparing with the WordNet distance, FD can easily scale up with the increasing size of the conceptual corpus. Comparing with the Google Distance (NGD) and Tag Concurrence Distance (TCD), FD uses the visual information and can properly measure the conceptual relations. We apply FD to multimedia-related tasks and find methods based on FD significantly outperform those based on NGD and TCD. With the FD measurement, we also construct a large-scale visual conceptual network (VCNet) to store the knowledge of conceptual relationship. Experiments show that FD is more coherent to human cognition and it also outperforms text-based distances in real-world applications.

[1]  H. Damasio,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence: Special Issue on Perceptual Organization in Computer Vision , 1998 .

[2]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[4]  José A. Rodríguez-Serrano,et al.  A similarity measure between vector sequences with application to handwritten word image retrieval , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Antonio Torralba,et al.  Semantic Label Sharing for Learning with Many Categories , 2010, ECCV.

[6]  Bin Wang,et al.  Large-Scale Duplicate Detection for Web Image Search , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[7]  Dong Liu,et al.  Tag ranking , 2009, WWW '09.

[8]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[9]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[10]  Changhu Wang,et al.  Content-Based Image Annotation Refinement , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Paul M. B. Vitányi,et al.  The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.

[12]  Mor Naaman,et al.  Why we tag: motivations for annotation in mobile and online media , 2007, CHI.

[13]  John D. Lafferty,et al.  Correlated Topic Models , 2005, NIPS.

[14]  Nenghai Yu,et al.  Flickr distance , 2008, ACM Multimedia.

[15]  Tao Mei,et al.  Correlative multi-label video annotation , 2007, ACM Multimedia.

[16]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[17]  Nenghai Yu,et al.  Distance metric learning from uncertain side information with application to automated photo tagging , 2009, ACM Multimedia.

[18]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences by , 2013 .

[19]  Jiebo Luo,et al.  Large-scale multimodal semantic concept detection for consumer video , 2007, MIR '07.

[20]  Nenghai Yu,et al.  Latent topic visual language model for object categorization , 2011, Proceedings of the International Conference on Signal Processing and Multimedia Applications.

[21]  Antonio Torralba,et al.  Building the gist of a scene: the role of global image features in recognition. , 2006, Progress in brain research.

[22]  Nenghai Yu,et al.  Visual language modeling for image classification , 2007, MIR '07.

[23]  Nenghai Yu,et al.  Query oriented subspace shifting for near-duplicate image detection , 2008, 2008 IEEE International Conference on Multimedia and Expo.

[24]  Rong Yan,et al.  Semantic concept-based query expansion and re-ranking for multimedia retrieval , 2007, ACM Multimedia.

[25]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[26]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Edward Y. Chang,et al.  Active Learning for Interactive Multimedia Retrieval , 2008, Proceedings of the IEEE.

[28]  Nenghai Yu,et al.  Learning to tag , 2009, WWW '09.

[29]  Jiebo Luo,et al.  Annotating photo collections by label propagation according to multiple similarity cues , 2008, ACM Multimedia.

[30]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[31]  Bin Wang,et al.  Dual cross-media relevance model for image annotation , 2007, ACM Multimedia.

[32]  Arif Ghafoor,et al.  Semantic Modeling and Knowledge Representation in Multimedia Databases , 1999, IEEE Trans. Knowl. Data Eng..

[33]  Fei-Fei Li,et al.  Building and using a semantivisual image hierarchy , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  Graeme Hirst,et al.  Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures , 2004 .

[35]  John Riedl,et al.  tagging, communities, vocabulary, evolution , 2006, CSCW '06.

[36]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[37]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[38]  Mor Naaman,et al.  How flickr helps us make sense of the world: context and content in community-contributed media collections , 2007, ACM Multimedia.

[39]  Roelof van Zwol,et al.  Flickr tag recommendation based on collective knowledge , 2008, WWW.