Enhancing semantic and geographic annotation of web images via logistic canonical correlation regression

Photo community sites such as Flickr and Picasa Web Albums host a massive collection of personal photos, with millions of new photos uploaded every month. These photos form an overwhelming source of images that requires effective management, and there is a growing need to annotate these web images automatically. This paper addresses the problem by considering two kinds of annotation: semantic annotation and geographic annotation. Both support image search and retrieval and help foster online communities and social networks. We propose a novel method, Logistic Canonical Correlation Regression (LCCR), for the annotation task. The model exploits the canonical correlations between heterogeneous features and an annotation lexicon of interest, and builds a generalized annotation engine on these canonical correlations to produce enhanced annotations for web images. We validate the effectiveness of our algorithm on a dataset of over 380,000 images tagged with GPS coordinates.
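The two-stage idea in the abstract (correlate image features with an annotation lexicon, then regress annotations on the resulting canonical variates) can be illustrated with a minimal sketch. This is not the authors' LCCR formulation, whose exact objective is not given here. It assumes that the heterogeneous features are simply concatenated into one matrix `X`, that the lexicon is encoded as a binary indicator matrix `Y`, and that one logistic regressor per lexicon term is fit on the projected canonical variates. The function names (`fit_cca`, `fit_logistic`, `predict_proba`) and the synthetic two-tag data are hypothetical.

```python
import numpy as np

# Hypothetical sketch: regularized CCA followed by logistic regression.
# This illustrates the general technique only; it is not the published LCCR model.


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(X, Y, k, reg=1e-3):
    """Regularized CCA between features X (n x p) and lexicon indicators Y (n x q).

    Returns the feature-side projection A (p x k), the feature mean used for
    centering, and the top-k canonical correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, _ = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]
    return A, mx, s[:k]


def fit_logistic(Z, y, lr=0.5, steps=2000):
    """Batch gradient-descent logistic regression with a bias term."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])
    w = np.zeros(Zb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Zb @ w))
        w -= lr * Zb.T @ (p - y) / len(y)
    return w


def predict_proba(Z, w):
    """Probability that each row of Z carries the annotation modeled by w."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Zb @ w))


# Usage on synthetic data: a hypothetical two-term lexicon where each tag
# depends on a different feature dimension.
rng = np.random.default_rng(0)
n, p = 400, 10
X = rng.normal(size=(n, p))
Y = np.column_stack([X[:, 0] > 0, X[:, 1] > 0]).astype(float)

A, mx, corrs = fit_cca(X, Y, k=2)
Z = (X - mx) @ A  # canonical variates of the features
w = fit_logistic(Z, Y[:, 0])  # one annotator for the first lexicon term
acc = np.mean((predict_proba(Z, w) > 0.5) == Y[:, 0])
```

Projecting onto the canonical directions before the logistic stage keeps only the feature combinations most correlated with the lexicon, so each per-term classifier works in a low-dimensional space (here `k = 2`) rather than on the raw feature vector.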
