Style modeling for tagging personal photo collections

While current image annotation methods treat each input image individually, users in practice tend to take multiple pictures at the same location, with the same setup, or over the same trip, so that the images to be labeled come in groups sharing a coherent "style". We present an approach for annotating such style-consistent batches of pictures. The method is inspired by previous work in handwriting recognition and models style as a latent random variable. For each style, a separate image annotation model is learned. When a batch of images is annotated, its style is inferred by maximum likelihood over the whole batch, and the corresponding style-specific model is then used for accurate tagging. In quantitative experiments on the COREL dataset and on real-world photo stock downloaded from Flickr, we demonstrate that, by exploiting the additional information that images come in style-consistent groups, our approach outperforms several baselines that tag images individually. Relative performance improvements of up to 80% are achieved, and on the COREL-5K benchmark the proposed method achieves a mean recall/precision of 39%/25%, which is the best result reported to date.
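The batch-level inference step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each style-specific annotation model is replaced by a hypothetical `StyleModel` holding a diagonal Gaussian over image feature vectors (used to score how well a style explains an image) and one feature centroid per tag (used to rank tags). Images are assumed independent given the style, so the batch log-likelihood is the sum of per-image log-likelihoods, and the maximizing style is chosen once for the whole batch. The names `StyleModel`, `infer_style` and `annotate_batch`, the two-dimensional features and the tag vocabulary are all illustrative.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class StyleModel:
    """Hypothetical stand-in for one style-specific annotation model.

    mean/var: diagonal Gaussian over image features, used to measure how
    well this style explains an image (an assumption, not the paper's model).
    tag_centroids: one feature centroid per tag, used to rank tags.
    """
    mean: List[float]
    var: List[float]
    tag_centroids: Dict[str, List[float]]

    def log_likelihood(self, x: Vector) -> float:
        # log N(x; mean, diag(var)), summed over feature dimensions.
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )

    def tag(self, x: Vector, k: int) -> List[str]:
        # Rank this style's tags by squared distance from x to each centroid.
        def sq_dist(c: List[float]) -> float:
            return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

        ranked = sorted(self.tag_centroids,
                        key=lambda t: sq_dist(self.tag_centroids[t]))
        return ranked[:k]


def infer_style(batch: List[Vector], styles: List[StyleModel]) -> int:
    # Maximum likelihood over the whole batch: images are assumed
    # independent given the style, so per-image log-likelihoods add up.
    scores = [sum(m.log_likelihood(x) for x in batch) for m in styles]
    return max(range(len(styles)), key=scores.__getitem__)


def annotate_batch(batch: List[Vector], styles: List[StyleModel],
                   k: int = 2) -> Tuple[int, List[List[str]]]:
    # Infer one style for the batch, then tag every image with that
    # style's model instead of choosing a model per image.
    s = infer_style(batch, styles)
    return s, [styles[s].tag(x, k) for x in batch]


if __name__ == "__main__":
    # Toy data (hypothetical): two styles with 2-D features.
    styles = [
        StyleModel([0.0, 0.0], [1.0, 1.0],
                   {"beach": [0.0, 1.0], "sea": [1.0, 0.0], "sky": [0.0, -1.0]}),
        StyleModel([5.0, 5.0], [1.0, 1.0],
                   {"snow": [5.0, 6.0], "mountain": [6.0, 5.0], "ski": [4.0, 5.0]}),
    ]
    batch = [[0.2, 0.9], [0.8, -0.1], [3.0, 2.5]]
    print(annotate_batch(batch, styles))
    # (0, [['beach', 'sea'], ['sea', 'sky'], ['sea', 'beach']])
```

In this toy batch, the third image on its own is better explained by the second style, but the batch as a whole is far more likely under the first style, so all three images are tagged with the first style's model. This is the batch-level constraint the approach relies on; it only helps when the batch really is style-consistent.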
