Automatic Image Tagging

Automatic image tagging seeks to assign relevant words (e.g. “jungle”, “boat”, “trees”) that describe the actual content of an image, without intermediate manual labelling. Current approaches are largely based on categorization and treat tags independently, so an annotation such as (jungle, trees) is considered just as plausible as (jungle, snow). The goal of this dissertation was to develop a probabilistic model (the Continuous Relevance Model) that takes the dependencies between keywords into account in order to provide more precise annotations. The main findings suggest that, under certain conditions, modelling keyword correlation, coupled with an efficient method (beam search) for searching over sets of tags, increases annotation accuracy.
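
The idea of searching over sets of tags with beam search can be illustrated with a minimal sketch. It is not the dissertation's Continuous Relevance Model: the scoring function, the per-tag relevance values in `unary`, and the pairwise correlation factors in `pair` are hypothetical toy numbers chosen only to show how a correlation term can change which tag set wins.

```python
import math
from itertools import combinations


def score(tags, unary, pair):
    """Log-score of a tag set: per-tag relevance plus pairwise correlation.

    `unary` and `pair` are hypothetical inputs, not taken from the source.
    Pairs missing from `pair` get a neutral factor of 1.0.
    """
    s = sum(math.log(unary[t]) for t in tags)
    s += sum(math.log(pair.get(frozenset(p), 1.0))
             for p in combinations(tags, 2))
    return s


def beam_search(vocab, unary, pair, set_size, beam_width):
    """Grow tag sets one tag at a time, keeping the best `beam_width` sets."""
    beam = [frozenset()]
    for _ in range(set_size):
        # Extend every surviving set by every tag it does not yet contain.
        candidates = {tags | {t} for tags in beam for t in vocab if t not in tags}
        beam = sorted(candidates,
                      key=lambda c: score(c, unary, pair),
                      reverse=True)[:beam_width]
    return beam[0]


# Toy relevance of each tag to one image (assumed values).
unary = {"jungle": 0.5, "snow": 0.3, "trees": 0.25, "boat": 0.1}
# Toy correlation factors: >1 means the tags co-occur often, <1 rarely.
pair = {
    frozenset({"jungle", "trees"}): 2.0,
    frozenset({"jungle", "snow"}): 0.1,
}

# Baseline that treats tags independently: take the two highest-scoring tags.
independent = set(sorted(unary, key=unary.get, reverse=True)[:2])
# Correlation-aware search over tag sets of size two.
best = beam_search(list(unary), unary, pair, set_size=2, beam_width=2)
```

On these toy scores the independent ranking selects jungle and snow, while the correlation-aware beam search returns jungle and trees. The beam width trades search cost against the risk of pruning a set that would have scored well once more tags were added.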
