A feature-word-topic model for image annotation and retrieval

Image annotation is the process of assigning appropriate semantic labels to images so that they can be indexed and searched more conveniently on the Web. This article proposes a novel annotation method that combines feature-word distributions, which map from visual space to word space, with word-topic distributions, which provide a structure for capturing relationships among labels. We refer to models of this type as feature-word-topic models. Introducing topics lets us efficiently take word associations, such as {ocean, fish, coral} or {desert, sand, cactus}, into account during annotation. Unlike previous topic-based methods, we treat topics not as joint distributions over words and visual features but as distributions over words only. Feature-word distributions are used to define the weights in the computation of topic distributions for annotation. As a result, topic models from text mining can be applied directly in our method. Our feature-word-topic model, which uses Gaussian mixtures for the feature-word distributions and probabilistic Latent Semantic Analysis (pLSA) for the word-topic distributions, obtains promising results in image annotation and retrieval.
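The way feature-word weights can enter the topic computation may be illustrated with a minimal sketch. It is not the exact procedure of the article: it assumes the feature-word model (for example, per-word Gaussian mixtures) has already produced one normalized score per vocabulary word for an image, and it uses standard pLSA folding-in, with those scores standing in for word counts. The function names `fold_in_topics` and `annotate`, the toy vocabulary, and all numbers are hypothetical.

```python
import numpy as np


def fold_in_topics(word_weights, p_w_given_z, n_iter=50):
    """Estimate p(z | image) by pLSA-style folding-in.

    word_weights : (V,) nonnegative scores from a feature-word model
                   (assumed given here; they replace word counts).
    p_w_given_z  : (K, V) word-topic distributions, kept fixed.
    """
    n_topics = p_w_given_z.shape[0]
    p_z = np.full(n_topics, 1.0 / n_topics)
    for _ in range(n_iter):
        # E-step: responsibility of each topic for each word, shape (K, V).
        joint = p_z[:, None] * p_w_given_z
        resp = joint / (joint.sum(axis=0, keepdims=True) + 1e-12)
        # M-step: weighted responsibilities, renormalized over topics.
        p_z = resp @ word_weights
        p_z = p_z / p_z.sum()
    return p_z


def annotate(word_weights, p_w_given_z, vocab, top_k=3):
    """Rank words by sum_z p(w | z) p(z | image) and return the top_k."""
    p_z = fold_in_topics(word_weights, p_w_given_z)
    scores = p_z @ p_w_given_z
    order = np.argsort(scores)[::-1][:top_k]
    return [vocab[i] for i in order], p_z


# Toy example: two topics mirroring the word associations in the abstract.
vocab = ["ocean", "fish", "coral", "desert", "sand", "cactus"]
p_w_given_z = np.array([
    [0.40, 0.30, 0.27, 0.01, 0.01, 0.01],  # "sea" topic
    [0.01, 0.01, 0.01, 0.40, 0.30, 0.27],  # "desert" topic
])
# Hypothetical visual evidence: strong for ocean and fish, weak elsewhere.
word_weights = np.array([0.50, 0.30, 0.05, 0.05, 0.05, 0.05])

labels, p_z = annotate(word_weights, p_w_given_z, vocab)
print(labels, p_z)
```

In this toy setting "coral" receives the same weak visual score as the desert words, yet it is ranked above them because the inferred topic mixture favors the topic it shares with "ocean" and "fish". This is the kind of word association the topic layer is meant to capture.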
