Automatic Image Annotation Using Random Projection in a Conceptual Space Induced from Data

The main drawback of a detailed representation of visual content, whatever its origin, is that the significant features are very high-dimensional. To keep the problem tractable while preserving the semantic content, the dimensionality of the data must be reduced. We propose Random Projection for this purpose. Even though this technique is sub-optimal with respect to Singular Value Decomposition (SVD), its much lower computational cost makes it more suitable for this problem, in particular when computational resources are limited, as in mobile terminals. In this paper we present the use of a “conceptual” space, automatically induced from data, to perform automatic image annotation. Images are represented by visual features based on color and texture, arranged as histograms of visual terms and bigrams to partially preserve the spatial information [1]. Using a set of annotated images as training data, the matrix of visual features is built and its dimensionality is reduced with the Random Projection algorithm. A new, unannotated image is then projected into the reduced space, and the labels of the closest training images are assigned to it. Experiments on a large real-world collection of images showed that the approach, despite its low computational cost, is very effective.
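The abstract does not spell out how the histograms of visual terms and bigrams are built. A minimal sketch, assuming a hypothetical scheme: the image is split into a regular grid of cells, each cell's color/texture feature vector is quantized to its nearest codeword in a visual vocabulary (the cell's "visual term"), and a bigram is an ordered pair of terms in horizontally or vertically adjacent cells. The functions `quantize` and `term_bigram_histogram`, the grid, and the codebook are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def quantize(cell_features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each grid cell to the index of its nearest codeword (its visual term).

    cell_features: (rows, cols, f) feature vectors, one per cell.
    codebook:      (V, f) hypothetical visual vocabulary, e.g. k-means centroids.
    Returns a (rows, cols) integer array of visual-term indices.
    """
    diffs = cell_features[..., None, :] - codebook  # (rows, cols, V, f)
    sq_dists = np.sum(diffs ** 2, axis=-1)          # (rows, cols, V)
    return np.argmin(sq_dists, axis=-1)


def term_bigram_histogram(terms: np.ndarray, vocab_size: int) -> np.ndarray:
    """Concatenate visual-term counts (V bins) with ordered bigram counts (V*V bins).

    Assumed bigram definition: a pair (a, b) where b is the right-hand or the
    lower neighbour of a in the grid.
    """
    unigrams = np.bincount(terms.ravel(), minlength=vocab_size)
    # Horizontal pairs (r, c) -> (r, c+1) and vertical pairs (r, c) -> (r+1, c).
    first = np.concatenate([terms[:, :-1].ravel(), terms[:-1, :].ravel()])
    second = np.concatenate([terms[:, 1:].ravel(), terms[1:, :].ravel()])
    bigrams = np.bincount(first * vocab_size + second,
                          minlength=vocab_size * vocab_size)
    return np.concatenate([unigrams, bigrams]).astype(float)


# Usage on synthetic data: a 5x6 grid of 3-D cell features, 4-word vocabulary.
rng = np.random.default_rng(0)
codebook = rng.random((4, 3))
cells = rng.random((5, 6, 3))
terms = quantize(cells, codebook)
h = term_bigram_histogram(terms, vocab_size=4)
print(h.shape)  # (20,)
```

Counting bigrams over neighbouring cells is what retains some spatial layout: two images with the same term counts but different arrangements get different bigram histograms. The price, under this scheme, is a vector of V + V² bins, which is why a dimensionality reduction step is needed.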
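The reduction and annotation steps can be sketched as follows, under stated assumptions. The abstract does not say which random matrix is used; here we assume the sparse matrix of Achlioptas [35], whose entries are √3·{+1, 0, −1} with probabilities {1/6, 2/3, 1/6}, scaled by 1/√k. It also does not say how "closest" is measured or how many neighbours contribute; we assume Euclidean distance in the reduced space and take the union of the labels of the `n_neighbours` nearest training images. The names `achlioptas_matrix` and `annotate`, and the synthetic data, are hypothetical.

```python
import numpy as np


def achlioptas_matrix(d: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Sparse random projection matrix of shape (d, k).

    Entries are sqrt(3) * {+1, 0, -1} with probabilities {1/6, 2/3, 1/6},
    scaled by 1/sqrt(k) so squared Euclidean norms are preserved in expectation.
    """
    signs = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / k) * signs


def annotate(query: np.ndarray, train_reduced: np.ndarray,
             train_labels: list[set[str]], R: np.ndarray,
             n_neighbours: int = 1) -> set[str]:
    """Project an unannotated feature vector and return the union of the labels
    of its n_neighbours nearest training images (Euclidean distance, assumed)."""
    q = query @ R                                      # (k,)
    dists = np.linalg.norm(train_reduced - q, axis=1)  # (n_train,)
    nearest = np.argsort(dists)[:n_neighbours]
    labels: set[str] = set()
    for i in nearest:
        labels |= train_labels[i]
    return labels


# Synthetic stand-ins for the training matrix: rows are d-dimensional histograms.
rng = np.random.default_rng(42)
d, k = 1000, 50
centre_a, centre_b = rng.random(d), rng.random(d)
train = np.vstack([centre_a + 0.01 * rng.standard_normal(d),
                   centre_b + 0.01 * rng.standard_normal(d)])
train_labels = [{"sky", "sea"}, {"tree", "grass"}]

R = achlioptas_matrix(d, k, rng)   # drawn once, independently of the data
train_reduced = train @ R          # (2, k) reduced training matrix
query = centre_a + 0.01 * rng.standard_normal(d)
print(sorted(annotate(query, train_reduced, train_labels, R)))  # ['sea', 'sky']
```

The cost argument made in the abstract is visible here: R is drawn without looking at the data, so no factorization of the training matrix is required, whereas a truncated SVD would have to decompose it. The Johnson-Lindenstrauss lemma [22] is what justifies expecting pairwise Euclidean distances to survive the projection approximately, so nearest neighbours in the reduced space tend to remain nearest neighbours.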

[1] Silvia Coradeschi, et al. Perceptual anchoring via conceptual spaces, 2004, AAAI 2004.

[2] Santosh S. Vempala, et al. Latent semantic indexing: a probabilistic analysis, 1998, PODS '98.

[3] Hua Wang, et al. Multi-label image annotation via Maximum Consistency, 2010, 2010 IEEE International Conference on Image Processing.

[4] Giovanni Pilato, et al. A Conceptual Probabilistic Model for the Induction of Image Semantics, 2010, 2010 IEEE Fourth International Conference on Semantic Computing.

[5] Ramin Zabih, et al. Comparing images using color coherence vectors, 1997, MULTIMEDIA '96.

[6] David A. Forsyth, et al. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, 2002, ECCV.

[7] Y. Mori, et al. Image-to-word transformation based on dividing and vector quantizing images with words, 1999.

[8] Qian Zhang, et al. A survey and analysis on automatic image annotation, 2018, Pattern Recognit.

[9] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[10] R. Manmatha, et al. Automatic image annotation and retrieval using cross-media relevance models, 2003, SIGIR.

[11] Hideyuki Tamura, et al. Textural Features Corresponding to Visual Perception, 1978, IEEE Transactions on Systems, Man, and Cybernetics.

[12] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[13] R. Manmatha, et al. A discrete direct retrieval model for image and video retrieval, 2008, CIVR '08.

[14] Heikki Mannila, et al. Random projection in dimensionality reduction: applications to image and text data, 2001, KDD '01.

[15] Jiajun Wu, et al. Deep multiple instance learning for image classification and auto-annotation, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Giovanni Pilato, et al. TSVD as a Statistical Estimator in the Latent Semantic Analysis Paradigm, 2015, IEEE Transactions on Emerging Topics in Computing.

[17] R. Manmatha, et al. A Model for Learning the Semantics of Pictures, 2003, NIPS.

[18] R. Manmatha, et al. Multiple Bernoulli relevance models for image and video annotation, 2004, CVPR 2004.

[19] Marcel Worring, et al. Content-Based Image Retrieval at the End of the Early Years, 2000, IEEE Trans. Pattern Anal. Mach. Intell.

[20] Stefan M. Rüger, et al. Evaluation of Texture Features for Content-Based Image Retrieval, 2004, CIVR.

[21] Saso Dzeroski, et al. Hierarchical annotation of medical images, 2011, Pattern Recognit.

[22] W. B. Johnson, et al. Extensions of Lipschitz mappings into Hilbert space, 1984.

[23] Ying He, et al. Mining social images with distance metric learning for automated image tagging, 2011, WSDM '11.

[24] Salvatore Gaglio, et al. Anchoring symbols to conceptual spaces: the case of dynamic scenarios, 2003, Robotics Auton. Syst.

[25] Daniel Gatica-Perez, et al. PLSA-based image auto-annotation: constraining the latent space, 2004, MULTIMEDIA '04.

[26] R. Manmatha, et al. Multiple Bernoulli relevance models for image and video annotation, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004).

[27] E. J. Candes, Compressive Sampling, 2022.

[28] C. V. Jawahar, et al. Image Annotation Using Metric Learning in Semantic Neighbourhoods, 2012, ECCV.

[29] Seth Pettie, et al. Mind the gap, 2006, Nature Reviews Drug Discovery.

[30] Mark J. Huiskes, et al. The MIR flickr retrieval evaluation, 2008, MIR '08.

[31] David Salesin, et al. Fast multiresolution image querying, 1995, SIGGRAPH.

[32] Anil K. Jain, et al. Unsupervised texture segmentation using Gabor filters, 1990, 1990 IEEE International Conference on Systems, Man, and Cybernetics Conference Proceedings.

[33] Feng Xu, et al. Evaluation and comparison of texture descriptors proposed in MPEG-7, 2006, J. Vis. Commun. Image Represent.

[34] M., et al. Statistical and Structural Approaches to Texture, 2022.

[35] Dimitris Achlioptas, et al. Database-friendly random projections: Johnson-Lindenstrauss with binary coins, 2003, J. Comput. Syst. Sci.

[36] Alessandro Saffiotti, et al. An introduction to the anchoring problem, 2003, Robotics Auton. Syst.

[37] Chihli Hung and Chih-Fong Tsai, et al. Automatically Annotating Images with Keywords: A Review of Image Annotation Systems, 2008.

[38] Zhiwu Lu, et al. Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation, 2017, IEEE Transactions on Image Processing.

[39] Nenghai Yu, et al. Distance metric learning from uncertain side information with application to automated photo tagging, 2009, ACM Multimedia.

[40] Thomas Sikora, et al. The MPEG-7 visual standard for content description-an overview, 2001, IEEE Trans. Circuits Syst. Video Technol.

[41] Chin-Hui Lee, et al. Boosting of Maximal Figure of Merit Classifiers for Automatic Image Annotation, 2007, 2007 IEEE International Conference on Image Processing.

[42] Subhransu Maji, et al. Automatic Image Annotation using Deep Learning Representations, 2015, ICMR.