Evaluation of dimensionality reduction methods for image auto-annotation

Image auto-annotation is a challenging task in computer vision whose goal is to automatically predict multiple words for generic images. Recent state-of-the-art methods take a non-parametric approach that uses several visual features to compute distances between image samples. Although this approach achieves high annotation accuracy, its computational cost, in both time complexity and memory use, tends to be high, because non-parametric methods must keep many training instances in memory to compute distances from a query. In this paper, we investigate several linear dimensionality reduction methods for efficient image annotation. By exploiting the additional information provided by multiple labels, we can obtain a compact representation that preserves (and ideally improves) the semantic distances induced by a visual feature. Linear methods are computationally inexpensive and suitable for practical large-scale systems, yet only limited comparisons of such methods are available in this research field. Extensive experiments and analyses on various datasets and visual features show how these simple methods can be applied effectively to image annotation.
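As a rough illustration of the pipeline described above, the following sketch learns a label-supervised linear projection and then annotates a query by nearest-neighbor label transfer in the reduced space. The abstract does not say which linear methods are compared, so regularized canonical correlation analysis (CCA) is assumed here as one plausible example. The function names `fit_cca` and `annotate`, the regularization value, the neighbor count, and the synthetic data are all hypothetical choices made for this illustration, not the paper's implementation.

```python
import numpy as np


def fit_cca(X, Y, dim, reg=1e-3):
    """Fit a regularized linear CCA between features X and label vectors Y.

    Returns the feature mean and a projection matrix A of shape (d_x, dim).
    Assumes dim <= min(X.shape[1], Y.shape[1]).
    """
    mean_x = X.mean(axis=0)
    Xc = X - mean_x
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance matrices.
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten both views through Cholesky factors: C = L L^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)      # Lx^{-1} Cxy
    M = np.linalg.solve(Ly, M.T).T    # Lx^{-1} Cxy Ly^{-T}
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    # Map the leading singular vectors back to the original feature space.
    A = np.linalg.solve(Lx.T, U[:, :dim])
    return mean_x, A


def annotate(query, train_Z, train_Y, mean_x, A, k=5, n_words=2):
    """Predict word indices for one query by k-NN label voting in the reduced space."""
    z = (query - mean_x) @ A
    dists = np.linalg.norm(train_Z - z, axis=1)
    neighbors = np.argsort(dists)[:k]
    scores = train_Y[neighbors].sum(axis=0)
    return np.argsort(-scores)[:n_words]


# Synthetic stand-in data (an assumption): 200 images, 32-D features, 6 words.
rng = np.random.default_rng(0)
Y = (rng.random((200, 6)) < 0.3).astype(float)
W = rng.normal(size=(6, 32))
X = Y @ W + 0.1 * rng.normal(size=(200, 32))

mean_x, A = fit_cca(X, Y, dim=4)
Z = (X - mean_x) @ A                  # 4-D codes stored in place of 32-D features
words = annotate(X[0], Z, Y, mean_x, A)
```

In this sketch only the low-dimensional codes `Z` and the label matrix `Y` need to stay in memory at query time, which is the kind of saving the abstract attributes to dimensionality reduction.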
