A Correlation Approach for Automatic Image Annotation

The automatic annotation of images is a particularly challenging problem for machine learning researchers. In this work we experiment with semantic models and multi-class learning for the automatic annotation of query images. We represent the images using Scale Invariant Feature Transform (SIFT) descriptors, so that similar objects appearing at slightly different scales and under slightly different transformations yield comparable descriptors. The resulting descriptors are utilised as visual terms for each image. We first annotate query images by retrieving images that are similar to the query image, on the assumption that similar images are likely to be annotated similarly. We then propose an image annotation method that learns a direct mapping from image descriptors to keywords. We compare the semantic-based methods of Latent Semantic Indexing (LSI) and Kernel Canonical Correlation Analysis (KCCA), together with a recently proposed vector-label-based learning method known as the Maximum Margin Robot.
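The retrieval-based idea can be illustrated with a minimal NumPy sketch. It assumes each image has already been reduced to a vector of counts over a vocabulary of SIFT visual terms (descriptor extraction and quantisation are not shown) and that every training image carries a keyword list. LSI is applied as a truncated SVD of the visual-term-by-image matrix; the query is projected into the latent space, its nearest training images are found by cosine similarity, and their keywords are propagated with similarity-weighted votes. The helper names, the voting scheme, the toy data and the parameters `k`, `n_neighbours` and `n_words` are hypothetical illustrative choices, not the authors' implementation; KCCA and the Maximum Margin Robot are not shown.

```python
import numpy as np


def lsi_basis(X, k):
    """Return the top-k left singular vectors of X (visual terms x images).

    Projecting a visual-term vector x with U_k.T @ x gives its k-dimensional
    LSI representation.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]


def annotate(query, X_train, train_keywords, k=2, n_neighbours=2, n_words=3):
    """Propagate keywords from the training images most similar to `query`.

    Hypothetical helper: similarity is cosine similarity in the LSI space, and
    each neighbour votes for its keywords with weight equal to that similarity.
    """
    U_k = lsi_basis(X_train, k)
    Z = U_k.T @ X_train                  # k x n_images latent vectors
    q = U_k.T @ query                    # k-dimensional latent query
    norms = np.linalg.norm(Z, axis=0) * np.linalg.norm(q)
    sims = (Z.T @ q) / np.maximum(norms, 1e-12)   # cosine similarity per image
    nearest = np.argsort(-sims)[:n_neighbours]    # most similar images first

    scores = {}
    for i in nearest:
        for word in train_keywords[i]:
            scores[word] = scores.get(word, 0.0) + sims[i]
    # Highest-scoring keywords first; exact ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:n_words]


# Hypothetical toy data: 4 visual terms (rows) x 4 training images (columns).
X_train = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
], dtype=float)
train_keywords = [["sky", "sea"], ["sky", "beach"], ["grass", "tree"], ["tree", "forest"]]
query = np.array([2, 2, 0, 1], dtype=float)

labels = annotate(query, X_train, train_keywords)
print(labels)
```

On this toy data the query shares most of its visual terms with the first two images, so their keywords are propagated, led by "sky", which both of them carry. Comparing images in the latent space rather than on raw counts is what lets LSI match images whose visual terms co-occur without being identical.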
