Supervised Learning of Semantic Classes for Image Annotation and Retrieval

A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning

[1]  David G. Stork,et al.  Pattern Classification , 1973 .

[2]  Aviezri S. Fraenkel,et al.  Local Feedback in Full-Text Retrieval Systems , 1977, JACM.

[3]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[4]  Christos Faloutsos,et al.  QBIC project: querying images by content, using color, texture, and shape , 1993, Electronic Imaging.

[5]  Rosalind W. Picard Digital Libraries: Meeting Place for Low-Level And High-Level Vision , 1995, ACCV.

[6]  Markus A. Stricker,et al.  Similarity of color images , 1995, Electronic Imaging.

[7]  James Lee Hafner,et al.  Efficient Color Histogram Indexing for Quadratic Form Distance Functions , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Rosalind W. Picard Light-years from Lena: video and image libraries of the future , 1995, Proceedings., International Conference on Image Processing.

[9]  B. S. Manjunath,et al.  Texture Features for Browsing and Retrieval of Image Data , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Anil K. Jain,et al.  Image retrieval using color and shape , 1996, Pattern Recognit..

[11]  Fang Liu,et al.  Periodicity, Directionality, and Randomness: Wold Features for Image Modeling and Retrieval , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Paul A. Viola,et al.  Structure Driven Image Database Retrieval , 1997, NIPS.

[13]  R. Manmatha,et al.  Syntactic characterization of appearance and its application to image retrieval , 1997, Electronic Imaging.

[14]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[15]  Shih-Fu Chang,et al.  Visually Searching the Web for Content , 1997, IEEE Multim..

[16]  David A. Forsyth,et al.  Body plans , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  N. Haering,et al.  Locating deciduous trees , 1997, 1997 Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries.

[18]  Tomás Lozano-Pérez,et al.  A Framework for Multiple-Instance Learning , 1997, NIPS.

[19]  Nuno Vasconcelos,et al.  Library-based coding: a representation for efficient video compression and retrieval , 1997, Proceedings DCC '97. Data Compression Conference.

[20]  Peter Auer,et al.  On Learning From Multi-Instance Examples: Empirical Evaluation of a Theoretical Approach , 1997, ICML.

[21]  Shih-Fu Chang,et al.  VisualSEEk: a fully automated content-based image query system , 1997, MULTIMEDIA '96.

[22]  Martin Szummer,et al.  Indoor-outdoor image classification , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[23]  Oded Maron,et al.  Multiple-Instance Learning for Natural Scene Classification , 1998, ICML.

[24]  Leonidas J. Guibas,et al.  A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[25]  Anil K. Jain,et al.  On image classification: city images vs. landscapes , 1998, Pattern Recognit..

[26]  Anil K. Jain,et al.  On image classification: city vs. landscape , 1998, Proceedings. IEEE Workshop on Content-Based Access of Image and Video Libraries (Cat. No.98EX173).

[27]  Y. Mori,et al.  Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .

[28]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[29]  David A. Forsyth,et al.  Learning the semantics of words and pictures , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[30]  Murat Kunt,et al.  Content-based retrieval from image databases: current solutions and future directions , 2001, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205).

[31]  Nuno Vasconcelos,et al.  Image indexing with mixture hierarchies , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[32]  Yi Li,et al.  Consistent line clusters for building recognition in CBIR , 2002, Object recognition supported by user interaction for service robots.

[33]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[34]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[35]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[36]  James Ze Wang,et al.  Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[37]  Alex Pentland,et al.  Photobook: Content-based manipulation of image databases , 1996, International Journal of Computer Vision.

[38]  Nuno Vasconcelos,et al.  Scalable discriminant feature selection for image retrieval and recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[39]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[40]  Christos Faloutsos,et al.  GCap: Graph-based Automatic Image Captioning , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[41]  Jing Huang,et al.  Spatial Color Indexing and Applications , 2004, International Journal of Computer Vision.

[42]  Adam Tauman Kalai,et al.  A Note on Learning from Multiple-Instance Examples , 2004, Machine Learning.

[43]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[44]  Nando de Freitas,et al.  A Constrained Semi-supervised Learning Approach to Data Association , 2004, ECCV.

[45]  Nuno Vasconcelos,et al.  Minimum probability of error image retrieval , 2012, IEEE Transactions on Signal Processing.

[46]  Gustavo Carneiro,et al.  A database centric view of semantic image annotation and retrieval , 2005, SIGIR '05.

[47]  Gustavo Carneiro,et al.  Formulating semantic image annotation as a supervised learning problem , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[48]  Ames StreetCambridge Digital Libraries: Meeting Place for High-level and Low-level Vision , 2007 .