Using Statistics to Search and Annotate Pictures: an Evaluation of Semantic Image Annotation and Retrieval on Large Databases

We present the results of an extensive experimental evaluation of the supervised multi-class labeling (SML) model for semantic image annotation proposed by [16]. We test the model's robustness to various parameters and its scalability in both database and vocabulary size. The results of this study complement previous evaluations by [12] and [16], which were limited to smaller databases and vocabularies. We further compare the performance of SML to that of a model that explicitly trades off retrieval performance for scalability: the supervised category-based labeling (SCBL) model of [14]. This establishes a unifying view of the performance of two classes of labeling and retrieval systems that were previously evaluated only under different experimental protocols. This unification simplifies the evaluation of future proposals for semantic labeling and retrieval.

[1]  Anil K. Jain, et al. On image classification: city vs. landscape, 1998, Proceedings. IEEE Workshop on Content-Based Access of Image and Video Libraries.

[2]  Martin Szummer, et al. Indoor-outdoor image classification, 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[3]  Marcel Worring, et al. Content-Based Image Retrieval at the End of the Early Years, 2000, IEEE Trans. Pattern Anal. Mach. Intell.

[4]  Nando de Freitas, et al. A Statistical Model for General Contextual Object Recognition, 2004, ECCV.

[5]  Nuno Vasconcelos. Image indexing with mixture hierarchies, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[6]  James Ze Wang, et al. Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[7]  David A. Forsyth, et al. Body plans, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  R. Manmatha, et al. A Model for Learning the Semantics of Pictures, 2003, NIPS.

[9]  Shigeo Abe. Pattern Classification, 2001, Springer London.

[10]  David A. Forsyth, et al. Matching Words and Pictures, 2003, J. Mach. Learn. Res.

[11]  David A. Forsyth, et al. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, 2002, ECCV.

[12]  Michael I. Jordan, et al. Modeling annotated data, 2003, SIGIR.

[13]  Ames StreetCambridge. Digital Libraries: Meeting Place for High-level and Low-level Vision, 2007.

[14]  N. Haering, et al. Locating deciduous trees, 1997, 1997 Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries.

[15]  Yi Li, et al. Consistent line clusters for building recognition in CBIR, 2002, Object recognition supported by user interaction for service robots.

[16]  Gustavo Carneiro, et al. Formulating semantic image annotation as a supervised learning problem, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Gustavo Carneiro, et al. A database centric view of semantic image annotation and retrieval, 2005, SIGIR '05.

[18]  Nando de Freitas, et al. A Constrained Semi-supervised Learning Approach to Data Association, 2004, ECCV.

[19]  R. Manmatha, et al. Multiple Bernoulli relevance models for image and video annotation, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2004.

[20]  David G. Stork, et al. Pattern Classification, 1973.